[Updated 05/08/2018] There are a lot of good resources for the R programming language, and for data mining/machine learning/AI/BDA: video courses, books, reference sites, discussion boards, and plenty more. The single best place to look for resources is probably Computerworld's guide, “60+ R resources to improve your data skills.” Each resource has a few sentences of description. Another good source is the UCSD library, which has a well-organized UCSD Guide to Business Analytics, just as it has guides for political science, international studies, etc.
Remember that you don’t need to learn much R in order to use it for analytics. What you need in the course is enough to 1) glue together pieces of R code that do particular tasks, and 2) read guides to specialized topics, such as web scraping, text mining, or particular algorithms, that you need for an individual project.
Cheat Sheets
- A good cheat sheet to start with is https://cran.r-project.org/doc/contrib/Baggott-refcard-v2.pdf (dated December 2012).
- The “ultimate sheet.”
- Others are here: R “cheat sheets”
- RStudio has a number of cheat sheets: https://www.rstudio.com/resources/cheatsheets/. I find them generally complex, but several are good to have around, such as dplyr (Data Transformation) and Data Import.
Good reference books about R
Several books and web sites contain “recipes,” meaning chunks of R code to do particular tasks. These are big time-savers, although they are not a good way to learn the language. Everyone should get at least one of these, as an e-book, a physical book, or a permanent bookmark in your notes! Here are a few:
- Website: http://www.cookbook-r.com Free. Visit this – it has answers to many code issues.
- R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics, by Paul Teetor. Also available through the UCSD library, but only online.
- Data Wrangling with R. Available for complete download from the library’s Springerlink site (Springerlink.com), but only from VPN or on campus.
- There are several other good books, but they are expensive. If you are not on a budget, ask me.
- R for Data Science. Covers the tidyverse, a set of new tools for file and data manipulation. Much more efficient than raw R, and faster to write code with. Chapter 5 is probably the place to start. The book is available, but the same material is on a web site: http://r4ds.had.co.nz/
- R Recipes http://link.springer.com/book/10.1007%2F978-1-4842-0130-5. Similar to R Cookbook but not as thorough. However, it is downloadable.
- new! R in a Nutshell http://proquest.safaribooksonline.com/9781449358204. An excellent reference book on R.
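To give a feel for the tidyverse style mentioned in the list above, here is a minimal sketch of a typical "recipe" using dplyr. It is only an illustration, not taken from any of the books: it assumes the dplyr package is installed, uses the mtcars data set that ships with R, and the names by_cyl, mean_mpg, and n_cars are my own.

```r
# Assumes dplyr is installed: install.packages("dplyr")
library(dplyr)

# mtcars is a small data set built into R (32 cars, one row each).
# Group the cars by number of cylinders, then compute the average
# miles per gallon and the number of cars in each group.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    mean_mpg = mean(mpg),  # average fuel economy per group
    n_cars   = n()         # how many cars fall in each group
  )

print(by_cyl)
```

Each step in the pipeline (group_by, summarise) is a small piece you can look up on the Data Transformation cheat sheet and glue onto the next one.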
Library downloadable books on Data Mining using R
These books are the ones to study when you want to learn a data mining technique. All of them use R as the primary language. These books are about machine learning, and are textbooks. The earlier books are about the R language, and are written as reference books.
Rattle, the second textbook for BDA. Used because it has an easy interface. http://link.springer.com/book/10.1007/978-1-4419-9890-3
ISLR = An Introduction to Statistical Learning with Applications in R http://link.springer.com/book/10.1007/978-1-4614-7138-7. Course textbook #3. More theoretical than the other books in this list, it has good explanations of how and why important algorithms work.
ggplot2 http://link.springer.com/book/10.1007%2F978-0-387-98141-3
The main graphics system we use. This book was written by ggplot2’s developer, and covers the early version of the software. A new edition is due out in 2018.
http://link.springer.com/book/10.1007/978-1-4419-1318-0
If you know Stata and are learning R, this book is good for looking up “how do I do that?”
http://link.springer.com/book/10.1007%2F978-3-319-12066-9
A short book that covers the basics of data mining, with everything written in R
R for specific kinds of analysis (networks, GIS, marketing, ….)
Springerlink publishes a series of more than 60 books on different uses of R (https://link.springer.com/bookseries/6991). They are at the intermediate level, about right for refining your knowledge of special techniques needed for a project. Examples: Spatial analysis in R, R for Market Research, Data Wrangling with R, ggplot2 (several books), Political analysis using R, Analyzing Networks using R, Phylogenetics with R, etc. All of them are free to download, or you can buy them as paperback books for $25.
Because data mining is so trendy, practically every business and textbook publisher has books on it and on related topics. You can search them through the UCSD book catalog, UCSD.worldcat.org. For example, here are 2000+ e-books about ‘Machine Learning’. That is not a misprint, and all are available through UCSD in some form.
For searching on your own, e.g. on Google Scholar, good phrases are data mining, machine learning, data analytics (broader), data science, and specific topics for your application, such as fraud detection. Use quotation marks around these phrases! There are also many 20- to 50-page articles that cover the basics of particular R topics. These are often more up to date than books, and are better ways to get a start on a topic.
Last, there are literally dozens of books about R/statistics written for a particular audience, or exploring a particular applied statistics topic. Listed below are books I have found especially relevant to this course. Note that many of them are specifically for reference: when you need to do something specific, look it up in one of these books. Others are intended for learning from.
- R for Marketing Research and Analytics
- A User’s Guide to Network Analysis in R
- Statistical Analysis of Network Data with R
- Applied Spatial Data Analysis with R (geographic information systems)