Resources for Mining + R language

Lots of books and web sites to use.


[Updated 05/08/2018] There are a lot of good resources for the R programming language, and for data mining/machine learning/AI/BDA: video courses, books, reference sites, discussion boards, and plenty more. The single best place to look is probably Computerworld's guide, 60+ R resources to improve your data skills, which gives a few sentences of description for each resource. Another good source is the UCSD library, which has a well-organized UCSD Guide to Business Analytics, just as it has guides for political science, international studies, etc.

Remember that you don’t need to learn much R in order to use it for analytics. What you need in the course is enough to 1) glue together pieces of R code that do particular tasks, and 2) read guides to specialized topics, such as web scraping, text mining, or particular algorithms, that you need for an individual project.

Cheat Sheets

Good reference books about R

Several books and web sites contain “recipes,” meaning chunks of R code to do particular tasks. These are big time-savers, although they are not a good way to learn the language. Everyone should get at least one of these, as e-book, physical book, or permanent bookmark in your notes! Here are a few:

http://proquest.safaribooksonline.com/book/programming/r/9780596809287

  • Data Wrangling with R: available for complete download from the library's SpringerLink site (Springerlink.com), but only over VPN or from on campus.
  • There are several other good books, but they are expensive. If you are not on a budget, ask me.
  • R for Data Science: covers the tidyverse, a set of newer tools for file and data manipulation. They are much more efficient than raw R, and faster to write code with. Chapter 5 is probably the place to start. The book is available, and the same material is on the web at http://r4ds.had.co.nz/
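To give a flavor of what tidyverse-style code looks like, here is a minimal sketch using dplyr (one of the tidyverse packages) on R's built-in mtcars data set. The variable and column names (four_cyl_summary, mean_mpg, n_cars) are made up for illustration and do not come from any of the books above.

```r
# Assumes the dplyr package is installed: install.packages("dplyr")
library(dplyr)

# Illustrative only: summarize fuel economy of 4-cylinder cars by number of gears.
four_cyl_summary <- mtcars %>%
  filter(cyl == 4) %>%                 # keep only the 4-cylinder cars
  group_by(gear) %>%                   # one group per gear count
  summarise(
    mean_mpg = mean(mpg),              # average miles per gallon in the group
    n_cars   = n()                     # number of cars in the group
  ) %>%
  arrange(desc(mean_mpg))              # best fuel economy first

print(four_cyl_summary)
```

Each step reads as a verb applied to the result of the previous step, which is a large part of why this style is quick to write and easy to glue together with other pieces of code.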

 

Library downloadable books on Data Mining using R

These books are the ones to study when you want to learn a data mining technique. All of them use R as the primary language. They are textbooks about machine learning, whereas the books listed earlier are about the R language itself and are written as reference books.

 Rattle, the second textbook for BDA. Used because it has an easy interface. http://link.springer.com/book/10.1007/978-1-4419-9890-3

 ISLR = An Introduction to Statistical Learning with Applications in R. http://link.springer.com/book/10.1007/978-1-4614-7138-7 Course textbook #3. More theoretical than the other books in this list, it has good explanations of how and why important algorithms work.

 ggplot2 http://link.springer.com/book/10.1007%2F978-0-387-98141-3

The main graphics system we use. This book was written by ggplot2’s developer, and covers an early version of the software. A new edition is due out in 2018.

  http://link.springer.com/book/10.1007/978-1-4419-1318-0

If you know Stata and are learning R, this book is good for looking up “how do I do that?”

http://link.springer.com/book/10.1007%2F978-3-319-12066-9

A short book that covers the basics of data mining, with everything written in R 

R for specific kinds of analysis (networks, GIS, marketing, …)

SpringerLink has a series of more than 60 books on different uses of R: https://link.springer.com/bookseries/6991. They are at the intermediate level, about right for refining your knowledge of the special techniques needed for a project. Examples: Spatial analysis in R, R for Market Research, Data Wrangling with R, ggplot2 (several books), Political analysis using R, Analyzing Networks using R, Phylogenetics with R, etc. All of them are free to download, or you can buy them as paperback books for $25.

Because the field is so trendy, practically every business and textbook publisher has books on data mining and related topics. You can search them through the UCSD book catalog, UCSD.worldcat.org. For example, a search there turns up 2000+ e-books about ‘Machine Learning’. That is not a misprint, and all are available through UCSD in some form.

There are also literally dozens of books about R/statistics written for a particular audience, or exploring a particular applied statistics topic. The list below gives the books I have found especially relevant to this course. Note that many of them are specifically for reference: when you need to do something specific, look it up in one of these books. Others are intended for learning from.

For searching on your own, e.g. on Google Scholar, good phrases are “data mining”, “machine learning”, “data analytics” (broader), “data science”, and specific topics for your application, such as “fraud detection”. Use quotation marks around these phrases! Finally, there are many 20 to 50 page articles that cover the basics of particular R topics. These are often more up to date than books, and a better way to get started on a topic.

Mining Text Data    

R for Marketing Research and Analytics

A User’s Guide to Network Analysis in R

Statistical Analysis of Network Data with R

Applied Spatial Data Analysis with R (Geographic Info systems)

Graphical Models with R

Six Sigma with R

Introductory Time Series with R

Applied Econometrics with R

Nonlinear Regression in R

Data Manipulation with R

 

Author: Roger Bohn

Professor of Technology Management, UC San Diego. Visiting Stanford Medical School. Rbohn@ucsd.edu. Twitter: Roger.Bohn
