Should I take Big Data Analytics in 2018?

Advice on whether to take my Big Data Analytics @UCSD course.

This Spring quarter 2018 course, GPIM 452, is a self-contained introduction to “data mining, business analytics, and all that.” I have put some information about the textbooks in the previous entry on this blog. If you are trying to decide whether to take the course, take a look at either of the two main textbooks, especially Chapter 2. You can preview the one that costs money without purchasing it by using Amazon “Look Inside.” The other one is free, as discussed in the blog entry.

BDA emphasizes decision making, as opposed to hypothesis testing. In other words, it is oriented toward analysts, not academic researchers. Of course, very closely related methods are becoming accepted in the research community. (In Google Scholar, search for “Hal Varian Machine Learning” to find a discussion by an economist who “crossed over” to Google.)

I will post occasional updates about the course before it starts, both on this blog and on Twitter @RogerBohn #BDA. You can email me with questions, or post them as comments here.

Enjoy the rest of this quarter, and your Spring break.

Roger Bohn

Random Forests + LASSO Lecture May 11

Here are the lecture notes on Random Forests from Thursday May 11: BDA17 Random Forests May 11 Bohn. Remember, Random Forests are a technique everyone should try.

LASSO, also discussed on Wednesday, is great when you have lots of variables. With fewer than 20 variables, it’s not as necessary. BUT remember that you will often want to add interaction terms (and jump terms, quadratic terms, etc.) to linear models. As soon as you start that, the number of variables balloons.
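
A quick way to see how fast this happens is to count the columns. The Python sketch below is only an illustration, not part of the lecture notes; the function name expanded_feature_count and the variable names x1, x2, ... are made up for the example. It assumes you add one squared term for every variable and one interaction term for every pair of variables, and nothing else.

```python
from itertools import combinations


def expanded_feature_count(p):
    """Count model columns for p base variables after adding
    one squared term per variable and one interaction per pair.
    (Hypothetical helper, for illustration only.)"""
    names = [f"x{i}" for i in range(1, p + 1)]
    squares = [f"{n}^2" for n in names]
    interactions = [f"{a}*{b}" for a, b in combinations(names, 2)]
    return len(names) + len(squares) + len(interactions)


# Show how the column count grows with the number of base variables.
for p in (5, 10, 20, 50):
    print(p, "base variables ->", expanded_feature_count(p), "columns")
```

With 20 base variables this already gives 230 columns (20 originals, 20 squares, 190 pairwise interactions), which is well into the range where LASSO becomes useful.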


Different quant sub-disciplines, used for different purposes

What are the differences between data science, data mining, machine learning, statistics, operations research, and so on? Here I compare several analytic disci…

Source: 16 analytic disciplines compared to data science – Data Science Central

RB comment: Useful vocabulary for job-hunting synonyms. I don’t take the nuances of his distinctions seriously, such as “business intelligence” versus “business analytics” versus “data analysis.” Each organization needs a range of skills to do a range of activities. Despite that, the article is good for showing the wide range of quant skills that are useful.

Machine learning videos by authors of our supplemental textbook

Stanford University professors Trevor Hastie and Rob Tibshirani (authors of the legendary Elements of Statistical Learning textbook) taught an online course based on their newest textbook, An Introduction to Statistical Learning with Applications in R (ISLR). I found it to be an excellent course in statistical learning (also known as “machine learning”), largely due to the high quality of both the textbook and the video lectures. And as an R user, it was extremely helpful that they included R code to demonstrate most of the techniques described in the book.

If you are new to machine learning (and even if you are not an R user), I highly recommend reading ISLR from cover-to-cover to gain both a theoretical and practical understanding of many important methods for regression and classification. It is available as a free PDF download from the authors’ website.

If you decide to attempt the exercises at the end of each chapter, there is a GitHub repository of solutions provided by students you can use to check your work.

As a supplement to the textbook, you may also want to watch the excellent course lecture videos (linked below), in which Dr. Hastie and Dr. Tibshirani discuss much of the material. In case you want to browse the lecture content, I’ve also linked to the PDF slides used in the

Chapter 1: Introduction (slides, playlist)

Chapter 2: Statistical Learning (slides, playlist)

Chapter 3: Linear Regression (slides, playlist)

Chapter 4: Classification (slides, playlist)

Chapter 5: Resampling Methods (slides, playlist)

Chapter 6: Linear Model Selection and Regularization (slides, playlist)

Chapter 7: Moving Beyond Linearity (slides, playlist)

Chapter 8: Tree-Based Methods (slides, playlist)

Chapter 9: Support Vector Machines (slides, playlist)

Chapter 10: Unsupervised Learning (slides, playlist)

Interviews (playlist)


Source: In-depth Introduction To Machine Learning In 15 Hours Of Expert Videos | R-bloggers