Chronological handouts, 2016


All course material posted on the website in 2016 appears in the following list, in forward chronological order.

2016 material. For 2017 material, see the blog postings themselves.

  1. March 23. BDA16-03-23 Getting started

  2. March 30. BDA16S guide to web sites DRAFT

  3. March 30. BDA for Pollution

  4. March 28. List of Popular Press Articles

  5. March 31. Data set for Toyota Corolla exercise

  6. April 3. Lecture notes for class 1: Introduction. BDA Microsoft R presentation

  7. April 3. Lecture notes for class 2: Data mining process. BDA16s Day 2 lecture3-30

  8. April 4. Lecture notes for class 3 on classification trees. BDA16S Decision trees April 4

  9. April 4. Examples of potential projects, discussed in class April 4. BDA16S project examples

  10. April 6. Sample quiz for April 13. BDA6S R quiz week 3

  11. April 6. Lecture notes for class 4 on regression trees. BDA16S CART trees April 6

  12. April 6. Week 1 and 2 TA session material. week 1 TA session, week 2 TA session

  13. April 9. Background reading assignment on linear and logistic regression.

  14. April 9. Previous BDA paper: BDA Assign 2016-03-16_Hyerim Kim_Final Project Paper, which analyzes a very large Korean health-care data set (the paper is in English).

  15. April 9. Memorandum format for submitting homework. Supplement to the syllabus.

  16. April 9. Reading assignment and problem set assignment for Week 3 (April 13). Both are linked on the Handouts page.

  17. April 9. The data set for week 3, EPA automobile mileage data covering 30 years, is linked from the Data Sets page.

  18. April 10. Data mining case studies concerning pollution and health. Check these out if you are interested in development, the environment, or jobs in the health sector.

  19. April 11. Lecture notes on linear regression. The same notes also cover the April 13 class.

  20. April 12. Week 3 TA session material. week 3 TA session

  21. April 14. Previous BDA paper on text mining Amazon reviews. EstradaSimpsonFinalPaper + comments

  22. April 14. In-class quiz on manipulating observations in R. BDA16S R quiz Apr. 13

  23. April 14. In-class quiz solutions

  24. April 16. Text mining of Amazon reviews, sample project

  25. April 16. Blog posts: more quick R methods, machine learning videos by the authors of our supplemental textbook, and R-bloggers (which pulls useful articles from multiple R blogs). Find them here.

  26. April 17. Solution to homework on linear models using Rattle. BDA_Assignment April 12th+RB comments

  27. April 17. R code for cleaning up data, using the EPA data as the example. Find it on the R tips page.

  28. April 17. Using R to clean a data set, including removing some observations and variables and adding new ones. Applied to the EPA mileage homework data, it is intended as a starting guide for future assignments. Here is the actual data set after I cleaned it.

  29. April 18. A student's solution to the data-cleaning and linear regression homework on the EPA data. It shows a solution from (nearly) start to finish. Chuyue Wu_HW5_0417_Final+RB comments

  30. Case Study on Crime Forecasting.  Crime Forecasting Using Data Mining Techniques

  31. April 19. Week 4 TA session material. Week 4 TA session

  32. Getting started with text mining: a guide to resources. BDA16s text mining links

  33. Homework on Random Forests. (See TritonEd for the official assignment). BDA16S HW on random forests

  34. April 26. Two case studies. For class April 27.

  35. April 26. Week 5 TA session material. Week 5 TA session

  36. May 4. Week 6 TA session material. Week 6 TA session

  37. May 4. Lecture notes on general textual analysis. Text mining Marko +Bohn edits

  38. May 4. Using Google Trends data directly in R. 

  39. Project assignment due May 8. Project assign for May 8

  40. A list of about five good readings on ggplot2 and plotting in general, including one free book.

  41. May 10. Week 7 TA session material. week 7 TA session

  42. May 9 and 11. Lecture notes. BDA graphics for presentations B

  43. How I grade final reports. BDA good final reports

  44. May 16. Lecture notes on LASSO. BDA model selection RB lecture

  45. May 17. Week 8 TA session material. week 8 TA session

  46. May 18. Lecture notes on A/B experiments. Bohn L by E from Kohavi 2013

  47. May 18. Helping doctors better understand the statistics of medical testing. Risk literacy in medical decision-making

  48. May 21. Dealing with “out of memory” and similar errors. BDA16 working w BIG data B

  49. About 10 pages of advice on writing final reports, posted on the page

  50. Bibliography of books that are worth downloading. bda16-reference-book-suggestions

  51. Validating and testing models when your data are ordered in time. Sometimes you can ignore this, but in other cases (such as forecasting disease) it is critical.
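The core idea behind item 51 is that, for data ordered in time, the test set should come after the training set rather than being drawn at random. Otherwise the model can "see the future." Below is a minimal sketch in Python (the course itself used R). It assumes each record is a (date, value) tuple. The function name time_ordered_split and the weekly case counts are hypothetical, made up for illustration only.

```python
from datetime import date


def time_ordered_split(records, cutoff):
    """Split records into train (dates before cutoff) and test (on/after cutoff).

    Unlike a random split, no observation from after the cutoff can leak
    into training. Assumes each record is a (date, value) tuple; this
    record format is a hypothetical choice for illustration.
    """
    ordered = sorted(records, key=lambda r: r[0])  # sort by date
    train = [r for r in ordered if r[0] < cutoff]
    test = [r for r in ordered if r[0] >= cutoff]
    return train, test


# Hypothetical weekly disease case counts (invented data)
cases = [
    (date(2016, 1, 4), 12),
    (date(2016, 1, 11), 15),
    (date(2016, 1, 18), 21),
    (date(2016, 1, 25), 30),
    (date(2016, 2, 1), 42),
]
train, test = time_ordered_split(cases, date(2016, 1, 25))
print(len(train), len(test))  # 3 2
```

The model is fit on the earlier weeks and evaluated on the later ones. This mimics how a forecast is actually used. Repeating the split with a cutoff that moves forward (walk-forward validation) gives several such test periods instead of one.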