This page links to the latest versions of course material. Some items are PDF, some HTML. Updated May 29, 2018.
Lecture Notes (chronological order)
 BDA18D3 Chap9_CART RB. For the class of April 9, on CART.
 BDA18 Class 4 Lecture notes Toyota For the class of April 11, on CART + Toyota
 Logistic Regression 2018 Class of April 16 on classification using linear models aka logistic regression.
 Class of April 18 on linear categorical models aka logistic regression. BDA18 illustration of Rattle use 0418
 Notes on Linear Regression, Week 4, April 23, 25 BDA18 regression 04/25.pdf. BDA18 regression slides 423. Use primarily the April 25 version; 4/23 has a few additional slides.
 How to go from Rattle to R. BDA18 Rattle to R code 425.pdf
 Lecture Notes Week 5 Random Forests BDA18 Random Forests2018B
 Lecture Notes Week 6 Text Mining, Day 1
Tutorial worked through in class. Basic Text Mining in R 2017 version  Week 6 Text mining #2 2018b
 Week 7 LASSO, Monday May 14.
 Week 8 lecture notes. Monday May 21. BDA18 feature engineering case study
Advice, tutorials, reference books, other useful material
 New! R Graphics Cookbook – Excerpt. Useful quick graphics
 New! Resources for doing graphics. Page with list of resources
 New! Contingency table definitions. Multiple ways to measure classification accuracy.
 New! Doing a great final report. BDA18 Writing your final report
 Dealing with the “big” in Big Data. A revised+expanded discussion of how to speed up computation and how to avoid running out of memory. Includes bibliography. This supersedes some earlier notes, which have been removed from this list.
 Reference books on R and on specialized data mining methods. Resources for Mining + R language
 Don’t be a perfectionist. Do simple analysis first; make it more complex only after the simple stuff works.
 Text mining books and case studies. Textmining resources for projects. More on text mining in a memo:
 A first list of key ideas in Big Data Analytics. BDA18 Class 4 key conceptsC
 Doing linear modeling and variable transformation from Rattle. BDA18 Linear models in Rattle 180417. Moving between Rattle and R: BDA18 Rattle to R code 425.pdf
 More on linear regression (see second half of the document). Homework week 4: Linear regression
 Resources for Mining + R language Textbooks, reference sites and books, cheat sheets, etc.
Special topics – for specific papers
The Big Data Analytics course introduces data mining with techniques and concepts that are broadly applicable. Individual topics and projects have specific techniques, needs, and resources. In keeping with the theme “Borrow and reuse, don’t invent anything yourself,” here are some resources that are especially suited to particular topics.
Don’t forget to try the site’s Search window (usually near the upper right) to look up possible keywords. Many of these topics also have entire books about them, such as on SpringerLink.
 Especially useful R books for the course. Resources for Mining + R language
 Text processing. Start with this list: Text Mining Resources for Projects. Then look at https://bda2020.files.wordpress.com/2017/04/bda17textminingresources.pdf These two pages alone will save many hours of programming time. There are also many books on this subject. Specific books include: Mining Text Data; R for Marketing Research and Analytics.
 Spatial data, Geographic Information Systems. For projects on taxis, bicycle sharing, crime, and many other topics where the underlying data is geographically distributed and location affects behavior. Read this page: Spatial (GIS) data in R: easy maps. One of many books is Applied Spatial Data Analysis with R. Also: Spatial analysis in R.
 Time series require a special kind of validation, in which you train the model on early years and then validate it on later years. You can do this in rolling fashion. For example, use years 1 to 5 for training and validate on year 6. Then use years 1 to 6 (or 2 to 6) for training and validate on year 7. Validating machine learning time series models
 Twitter and other social networking sites. In addition to material on text mining, see R for Marketing Research and Analytics and Text mining of Amazon reviews. Also be sure to read about “Regular Expressions.” Handling and Processing Strings in R by Gaston Sanchez is a 100-page mini-book on manipulating text. Look here when you need to do something with text like “find all words that start with ‘UCSD’.” Finally, there are many previous student papers in BDA that use Twitter data.
 Local crime. Local crime models are tricky because they require predicting events that are spread out over space and time. If you set up your data with “buckets” that are geographically and temporally small, then most buckets are empty. But if you make the buckets large, such as “Any time on Mondays, for the lower half of Manhattan,” then the predictions are too coarse to be useful to decision makers. See Wk 8: Feature engineering, other topics. Chronological handouts, 2016. Lectures 2017.
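The rolling validation scheme described above for time series can be sketched in a few lines. This is only an illustration in plain Python, not course code: the function name `rolling_origin_splits` and its parameters are hypothetical, and the same logic is easy to write in R. It reproduces the example from the text: train on years 1 to 5 and validate on year 6, then train on years 1 to 6 (or 2 to 6) and validate on year 7.

```python
def rolling_origin_splits(years, initial_train_size, expanding=True):
    """Yield (training_years, validation_year) pairs in time order.

    Hypothetical helper for illustration only.
    expanding=True  -> the training window grows (years 1-5, then 1-6, ...).
    expanding=False -> the training window slides (years 1-5, then 2-6, ...).
    """
    years = sorted(years)
    for i in range(initial_train_size, len(years)):
        start = 0 if expanding else i - initial_train_size
        # Training data always comes strictly before the validation year.
        yield years[start:i], years[i]


splits = list(rolling_origin_splits(range(1, 8), initial_train_size=5))
for train_years, validation_year in splits:
    print(f"train on years {train_years}, validate on year {validation_year}")
```

The key point is that the validation year is always later than every training year, so the model is never evaluated on data from its own past. Setting `expanding=False` gives the "2 to 6" variant mentioned in the text.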

Syllabus (overall goals, requirements, assignments)
 Latest version 1.05, dated 4 April 2018 BDA18 Syllabus v 1.05
 The main textbooks you will want. 2018 Big Data Analytics: materials
 Syllabus discussion of grading and memo formats (updated)
 Project update assignment 2, plus example student report. BDA18 Project assignment example 417
 Readings and homework for week 4. Includes various supplemental notes on linear regression and related topics. Homework week 4: Linear regression
 Readings and homework on Text Mining, week of May 7. For Wednesday class (Due Friday. The Tuesday assignment is voluntary, for now.) BDA18 Text mining assign 2 Text mining #2 2018b Book errata: The zip file of ads is actually named incorrectly in the book. Look for it here.
 Readings and homework for week 7 on LASSO algorithm. BDA18 Assignment May 1418
 Week 8 Feature engineering. Wk 8: Feature engineering, other topics
 New! Writing a good final report. Writing your final report June 8
Other links:
Google folder for the course. There you will find all datasets for the textbook.
The official textbook web site is http://www.dataminingbook.com/book/redition
Once you register, you can get these datasets, and the R Code. (It’s better to type the R Code by hand, the first time.)
Contact Information
PROFESSOR ROGER BOHN OFFICE = RBC 1315 PHONE 858 5347630
EMAIL: RBOHNat UCSDdotEDU.
Personal web site: Art2science.org