BDA 2018: final schedule (updated)

I have created a new list of resources for specific project types, such as spatial analysis and Twitter analysis. It is on the Latest Handouts page under the heading Special topics for individual papers.

Summary of the last 2 weeks of the course:

  • Only nominal homework – readings and one figure.
  • Work on projects. Ask for help if desired.  No more interim reports are due.
  • R Certification: If you want R certification for the course, take a one-hour quiz and meet some other requirements.
  • Make an in-class presentation: two-person teams only.
  • Final paper due

Wednesday, May 30: Handling unbalanced data and other useful techniques.
Reading: Chapter 5.5, also 5.3 and 5.4. These were assigned previously.
Nothing to be turned in.

Saturday, June 2: No progress report is due.

Monday, June 4: A/B Testing and other emerging topics in Big Data

  • Look up specific techniques for your project: spatial data and GIS, text processing, crime, graphics, or Twitter. One or more applies to every project. See Special topics for individual papers.
  • Turn in: One careful plot from your project, in hard copy, with handwritten comments on it. Format the plot carefully and clearly, including scales, colors, definitions, etc. Please hand these in during class; this is to encourage handwritten comments. Circle and explain at least one interesting or important feature of your plot.
    • Include a caption. Captions in scientific papers are sometimes several sentences long.
    • The goal of the assignment is to help you focus intensively on one result of your project and on how to explain it visually. It does not have to be a data-mining result.
  • Reading: “The A/B Test: Inside the Technology That’s Changing the Rules of Business,” Wired Magazine, 04.25.12.
  • Visit an e-commerce website and think about how to improve it using A/B testing.
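
As a starting point for thinking about the website exercise, here is a minimal sketch of how the result of an A/B test might be evaluated. It uses a two-proportion z-test, which is one common choice and not necessarily the method used in the reading or in class. The sketch is in Python rather than R, and the visitor and conversion counts are hypothetical numbers made up for illustration.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of page versions A and B.

    conv_a, conv_b: number of visitors who converted (e.g. purchased).
    n_a, n_b: number of visitors shown each version.
    Returns (rate_a, rate_b, z, two_sided_p_value).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF written with erf; two-sided p-value.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return rate_a, rate_b, z, p_value


if __name__ == "__main__":
    # Hypothetical data: 5000 visitors per version,
    # 200 conversions on version A and 250 on version B.
    rate_a, rate_b, z, p = two_proportion_z_test(200, 5000, 250, 5000)
    print(f"A: {rate_a:.3f}  B: {rate_b:.3f}  z = {z:.2f}  p = {p:.4f}")
```

A small p-value suggests the difference between the two versions is unlikely to be due to chance alone. It does not say whether the difference is large enough to matter for the business.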

Wednesday, June 6: All two-person project teams will give 5- to 7-minute presentations. The goal is to fascinate, impress, and surprise your audience. Think of this as the “elevator pitch” for your project.

Friday, June 8, 1pm or other times as agreed: Quiz for R Certification. The quiz emphasizes data manipulation in R: selecting data subsets, creating new variables, and rearranging and redefining data such as event logs. The other requirements for R certificates are completing your project using appropriate R programming and attending 50% of TA tutorials.

Friday, June 8 midnight: Formal due date for final project papers.
All project teams that request one receive an automatic extension until Wednesday.
Submit both hard copy and PDF files. Submit via Turnitin on TritonEd.

June 11. Wednesday, June 13: Deadline for projects.

Notes from class 3, CART using Rattle

To: Big Data Analytics students
From: Prof. Roger Bohn
Subject: Class #3 Monday April 9 – next steps, Q&A, homework schedule
Date: April 9, 2018

The lecture notes were provided before class; visit Latest handouts. We did not cover all of them, and will continue with the CART algorithm on Wednesday before discussing Toyota.

Another topic we discussed, not in the notes: Benefits and disadvantages of open source software.

Please email (or put in comments on this page) questions about the Weather exercise from the Rattle book. Several people asked good questions about Toyota after class. If there are no more questions about how to use Rattle, we will move right into the next segment on Wednesday.

The Toyota homework is now due Friday at noon. The TritonEd assignment has been updated.

Still having trouble with Rattle? See Feiyang at 4pm today. The location is unclear; check near GPS office 3132. Feiyang is polling about what her tutorial hours should be. Please respond to her Doodle poll. No response = you don’t get a vote.

Other questions asked in class and not answered:

  • Can we have a group of 3 for homework? No. You can discuss with others if you put their names in a note, but only 2 people should work on the actual memo answers.
  • Grading scale, grading policy. I will post something about this. Homework is graded on a 0 to 10 scale. An average of 8 is fine.
  • How to find other people who are interested in projects. I just created a page specifically for that: Final paper ‘dating site’.
  • Where to learn more R. Attend the TA tutorials, and I will shortly post a list of recommended websites and readings. This page is a starting point: Resources for R language.