At this stage each year, many teams run into either or both of two problems.
A. Getting error messages in R that appear to indicate their computer is out of memory (RAM). This is annoying but almost always straightforward to fix. At least two teams have already run into this problem in 2018, and assumed that it would be a major difficulty.
B. More subtle and harder to fix is getting bogged down somewhere and running out of time. A common place this happens is in data acquisition and cleaning. This is easy to fix “in theory,” but my sad experience is that some teams sink into the trap of “just a little longer, and we will be finished.” This stage can last for weeks!
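For problem A, the usual cure is to avoid loading an entire dataset into memory at once and instead stream it, keeping only a small summary. Here is a minimal sketch in Python (the same idea works in R by reading a file in pieces); the file layout and the `city` column are hypothetical:

```python
import csv
from collections import Counter

def count_by_city(path):
    """Stream a large CSV one row at a time instead of loading it all,
    keeping only a small summary (record counts per city) in memory."""
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["city"]] += 1  # assumes a 'city' column exists
    return counts
```

The peak memory use here is one row plus the summary table, no matter how large the file is.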
A few sad examples
- More than one team has spent several weeks locating, downloading, cleaning, and merging data about crime (or other topics) in multiple cities. When they started to analyze it carefully they discovered that the crime reporting systems in the cities were quite different. By then it was week 8 of the course, and they only had time for a partial analysis of one city.
- A team had too little time to tune their models and algorithms. The result was a prediction that had too much error to be useful.
- A team was racing to finish, and when they got their model results they did not take the time to check that they were reasonable. They submitted a report claiming a prediction error below 1 percent. That means, invariably, that there is some “time travel” in their data: one of the seemingly independent variables is actually a converted version of what they are predicting. Example: EPA fuel mileage, where fuel efficiency, oil consumption, and CO2 emissions all measure approximately the same thing.
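A quick sanity check for this kind of “time travel” is to compute the correlation between each predictor and the target before modeling: a near-perfect correlation is a red flag that the predictor is a converted copy of the target. A minimal sketch in Python, with hypothetical variable names and a 0.99 cutoff chosen only for illustration:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def flag_leaky_features(features, target, threshold=0.99):
    """Return names of predictors suspiciously correlated with the target.
    `features` is a dict mapping predictor name -> list of values."""
    return [name for name, col in features.items()
            if abs(pearson(col, target)) > threshold]
```

Any name this flags deserves a hard look at how the column was produced before it goes anywhere near a model.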
What to do?
I will gradually provide notes on avoiding, or solving, both of these problems. Please take them seriously. A few hours invested now can save (literally) a week or longer later in your project.
- Memo: What to do if you run out of memory? BDA18 Running out of memory v1.3
- Don’t get bogged down!! Keep moving! You can go back and improve it later!