Two warnings as projects heat up

At this stage each year, many teams run into one or both of two problems.

A. Getting error messages in R that appear to indicate their computer is out of memory (RAM). This is annoying but almost always straightforward to fix.  At least two teams have already run into this problem in 2018, and assumed that it would be a major difficulty.

B. More subtle and harder to fix is getting bogged down somewhere and running out of time. A common place this happens is in data acquisition and cleaning. This is easy to fix “in theory,” but my sad experience is that some teams sink into the trap of “just a little longer, and we will be finished.” This stage can last for weeks!

A few sad examples

  • More than one team has spent several weeks locating, downloading, cleaning, and merging data about crime (or other topics) in multiple cities. When they started to analyze it carefully, they discovered that the crime reporting systems in the cities were quite different. By then it was week 8 of the course, and they only had time for a partial analysis of one city.
  • A team had too little time to tune their models and algorithms. The result was a prediction that had too much error to be useful.
  • A team was racing to finish, and when they got their model results they did not take the time to check that they were reasonable. They submitted a report claiming a prediction error below 1 percent. That means, invariably, that there is some “time travel” in their data: one of the seemingly independent variables is actually a converted version of what they are predicting. Example: EPA fuel mileage, where fuel efficiency, oil consumption, and CO2 emissions all measure approximately the same thing.
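
One quick sanity check for the “time travel” problem in the last example is to look for any predictor that tracks the target almost perfectly. The sketch below is only an illustration, not a method from the course: it is written in Python rather than R, and the column names, the toy numbers, the conversion constant, and the 0.98 threshold are all assumptions made up for this example. It uses a rank (Spearman) correlation so that a converted version of the target, such as CO2 per mile computed from miles per gallon, is still caught even though the conversion is not linear.

```python
import math


def ranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def suspicious_predictors(columns, target, threshold=0.98):
    """Names of predictors whose |rank correlation| with the target
    is at or above the threshold (an arbitrary, illustrative cutoff)."""
    flagged = []
    for name, values in columns.items():
        if name == target:
            continue
        if abs(spearman(values, columns[target])) >= threshold:
            flagged.append(name)
    return flagged


# Toy data, invented for illustration only.
GRAMS_CO2_PER_GALLON = 8887  # illustrative constant; only the monotone link matters
mpg = [15, 20, 25, 30, 35, 40]
cars = {
    "mpg": mpg,
    "co2_g_per_mile": [GRAMS_CO2_PER_GALLON / m for m in mpg],  # converted target
    "model_year": [2012, 2009, 2015, 2010, 2014, 2011],
}

print(suspicious_predictors(cars, "mpg"))
```

A flagged column is not proof of a problem, only a prompt to ask whether that variable would really be known before the thing being predicted. In R, the built-in `cor()` function with `method = "spearman"` computes the same kind of correlation.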

What to do?

I will gradually provide notes on avoiding, or solving, both of these problems. Please take them seriously. A few hours invested now can save (literally) a week or longer later in your project.

  1. Memo: What to do if you run out of memory? (BDA18 Running out of memory v1.3)
  2. Don’t get bogged down!! Keep moving! You can go back and improve it later!


Author: Roger Bohn

Professor of Technology Management, UC San Diego. Visiting Stanford Medical School. Rbohn@ucsd.edu. Twitter: Roger.Bohn
