Several notices about events tomorrow and later.
Student Question: What cutoff in homework problem 10.4i?
I have a question about 10.4 i. exercise. In Rattle we use default setting and cannot change the cutoff, so are we supposed to guess the cutoff for the most accurate classification? Or are we supposed to get it some other way?
Response: Good question. I have 3 levels of answer. At the simplest, should you push the cutoff up or down from 50%? (Be sure to specify which way is which – sometimes it can be ambiguous).
At the next level, Rattle produces an ROC curve, as we did with the airplane data on Wednesday. See textbook p 131. The ROC curve is traced out by moving the threshold all the way from 0 to 1.
Third level: Soon, you will learn how to grab the code produced by Rattle, and run it in RStudio. There you can change the cutoff parameter and calculate different confusion matrices. For example function confusion.matrix(obs, pred, threshold = 0.5) allows any threshold. Through trial and error, you can experiment with different thresholds. Of course, there are more specialized functions that can come up with the optimal answer.
Sony Playstation Network project
Anyone considering the Sony Playstation Network project, read this memo.BDA18 Sony PSN project update 4-19
It points you to some data, and requests a revised proposal as soon as possible. Or, switch to another topic. Make comments on this page if you are looking for a partner, or on the Final paper ‘dating site’
R Tutorial Friday in room 3201 at 1pm
Feiyang will provide several resources for learning R at the level we need it in BDA. She will demonstrate how functions work, and illustrate function use with examples from the textbook. Other topics will include using R Help, and good cheat sheets. This will all be useful next week when we move away from Rattle and toward straight R.
Next week: Linear continuous models (Linear Regression)
Next week, we will look at a method that everyone has seen in a different context, OLS linear regression. I don’t like the textbook treatment of the topic, so I’m assigning a supplemental book. Please read:
Gareth James et al, An Introduction to Statistical Learning with Applications in R.(Supplementary textbook.) It’s available from Springerlink. Review section 3.2 which should be familiar, and read section 3.3 on variations on the basic model linear. Also read DMBA main textbook, only sections 6.1 and 6.2
A more detailed assignment will be posted Saturday, and nothing is due Sunday.