This page is only for BDA participants who are working with the Sony PlayStation data set. The information flow will usually be:
SIE ===> William ===> Professor ====> This page ====> Participants
Ground Rules for this Project
- You must have signed an NDA and delivered it to William before receiving any data. No exceptions. Take the NDA seriously, including protecting the security of the electronic data.
- For questions and information flowing in the other direction, William will answer questions submitted by email. Expect responses to take about 3 days, so plan to work out most answers on your own.
- This project will require initiative on your part. Huge amounts of information relevant to the project are available, but you have to identify and locate it yourselves. Examples include the thorough documentation for Adobe’s web statistics package, the structure of the web site itself, and the meanings of the variables (many of which are standard Adobe material). You will deal with incomplete and occasionally even erroneous data.
- Creativity and curiosity will be well rewarded. This is a great opportunity – take advantage of it.
Write to me to confirm that you have completed the NDA and that both team members are willing and able to work on the project. I will send back the URLs for the data. Send one email per team. Be sure to use the hashtag #SIE-PSN in all your email about this project.
Advice and Insights from Wednesday’s Lunch
- The supplied data is of two kinds: Sony’s mobile PSN app, and the PSN website. You probably want to concentrate on one or the other. Both are of great interest to Sony.
- One hour of data should be plenty to start with. Web and mobile sessions average less than an hour, so you should have many complete sessions in one hour. Limiting the data in this way speeds up development.
- As Blake put it, visiting the actual website and using the mobile app yourself are “table stakes.” In any project, what you see in a database is only a pale reflection of reality. Carefully tracing how the site and the app behave will give you many insights.
- I allow and encourage cooperation between the teams. This may be especially useful in the early data munging, and in finding reference material. This is not a zero-sum activity; collaboration and producing public goods will earn you “points.”
- You can immediately strip the data down to fewer than 100 variables. For example, one point made at the lunch was that many variables come in two variants, pre- and post-processing. You can throw away the unprocessed variables.
- I recommend studying raw data at first, but then transitioning to session data. Each time a user “hits” a Sony server, it creates a transaction record. Informally we call those “clickstream data,” although it is actually more aggregated than that.
- Getting session data involves sorting everything by user ID. dplyr can probably do the sorting of a single hour in one pass, but only if you have shrunk the dataset first in various ways. As few as 50 variables may be enough to get started with.
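The shrink-then-sessionize advice above can be sketched in dplyr roughly as follows. This is only an illustration on made-up data, not the course's preprocessing script: the column names `visitor_id` and `hit_time`, the `post_` prefix used to pick out processed variables, and the 30-minute inactivity cutoff are all assumptions that you should check against the real data and the Adobe documentation.

```r
library(dplyr)

# Toy stand-in for one hour of hit-level data. All column names here are
# hypothetical; substitute the real ones from the data set.
hit_data <- tibble(
  visitor_id  = c("a", "b", "a", "a", "b", "a"),
  hit_time    = as.POSIXct("2018-04-18 12:00:00", tz = "UTC") +
                60 * c(0, 1, 2, 50, 3, 52),
  prop75      = c("x", "y", "x", "z", "y", "z"),   # unprocessed variant
  post_prop75 = c("X", "Y", "X", "Z", "Y", "Z")    # processed variant
)

session_gap_secs <- 30 * 60  # assumed inactivity cutoff, not from Sony

# 1. Shrink: keep the ID, the timestamp, and the processed variables only.
slim <- hit_data %>%
  select(visitor_id, hit_time, starts_with("post_"))

# 2. Sort by user and time, then start a new session whenever the gap
#    since the same user's previous hit exceeds the cutoff.
hits <- slim %>%
  arrange(visitor_id, hit_time) %>%
  group_by(visitor_id) %>%
  mutate(
    gap_secs    = as.numeric(difftime(hit_time, lag(hit_time), units = "secs")),
    new_session = is.na(gap_secs) | gap_secs > session_gap_secs,
    session_num = cumsum(new_session)
  ) %>%
  ungroup()

# 3. Collapse to one row per session.
sessions <- hits %>%
  group_by(visitor_id, session_num) %>%
  summarise(
    n_hits        = n(),
    session_start = min(hit_time),
    duration_secs = as.numeric(difftime(max(hit_time), min(hit_time),
                                        units = "secs")),
    .groups = "drop"
  )

print(sessions)
```

Dropping columns before sorting is the design point here: the sort and the grouped `mutate` touch every remaining column, so a narrower table makes the one-hour pass much cheaper.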
Files and sites
- BDA18 Sony PSN project update 4-19. What kinds of insights is Sony looking for? What variables should you start with?
- An R file for early processing of the data: Sony_BDA_Preprocessing. It has lots of comments and explanations. It looks like routine R material that you can easily figure out yourself. For example:
- # How do I call out a single variable in my dataframe? dataframe$column. hit_data$prop75
- # What are the unique values in a variable? unique(hit_data$prop75)
- # How many unique values? length(unique(hit_data$prop75))
- PowerPoint presentation used in class. SIE Adobe for GPS BDA 2018
- Sony web sites: PlayStation Network
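To try the one-liners quoted from the preprocessing file before you have the real data, a small base R sketch like the one below may help. The data frame contents are invented for illustration; only the names `hit_data` and `prop75` come from the examples above. The frequency table at the end is an addition that is often a useful next step when working out what a variable means.

```r
# Toy stand-in for the hit-level data frame; the values are made up.
hit_data <- data.frame(
  prop75 = c("home", "store", "home", NA, "store", "home"),
  stringsAsFactors = FALSE
)

# Pull out a single column as a vector.
values <- hit_data$prop75

# Distinct values (NA counts as one of them).
distinct_values <- unique(values)

# Number of distinct values.
n_distinct_values <- length(distinct_values)

# Frequency of each value, most common first, with missing values shown.
value_counts <- sort(table(values, useNA = "ifany"), decreasing = TRUE)

print(distinct_values)
print(n_distinct_values)
print(value_counts)
```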
Participants approved so far
- Allen Tian, Sylvia Wu