Resources on data manipulation

Some are elsewhere on this site. The R script below is for May 23.

# From Chapter 6 of ISLR textbook.
# May 23, 2018 version: feature engineering, plotting.
# Clean out the global environment, e.g. by starting a new project and workspace.
# Or delete specific objects from it (use Grid display, check off the objects to erase).
#
# For teaching: Tools > Global Options > Appearance, then select a visible color scheme.
# install.packages("ISLR") # gets a clean copy
# install.packages("dplyr")
library(dplyr)
library(ISLR)
library(glmnet)

set.seed(3838) # Use a different seed than the last time you worked with Hitters.
# Even if you keep the same seed as before, you cannot be certain that the random draws will be the same,
# because any random call made in between advances the generator.
# The only way to be certain is to reset the seed _right before_ you run random selections.
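
# A minimal sketch of the point above. The draws and the names draw1, draw2, draw3
# are illustrative assumptions, not part of the original class script.
set.seed(3838)
draw1 <- sample(1:100, 5)
set.seed(3838)
draw2 <- sample(1:100, 5)
identical(draw1, draw2) # TRUE: the seed was reset right before each draw
draw3 <- sample(1:100, 5) # no reset, so this continues the stream and will generally differ
set.seed(3838) # reset again so the rest of the script starts from the intended seed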

# Bring in the Hitters data. The Hitters data frame is in the ISLR package.
# fix(Hitters) # This is not needed.
names(Hitters) # variable names
dim(Hitters)

# As before, remove rows with missing data. Then check that it was done correctly.
Hitters2 <- na.omit(Hitters)
row.names(Hitters2)[5:15] # player names
str(Hitters2)
dim(Hitters2)
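
# One possible way to do the check mentioned above (a sketch; these lines are an
# addition, and rows_dropped is a hypothetical name used only for illustration).
anyNA(Hitters2)                                 # should be FALSE after na.omit()
colSums(is.na(Hitters))                         # which columns had missing values originally
rows_dropped <- nrow(Hitters) - nrow(Hitters2)  # number of rows removed
rows_dropped == sum(!complete.cases(Hitters))   # should be TRUE: only incomplete rows were dropped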

# install.packages("gcookbook")
library(gcookbook)
library(ggplot2)

############################################
# Explore the data here.

# Correlations, scatter plots, etc.
ggplot(Hitters2, aes(x = Salary)) + stat_ecdf() # empirical cumulative distribution of Salary
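
# A sketch of the correlations and scatter plots suggested above. The choice of Hits
# as the x variable and the names num_vars and salary_cor are assumptions for illustration.
num_vars <- sapply(Hitters2, is.numeric)               # skip the factor columns
salary_cor <- cor(Hitters2[, num_vars])[, "Salary"]    # correlation of each numeric variable with Salary
round(sort(salary_cor, decreasing = TRUE), 2)
ggplot(Hitters2, aes(x = Hits, y = Salary)) + geom_point(alpha = 0.5)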

Author: Roger Bohn

Professor of Tech Management, UC San Diego. General blog https://Art2Science.org Rbohn@ucsd.edu
