Package ‘tibble’ in R

What is the ‘tibble’ package?

According to Hadley Wickham, “Tibbles are a modern reimagining of the data.frame, keeping what time has proven to be effective, and throwing out what is not.

The name comes from dplyr: originally you created these objects with tbl_df(), which was most easily pronounced as ‘tibble diff’.”

Find its similarities and dissimilarities with data.frame. More info here: tibble
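To make the comparison concrete, here is a minimal sketch of one similarity and two well-known differences. It assumes the tibble package is installed; the object names df and tb and the column names are made up for illustration.

```r
library(tibble)

# A small illustrative data set (hypothetical values)
df <- data.frame(height = c(150, 160, 170), sex = c("m", "f", "m"))
tb <- as_tibble(df)

# Similarity: a tibble is still a data.frame underneath
is.data.frame(tb)
dim(df)
dim(tb)

# Difference 1: selecting one column with [ , ]
class(df[, "height"])  # data.frame drops to a plain numeric vector
class(tb[, "height"])  # tibble stays a tibble

# Difference 2: partial matching of column names with $
df$hei  # data.frame partially matches "height"
tb$hei  # tibble returns NULL and warns about an unknown column
```

Printing tb also shows only the first rows together with each column's type, whereas printing df shows every row.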

Data manipulation in R using dplyr and plyr – for non-programmers

#essential functions
library(plyr) #load plyr first so that it does not mask dplyr functions such as mutate() and arrange()
library(dplyr)

# creating dummy dataset
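X1<-rnorm(n=50,mean=30,sd=5)
X2<-factor(rbinom(n=50,size=1,prob=0.7),levels=0:1,labels=c('male','female')) #0='male', 1='female'
X3<-rnorm(n=50,mean=130,sd=15)
X4<-factor(rbinom(n=50,size=2,prob=0.5),levels=0:2,labels=c('M','C','I')) #explicit levels keep the labels valid even if a value is never drawn
dat<-data.frame(X1,X2,X3,X4)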

#examine data frame

str(dat)
head(dat)

#create new column/variable
mutate(dat,sqX3=X3*X3) #add a new column holding X3*X3 (the result is printed, not saved)

#select rows
dat_filter<-filter(dat,X2=='male') #focus on a subset of rows
dat_filter

#select columns
dat_col<-select(dat,X1,X2) #focus on a subset of variables
dat_col

#recode categorical var
dat$race<-mapvalues(dat$X4,from=c('M','C','I'), to=c('M','NM','NM'))
dat$race
dat$X4
#or
dat$race2[dat$X4=='M']<-'malay'
dat$race2[dat$X4=='C' | dat$X4=='I']<-'Nonmalay'
dat$race2
dat$race2<-factor(dat$race2)
str(dat$race2)

#recoding numerical var
dat$highBP<-'no'
dat$highBP
dat$highBP[dat$X3>129]<-'yes'
dat$highBP<-factor(dat$highBP)
dat$highBP
dat$X3

#sort data
dat_sort<-arrange(dat,X2,X4) #re-order the rows
dat_sort2<-arrange(dat,desc(X2))
dat_sort2

#source: http://www.cookbook-r.com/Manipulating_data/Recoding_data/
#source: https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
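The same verbs can also be chained with the pipe operator, which the dplyr introduction describes. The following is only a sketch: the data frame toy and the seed are invented for illustration, and the dplyr:: prefixes are there so the calls do not depend on whether plyr was loaded before or after dplyr.

```r
library(dplyr)

set.seed(1)  # illustrative seed so the sketch is reproducible
toy <- data.frame(
  X1 = rnorm(20, mean = 30, sd = 5),
  X2 = factor(rep(c("male", "female"), 10)),
  X3 = rnorm(20, mean = 130, sd = 15)
)

res <- toy %>%
  dplyr::filter(X2 == "male") %>%                           # keep rows for males
  dplyr::select(X1, X3) %>%                                 # keep two columns
  dplyr::mutate(highBP = ifelse(X3 > 129, "yes", "no")) %>% # recode a numeric column
  dplyr::arrange(dplyr::desc(X3))                           # sort by X3, largest first
res
```

Each step receives the result of the previous one, so no intermediate objects such as dat_filter or dat_col are needed.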

R Library: Contrast Coding Systems for categorical variables

Source: http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm#dummy

most commonly used:

Before considering any analyses, let’s look at the mean of the dependent variable, write, for each level of race. This will help in interpreting the output from later analyses.
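The group means and the default (dummy) coding can be sketched in base R as follows. The data frame scores is a hypothetical stand-in with made-up values and level names, since the dataset from the source page is not reproduced here; only the variable names write and race follow the text.

```r
# Hypothetical stand-in data, not the dataset used on the source page
set.seed(42)
scores <- data.frame(
  race  = factor(rep(c("A", "B", "C", "D"), each = 5)),
  write = round(rnorm(20, mean = 50, sd = 10))
)

# Mean of the dependent variable for each level of the factor
group_means <- tapply(scores$write, scores$race, mean)
group_means

# Default dummy (treatment) coding for a 4-level factor
contrasts(scores$race)

# Two other coding schemes built into the stats package
contr.sum(4)
contr.helmert(4)

# With dummy coding, the intercept equals the mean of the reference level
# and each other coefficient is that level's mean minus the reference mean
fit <- lm(write ~ race, data = scores)
coef(fit)
```

Comparing coef(fit) with group_means shows why looking at the means first helps in reading the regression output.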

 

Inefficient ways for subsetting in R

#as an R newbie, still struggling with subsetting

#this is slow but it is useful for non-programmers like me

a<-rnorm(50,mean=10,sd=1) #generate a normally distributed vector
a
row.a<-a[1:10] #select elements 1 to 10
row.a
sex.a<-c("male","female") #create a vector with elements "male" and "female"
sex.a
sex.b<-rep(sex.a,25) #repeat the male/female pair 25 times
sex.b
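data<-cbind(a,sex.b) #combine 2 vectors; cbind() returns a character matrix, so a becomes text
df.data<-data.frame(data) #create a data frame
class(df.data) #verify class
View(df.data) #view the data in a spreadsheet-style viewer
str(df.data) #inspect structure
df.data$a<-as.numeric(as.character(df.data$a)) #convert to numeric; as.character() first so a factor yields its values, not its level codes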
str(df.data)
male.data<-subset(df.data, sex.b=="male") #select male data
male.data
male.less.10<-subset(df.data,sex.b=="male" & a<10.00) #select male data with a <10
male.less.10
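For comparison, the same selection can be written with bracket indexing, and the data frame can be built directly from the two vectors so that a stays numeric. This is a sketch only: the name toy and the seed are illustrative and not part of the code above.

```r
set.seed(10)  # illustrative seed so the sketch is reproducible

# Building the data frame directly keeps a numeric, so no conversion is needed
toy <- data.frame(a = rnorm(50, mean = 10, sd = 1),
                  sex.b = rep(c("male", "female"), 25))
str(toy)

# subset() evaluates the condition inside the data frame
by_subset <- subset(toy, sex.b == "male" & a < 10)

# Bracket indexing: a logical condition for the rows, nothing after the comma for all columns
by_bracket <- toy[toy$sex.b == "male" & toy$a < 10, ]

# Both approaches select the same rows when there are no missing values;
# with NAs in the condition, subset() drops those rows while [ ] returns NA rows
all.equal(by_subset, by_bracket)
```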