R for Data Science: A free book by Garrett Grolemund and Hadley Wickham

Everyone in the R community knows Garrett Grolemund and Hadley Wickham well. These two brilliant authors have written an open-source book that is free to read online here:

http://r4ds.had.co.nz/

On the first page, they write: “This is the website for ‘R for Data Science’. This book will teach you how to do data science with R: You’ll learn how to get your data into R, get it into the most useful structure, transform it, visualise it and model it. In this book, you will find a practicum of skills for data science. Just as a chemist learns how to clean test tubes and stock a lab, you’ll learn how to clean data and draw plots—and many other things besides. These are the skills that allow data science to happen, and here you will find the best practices for doing each of these things with R. You’ll learn how to use the grammar of graphics, literate programming, and reproducible research to save time. You’ll also learn how to manage cognitive …”

Clipping (Subsetting) a point layer over a polygon layer

A nice tutorial by Robin Lovelace on clipping:

“This miniature vignette shows how to clip spatial data based on different spatial objects in R and a ‘bounding box’. Spatial overlays are common in GIS applications and R users are fortunate that the clipping and spatial subsetting functions are mature and fairly fast. We’ll also write a new function called gClip(), that will make clipping by bounding boxes easier.”

http://robinlovelace.net/r/2014/07/29/clipping-with-r.html
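The tutorial works with R's spatial classes. To show the underlying idea of subsetting a point layer by a polygon, here is a minimal base-R sketch that swaps in a plain ray-casting point-in-polygon test on ordinary data frames. The helper names (point_in_polygon, clip_points, clip_to_bbox) and the toy data are hypothetical, not part of the tutorial; points lying exactly on the polygon boundary are not handled consistently, so for real GIS work a spatial package such as sf is the better choice.

```r
# Hypothetical helper: ray-casting test for one point (px, py) against a
# polygon whose vertices are given in order by vx and vy.
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- FALSE
  j <- n  # previous vertex; starting at the last one closes the ring
  for (i in seq_len(n)) {
    # Toggle when edge (j, i) straddles the line y = py and crosses it
    # to the right of px (a horizontal ray cast towards +x).
    if (((vy[i] > py) != (vy[j] > py)) &&
        (px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i])) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# Hypothetical helper: keep only the rows of `points` (columns x, y)
# that fall inside the polygon `poly` (columns x, y).
clip_points <- function(points, poly) {
  keep <- vapply(seq_len(nrow(points)), function(k) {
    point_in_polygon(points$x[k], points$y[k], poly$x, poly$y)
  }, logical(1))
  points[keep, , drop = FALSE]
}

# Hypothetical helper: clipping by a bounding box is a plain subset.
clip_to_bbox <- function(points, xmin, xmax, ymin, ymax) {
  points[points$x >= xmin & points$x <= xmax &
         points$y >= ymin & points$y <= ymax, , drop = FALSE]
}

# Made-up example data: four points and a 4 x 4 square polygon
pts    <- data.frame(id = 1:4, x = c(1, 5, 2, 8), y = c(1, 5, 3, 2))
square <- data.frame(x = c(0, 4, 4, 0), y = c(0, 0, 4, 4))

clip_points(pts, square)        # keeps points 1 and 3
clip_to_bbox(pts, 0, 6, 0, 6)   # keeps points 1, 2 and 3
```

The bounding-box case needs no geometry at all, which is why it is so much cheaper than a general polygon clip.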

A simple example of using ‘sprintf’

In this post, I show a brief example of using ‘sprintf’ in a user-defined R function.
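As a minimal sketch of the idea, the hypothetical function below, describe_vector(), uses sprintf() to turn a few summary statistics into one formatted line. The function name and its label argument are illustrative assumptions; sprintf() itself is base R.

```r
# Hypothetical helper: one-line summary of a numeric vector.
describe_vector <- function(x, label = "x") {
  # %s inserts a string, %d an integer, %.2f a number rounded to 2 decimals
  sprintf("%s: n = %d, mean = %.2f, sd = %.2f",
          label, length(x), mean(x), sd(x))
}

describe_vector(c(2, 4, 6, 8), label = "scores")
# "scores: n = 4, mean = 5.00, sd = 2.58"

# sprintf() is vectorised, so it can also build many labels at once
sprintf("site_%02d", 1:3)
# "site_01" "site_02" "site_03"
```

Using %d for length(x) works because length() returns an integer; passing a non-integer double to %d would raise an error.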

Epidemiology and Public Health

The University of Alabama at Birmingham explains what epidemiology and public health are. It states:

“Public Health is a blend of sciences, skills and convictions that is focused on the preservation and improvement of the health of all people through preventive (rather than curative) measures.
Epidemiology is considered a basic science of public health. Epidemiology is: a) a quantitative discipline built on a working knowledge of probability, statistics, and sound research methods; b) a method of causal reasoning based on developing and testing hypotheses pertaining to occurrence and prevention of morbidity and mortality; and c) a tool for public health action to promote and protect the public’s health based on science, causal reasoning, and a dose of practical common sense (1).
The word epidemiology comes from the Greek words epi, meaning “on or upon,” demos, meaning “people,” and logos, meaning “the study of.” Many definitions have been proposed; here are two that capture the underlying principles and the public health spirit of epidemiology:”

Read more here: http://www.soph.uab.edu/epi/academics/studenthandbook/what

Data resources: places where you can download data

UN: http://data.un.org/
US: http://www.data.gov/
Gapminder: http://www.gapminder.org/data/
asdfree: http://www.asdfree.com/ (with R code)
Kaggle: https://www.kaggle.com/datasets
rOpenSci: https://ropensci.org/

Credit to: https://www.coursera.org/learn/data-cleaning/home/info

Also:

http://data.princeton.edu/wws509/datasets

Removing duplicates in R using ‘dplyr’ and ‘data.table’

In this post, I show how to remove duplicate observations (rows) from a data frame.
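A minimal sketch on made-up data: dplyr::distinct() and data.table's unique() method both keep the first occurrence of each unique row, and both can deduplicate on selected columns only. I have also added the base R duplicated() equivalent for comparison; the package steps are wrapped in requireNamespace() checks so the snippet still runs if either package is missing.

```r
# Toy data with exact duplicate rows (made up for illustration)
df <- data.frame(
  id    = c(1, 2, 2, 3, 3, 3),
  group = c("a", "b", "b", "c", "c", "d"),
  stringsAsFactors = FALSE
)

# Base R: duplicated() flags rows that repeat an earlier row
base_unique <- df[!duplicated(df), , drop = FALSE]     # 4 rows
base_by_id  <- df[!duplicated(df$id), , drop = FALSE]  # 3 rows, one per id

# dplyr: distinct() keeps the first occurrence of each unique row
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr_unique <- dplyr::distinct(df)                        # 4 rows
  # Deduplicate on `id` only, keeping the other columns
  dplyr_by_id  <- dplyr::distinct(df, id, .keep_all = TRUE)  # 3 rows
}

# data.table: unique() dispatches to the data.table method, which has `by`
if (requireNamespace("data.table", quietly = TRUE)) {
  dt        <- data.table::as.data.table(df)
  dt_unique <- unique(dt)               # 4 rows
  dt_by_id  <- unique(dt, by = "id")    # 3 rows
}
```

Deduplicating on a subset of columns is a judgment call: here the rows (3, "c") and (3, "d") are different observations, but keying on id keeps only the first of them.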