A new book by Jeffrey Stanton of Syracuse University's School of Information Studies, An Introduction to Data Science, is now available for free download. The book, developed for Syracuse’s Certifi…
Model selection techniques have existed for many years; however, to date, simple, clear and effective methods of visualizing the model building process are sparse. This article describes graphical methods that assist in the selection of models and in the comparison of many different selection criteria. Specifically, we describe, for logistic regression, how to visualize measures of description loss and of model complexity to help resolve the model selection dilemma. We advocate the use of the bootstrap to assess the stability of selected models and to enhance our graphical tools. We demonstrate which variables are important using variable inclusion plots and show that these plots can be invaluable for the model building process. We show with two case studies how the proposed tools help to learn more about important variables in the data and how they can assist the understanding of the model building process. Copyright © 2013 John Wiley & Sons, Ltd.
A common application of subsetting data is to find and remove duplicate values. R has a useful function, duplicated(), that finds duplicate values and returns a
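A minimal sketch of the idea, using made-up data that is not from the original article: in base R, duplicated() gives a logical vector marking each element that has already appeared earlier, so negating it inside a subset keeps only first occurrences. The names x, dup, deduped, df and df_unique are hypothetical.

```r
# Hypothetical example data (not from the original article)
x <- c("a", "b", "a", "c", "b")

# TRUE for every element that repeats an earlier element of x
dup <- duplicated(x)

# Subset with the negated result to keep only first occurrences
deduped <- x[!dup]

# The same approach removes fully duplicated rows from a data frame
df <- data.frame(id = c(1, 2, 2, 3), score = c(10, 20, 20, 30))
df_unique <- df[!duplicated(df), ]
```

For the common case of simply dropping repeats, base R's unique() should give the same values as x[!duplicated(x)]; duplicated() is more useful when the positions of the repeats matter.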
It is free
Nice article: http://jeffreyhorner.tumblr.com/post/25804518110
Open source software for spatial statistics
Explains the basic data types in R