So, just a few days ago I posted Learning Club 00: Set up your development environment (Getting started with R). There I made a mistake and decided to use R without thinking about the data set I would use. I am still happy I wrote the post because it can give all the R users … Continue reading Learning Club 00.b: Setup your development environment (Get started with python package nfldb)

# Category: Data Science

Data Science related stuff. #rstats

## Learning Club 00: Set up your development environment (Getting started with R)

A few weeks ago I learned that Renee, owner of the blog Becoming a data scientist, plans to start a data science learning club, and I thought it was a cool idea. In the learning club she will post activities, and the first one was about setting up your development environment: Activity 00: Set …

## Add non-overlapping labels to a plot using {wordcloud} in R

When I create a plot, I often want to label some of the points directly on the plot. I looked for a ready-made solution, because implementing it with text would probably take a lot of work. Luckily I found these two links: [stackoverflow] How do I avoid overlapping …

## Statistical tests: One sample t-test in R

This is the second post about statistical testing. In the first one I explained the basic concept behind statistical tests. Parametric tests, in contrast to non-parametric tests, are statistical tests that make some kind of assumption about the data. In general, parametric tests are used when we assume normality for the source population of our …
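As a minimal sketch of what a one-sample t-test looks like in base R (this is not the code from the post itself): the sample `x`, the seed, and the hypothesized mean `mu0` are made-up values for illustration, and the data are drawn from a normal distribution so that the normality assumption mentioned above holds by construction.

```r
# Hypothetical data: 20 draws from a normal population (assumption for illustration)
set.seed(42)
x <- rnorm(20, mean = 5.2, sd = 1)

# Hypothesized population mean under the null hypothesis (made-up value)
mu0 <- 5

# One-sample t-test from the stats package (part of base R)
res <- t.test(x, mu = mu0)

# The same test statistic computed by hand: (sample mean - mu0) / standard error
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))

print(res$statistic)  # t value reported by t.test()
print(t_manual)       # should agree with the value above
print(res$p.value)    # two-sided p-value (the default alternative)
```

Computing the statistic by hand next to `t.test()` is only meant to show where the reported t value comes from; in practice the function call alone is enough.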

## Statistical testing: An introduction

I am planning to write about parametric and non-parametric testing, but since I know that many people have difficulties with the concept of hypothesis testing itself, I am going to give an introduction to the basic concepts first without immediately trying to frighten you. In general, a statistical test consists of four steps: Formulate the …

## [Dimensionality Reduction #2] Understanding Factor Analysis using R

This time I am going to show you how to perform factor analysis. In the next post I will show you some scaling and projection methods. The idea for this mini-series was inspired by a Machine Learning (Unsupervised) lecture I had at university. I will perform all these methods on the same data sets and …

## [Dimensionality Reduction #1] Understanding PCA and ICA using R

This time I am going to show you how to perform PCA and ICA. In the next one or two posts I will show you factor analysis and some scaling and projection methods. The idea for this mini-series was inspired by a Machine Learning (Unsupervised) lecture I had at university. I will perform all these …

## How to choose the correct plot for your data

Today I am going to talk about how to choose the correct representation for several types of data. From statistics class you might remember there are three types of data: Metric (data can be measured or counted, mathematical operations make sense) Ordinal (data can be ordered in a meaningful way, but mathematical operations (+, -, …

## Info graphic for R plotting parameters

Do you always have to look up the numbers of plotting symbols or line types? I do, and that is why I created an info graphic that summarizes these parameters and also some color functions. You can download this graphic here. The numbers on the axes give the value used for pch and lty according to …

## Create boxplots in R

My favorite plots are boxplots, because they hold a lot of information. If you look at a very basic boxplot, you can see the median (bold line), the quartiles (upper and lower boundary of the box) and whether there are outliers (by default those are values that are 1.5 times the box length away …
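The elements named in this teaser can be inspected directly in base R. The following is a small sketch rather than the post's own code, and the vector `x` is made-up data. Note that `boxplot()` draws the box at the hinges returned by `boxplot.stats()`, which are close to, but not always identical with, the quartiles from `quantile()`.

```r
# Hypothetical sample with one obviously extreme value (assumption for illustration)
x <- c(2, 3, 4, 5, 6, 7, 8, 30)

# boxplot.stats() returns the numbers that boxplot() draws;
# coef = 1.5 is the default multiplier of the box length used to flag outliers
s <- boxplot.stats(x, coef = 1.5)

print(s$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(s$out)    # values more than 1.5 box lengths beyond the box

# Draw the corresponding basic boxplot
boxplot(x, main = "Basic boxplot")
```

For this sample the box runs from 3.5 to 7.5 (box length 4), so anything above 7.5 + 1.5 * 4 = 13.5 is flagged, which is why 30 is drawn as a separate point.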