Exploring lime on the house prices dataset

Recently I found a paper titled “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. Interpretability is very important in times of complex machine learning models, and it is also related to my PhD topic (reliability of machine learning models). Therefore, I wanted to play around with …

Learning Club 16: Genetic Algorithms

Some time ago I published a blog post titled “Know your data structures!”. In that post I explained how I improved the running time of a genetic algorithm. I promised to go into more detail about other noteworthy things in the code in a separate article, since not everything was straightforward when …

Know your data structures!

Just a few days ago I stated the following on Twitter: “Just reduced the runtime of an algorithm from 9 hours to 3 min. by using a different data structure… Know you data structures 🙂 #rstats” (Verena Haunschmid, @ExpectAPatronum, May 1, 2017). Since my tweet has been liked and shared a lot, I thought …

Learning Club 05-07: Starting to love rmarkdown (Naive Bayes, Clustering, Linear Regression)

I remember that when I took an R course at university, I was really not a fan of rmarkdown and knitr. But since I started participating in a Learning Club, where people are encouraged to document and present their code, data and results, I have started to love them. Prior to that I had always documented my assignments at the university either …

Finding data sets Part 1: General data sources

I often encounter interesting algorithms or R packages that I want to test. The nice ones provide data for testing, but often it is only dummy data. To get a good understanding of a method and its limitations, real data might be required. Sometimes I would also like to explore data I have not used …

Adding basemap.at tiles to an R leaflet plot

Recently I wanted to visualise some data on a map of Austria. R Leaflet provides a pretty good-looking map by default (openstreetmap.org), but I wanted to use basemap.at, which is a map for Austria and therefore probably the most accurate map available for Austria. Actually it is not very difficult but it was the …

Use R to connect to twitter and create a wordcloud of your tweets

Recently I wanted to create a wordcloud of my tweets and do further analysis. In this post I am going to show you how to connect to Twitter in R and how to make a wordcloud from your tweets. To follow this tutorial, you need a Twitter account. First steps in R: Install required libraries …

Use rvest to scrape NFL weather data

If you are following my progress in the Data Science Learning Club, you might know that I am using NFL data for the tasks. To predict sports events, I think it is important to have statistics not only about the players, teams and previous games but also about the weather. From when I was a …

Learning Club 00: Set up your development environment (Getting started with R)

A few weeks ago I became aware that Renee (owner of the blog Becoming a data scientist) plans to start a data science learning club, and I thought it was a cool idea. In the learning club she will post activities, and the first one was about setting up your development environment: Activity 00: Set …

[Howto] Using Google URL builder, Google Analytics and R to create trackable QR codes

Some time ago I needed a QR code for a project and also wanted to find out how many people used that QR code. Googling returns many, many options, too many in my opinion. Each website has different features: some provide counters, and every website offers different data types you can export (from bad …