Docker and R: Rocker

Docker is a popular tool for building and deploying an application by using containers. Docker is designed to deliver an application and its dependencies in a package. Compared to other virtualization software, docker containers are lighweight and fast to deploy. The reason for this is Docker relies on the kernel of the host instead of virtualizing the whole kernel. Docker is used in data science application because it allows to package the application and all its dependencies in one container. [Read More]

Image segmentation using Python

Image segmentation is one of the hotspots in image processing and computer vision and is also very important in image recognition. One of the most known and powerful approaches in image segmentation is the convolutional neural network (CNN). There are different approaches for image segmentation.

[Read More]
tag1  tag2  tag3 

A closer look at the german federal election II

In the last example we took a look at the german constituencies and visualized those regions using a PCA. In this example we will cluster the constituencies using the principal component scores and try to find an ideal cluster solution. For clustering we will use hierarchical clustering. We first load the necessary packages for dealing with spatial data. The data used here has been prepared before the analysis by adding the principal component scores and by tidying the data up. [Read More]

Imbalanced Classification

Classification is a supervised task for categorizing observation given a set of variables. The category for each observation is given before. This is similiar to regression but the response variable is now discrete. The most used case is a binary classification but a multi-case classification is possible too. A possible problem in classification is the case of an imbalanced dataset, where the number of cases are not equally distributed. To counter this problem a number of sampling methods can be used. [Read More]