Landsat Classification using K-means


Landsat Classification using K-means

Unsupervised clustering allows the possibility to classify Landsat images without any further reference, to what exactly the image represents, as we discover the properties of the image automatically. We use a Landsat image from Cairo, as this image has different characteristics and contains the city of Kairo itself, green vegetation, water surfaces, desert, rocks, which are closely intertwined. A good algorithm shoud be able to discover the different surfaces, especially if the spatial characteristic is irregular, where for example green vegetation alternates with water surfaces and other characteristics.
The dataset is a Landsat 7 ETM+ from Kairo, which was downloaded from USGS. The coordinate reference is WGS84. Kmeans is a clustering algorithm, which assigns an observation to the nearest cluster center and after the assignment, it calculates the new cluster center for the new observations. For the new cluster center, the observation are again assigned to the nearest cluster center. This process is done iteratively, until the solution does not change

library(landsat)
## Loading required package: rgdal
## Loading required package: sp
## rgdal: version: 1.4-4, (SVN revision 833)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 3.0.1, released 2019/06/28
##  Path to GDAL shared files:
##  GDAL binary built with GEOS: TRUE
##  Loaded PROJ.4 runtime: Rel. 6.2.0, September 1st, 2019, [PJ_VERSION: 610]
##  Path to PROJ.4 shared files: (autodetected)
##  Linking to sp version: 1.3-1
library(raster)
library(RStoolbox)
## Registered S3 methods overwritten by 'ggplot2':
##   method         from
##   [.quosures     rlang
##   c.quosures     rlang
##   print.quosures rlang
library(mapview)
library(ggplot2)
library(knitr)
tif=list.files(".",pattern = ".TIF")
image=stack(tif)
image=brick(image)

The downloaded Landsat datasest is in a .tif format, where each .tif file represents a single band. We first load and then combine each tif file together with the stack command. With brick a multi-layer object is created. It is similiar to a Rasterstack object, created by the stack command, but the processing time should be shorter.

ggRGB(image)

RGB image using Landsat is shown above. We see that. We can either rotate the image or just crop a part of the image. In our case we will just crop the image.

e <- as(extent(320000, 470000, 3245385+30000, 3459015-30000), 'SpatialPolygons')
crs(e) <- "+proj=longlat +datum=WGS84 +no_defs"
rc=crop(image,e)
ggRGB(rc)+ggtitle("Cropped Landsat Image of Cairo")

neuerVersuch3=unsuperClass(rc,nsamples=200,nClasses=3,nstart=40,clusterMap = T)
plot(neuerVersuch3$map,main="Landsat Classification for k=3")

knitr::include_graphics("./Scree_plot.jpeg")

As a way of deciding how many clusters we use the scree plot. The idea of the scree plot is to calculate for different number of clusters the total within squares of sum. This is a measure of total dispersion and indicates how similiar these clusters are. The within sum of squares decreases with the number of clusters added but quite often we see a sharp bend in this plot, which indicates the total number of optimal clusters. In the scree-plot we see a sharp bend for 8 and 9 clusters, which are quite close to each other.

landsat9=unsuperClass(rc,nsamples=1000,nClasses=9,nstart=40,clusterMap = T)
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 500000)

## Warning: Quick-TRANSfer stage steps exceeded maximum (= 500000)
landsat10=unsuperClass(rc,nsamples=1000,nClasses=10,nstart=40,clusterMap = T)
plot(landsat9$map,main="Landsat Classification for k=9")

plot(landsat9$map,main="Landsat Classification for k=10")

mapviewOptions(basemaps="Esri.WorldImagery")
mapview(landsat10$map)

The package mapivew uses leaflet to visualize the image. In the case of the Landsat data this leads to a problem. Landsat datasets are very large for leaflet. Instead of plotting in full resolution, the pixel size is reduced and instead an interpolation is used. In this case this leads to a continous color palette instead of a discrete set.