1 ESRI Satellite Images

We start with satellite images of a shopping complex near Schlieren, Switzerland (acquired from ESRI ArcGIS Online).

1.1 Load the data in

Once the Spark cluster has been created and you have the SparkContext called sc (provided automatically in Databricks Cloud or Zeppelin), the data can be loaded using the Spark Image Layer. The command readTiledImage loads the data as tiles and can read mega-, giga-, or even petabytes of data from Amazon’s S3, the Hadoop Filesystem (HDFS), or any shared / network filesystem.

val carSatImage = 

Although we execute the command on one machine, the data will be evenly loaded over all of the machines in the cluster (or cloud). The cache suffix keeps the files in memory so they can be read faster as many of our image processing tasks access the images a number of times.
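To make the idea of loading an image "as tiles" concrete, here is a minimal sketch in plain Scala (no Spark required) that splits an image of a given size into fixed-size tiles. The names Tile, tiles, and tileSize are hypothetical and chosen for illustration only. The source does not describe how readTiledImage actually partitions the data, so this should be read as one plausible scheme rather than the library's behaviour.

```scala
object TilingSketch {
  // A rectangular tile: top-left corner plus size (tiles on the right and bottom edges may be smaller)
  final case class Tile(x: Int, y: Int, width: Int, height: Int)

  // Cover an imageWidth x imageHeight image with tiles of at most tileSize x tileSize pixels
  def tiles(imageWidth: Int, imageHeight: Int, tileSize: Int): Seq[Tile] =
    for {
      y <- 0 until imageHeight by tileSize
      x <- 0 until imageWidth by tileSize
    } yield Tile(
      x,
      y,
      math.min(tileSize, imageWidth - x),
      math.min(tileSize, imageHeight - y)
    )
}
```

In a cluster setting, each tile would become one record that can be placed on a different machine, which is what allows the work to be spread evenly.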

1.2 Finding Cars

As our first task we want to identify all of the reflective objects. To do this we:
- apply a threshold to the average intensity (\(\frac{red+blue+green}{3}\)), since cars are reflective and come in many different colors
- identify the distinct regions
- analyze the area and perimeter of the regions

1.2.1 Threshold / Segmentation

We first try to find all of the white / brightly colored cars by looking for all pixels with a mean intensity above 140.

// Segment the image, keeping only the pixels whose mean intensity exceeds 140
val pointImage = carSatImage.sparseThresh(_.intensity>140).cache
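As an illustration of what such a threshold does, the following plain-Scala sketch keeps only the pixels whose mean channel intensity exceeds a cutoff and stores them sparsely, keyed by position. The Rgb type and this stand-alone sparseThresh function are assumptions made for the example; they are not the Spark Image Layer API, whose internals are not shown in the source.

```scala
// Hypothetical pixel type; the field names are illustrative only
final case class Rgb(red: Int, green: Int, blue: Int) {
  // Mean of the three channels, i.e. (red + green + blue) / 3
  def intensity: Double = (red + green + blue) / 3.0
}

object ThresholdSketch {
  type Point = (Int, Int)

  // Keep only the pixels accepted by the predicate; everything else is dropped,
  // so the result holds just the foreground positions (a sparse representation)
  def sparseThresh(image: Map[Point, Rgb], keep: Rgb => Boolean): Map[Point, Rgb] =
    image.filter { case (_, pixel) => keep(pixel) }
}
```

Calling ThresholdSketch.sparseThresh(image, _.intensity > 140) on a small map of pixels mirrors the shape of the call above. Storing only the surviving pixels is worthwhile because bright objects usually cover a small fraction of a satellite image.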

1.2.2 Identify Regions

Apply component labeling, then filter the results to keep only the car-sized objects (above 100 pixels and smaller than 200 pixels).

// Label each chunk using connected component labeling with a 3 x 3 window
val uniqueRegions = ConnectedComponents.
    curLabel =>
      label.area>130 & 
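Connected component labeling with a 3 x 3 window can be sketched in plain Scala as a flood fill over the thresholded points, where each pixel is joined to any of its 8 neighbours. This is a single-machine illustration under that assumption, not the distributed implementation in the Spark Image Layer. The names ComponentSketch, label, and filterByArea are hypothetical.

```scala
import scala.collection.mutable

object ComponentSketch {
  type Point = (Int, Int)

  // Offsets of the 8 neighbours inside a 3 x 3 window centred on a pixel
  private val neighbours: Seq[Point] =
    for {
      dx <- -1 to 1
      dy <- -1 to 1
      if dx != 0 || dy != 0
    } yield (dx, dy)

  // Breadth-first flood fill: returns one set of points per connected region
  def label(points: Set[Point]): List[Set[Point]] = {
    val unvisited = mutable.Set.empty[Point] ++ points
    val regions = mutable.ListBuffer.empty[Set[Point]]
    while (unvisited.nonEmpty) {
      val seed = unvisited.head
      unvisited -= seed
      val queue = mutable.Queue(seed)
      val region = mutable.Set(seed)
      while (queue.nonEmpty) {
        val (x, y) = queue.dequeue()
        for ((dx, dy) <- neighbours) {
          val next = (x + dx, y + dy)
          // remove returns true only if the point was still unvisited
          if (unvisited.remove(next)) {
            region += next
            queue.enqueue(next)
          }
        }
      }
      regions += region.toSet
    }
    regions.toList
  }

  // Keep regions whose pixel count lies strictly between the two bounds
  def filterByArea(regions: List[Set[Point]], minArea: Int, maxArea: Int): List[Set[Point]] =
    regions.filter(r => r.size > minArea && r.size < maxArea)
}
```

The area of a region is simply its pixel count, so the size filter rejects both tiny specks of noise and large bright surfaces such as roofs.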

1.2.3 Shape Analysis (Area and Perimeter)

We can then run a shape analysis to further characterize the objects and apply a final selection criterion, circularity, which measures how circular an object is. A line on the street is very elongated in one direction and thus gets a low score (~0.1), while a car is substantially more circle-like and gets a score between 0.3 and 0.5.

// Run a shape analysis to measure the position, area, and intensity of each identified region
val shapeAnalysis = EllipsoidAnalysis.runIntensityShapeAnalysis(uniqueRegions).
    curShape =>
      curShape.circularity > 0.25 &
      curShape.circularity < 0.4
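The source does not state how the Spark Image Layer computes circularity. One common definition is the isoperimetric quotient, \(4\pi A / P^2\), computed from a region's area \(A\) and perimeter \(P\). The sketch below uses that definition as an assumption, so its values need not match the scores quoted above.

```scala
object CircularitySketch {
  // Isoperimetric quotient: 1.0 for a perfect circle, approaching 0 for very elongated shapes
  def circularity(area: Double, perimeter: Double): Double =
    4.0 * math.Pi * area / (perimeter * perimeter)
}
```

For example, a 1 x 40 pixel stripe (area 40, perimeter 82) scores below 0.1 under this definition, which is consistent with road markings being rejected by a lower bound on circularity.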

We can then plot the positions, sizes, and circularities of the resulting cars.

We can also plot them back on the original map.

Alternatively, we can plot them on a standard map, which lets us see which of the cars are on roads and which are parked.

2 Acknowledgements

Analysis powered by Spark Image Layer from 4Quant. Visualizations, document generation, and maps provided by:

To cite ggplot2 in publications, please use:

H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York, 2009.

A BibTeX entry for LaTeX users is

@Book{, author = {Hadley Wickham}, title = {ggplot2: elegant graphics for data analysis}, publisher = {Springer New York}, year = {2009}, isbn = {978-0-387-98140-6}, url = {http://had.co.nz/ggplot2/book}, }

To cite package ‘leaflet’ in publications use:

Joe Cheng and Yihui Xie (2014). leaflet: Create Interactive Web Maps with the JavaScript LeafLet Library. R package version 0.0.11. https://github.com/rstudio/leaflet

A BibTeX entry for LaTeX users is

@Manual{, title = {leaflet: Create Interactive Web Maps with the JavaScript LeafLet Library}, author = {Joe Cheng and Yihui Xie}, year = {2014}, note = {R package version 0.0.11}, url = {https://github.com/rstudio/leaflet}, }

To cite plyr in publications use:

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. URL http://www.jstatsoft.org/v40/i01/.

A BibTeX entry for LaTeX users is

@Article{, title = {The Split-Apply-Combine Strategy for Data Analysis}, author = {Hadley Wickham}, journal = {Journal of Statistical Software}, year = {2011}, volume = {40}, number = {1}, pages = {1--29}, url = {http://www.jstatsoft.org/v40/i01/}, }

To cite the ‘knitr’ package in publications use:

Yihui Xie (2015). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.10.

Yihui Xie (2013) Dynamic Documents with R and knitr. Chapman and Hall/CRC. ISBN 978-1482203530

Yihui Xie (2014) knitr: A Comprehensive Tool for Reproducible Research in R. In Victoria Stodden, Friedrich Leisch and Roger D. Peng, editors, Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595

To cite package ‘rmarkdown’ in publications use:

JJ Allaire, Joe Cheng, Yihui Xie, Jonathan McPherson, Winston Chang, Jeff Allen, Hadley Wickham and Rob Hyndman (2015). rmarkdown: Dynamic Documents for R. R package version 0.5.1. http://CRAN.R-project.org/package=rmarkdown

A BibTeX entry for LaTeX users is

@Manual{, title = {rmarkdown: Dynamic Documents for R}, author = {JJ Allaire and Joe Cheng and Yihui Xie and Jonathan McPherson and Winston Chang and Jeff Allen and Hadley Wickham and Rob Hyndman}, year = {2015}, note = {R package version 0.5.1}, url = {http://CRAN.R-project.org/package=rmarkdown}, }