1 Problem: Flood Risk Assessment

For insurance companies to accurately model flood risk, primarily two sources of information are needed. First is information about flood severity, which can be derived from current and historic water location (rivers, coastlines, lakes), stores (glaciers, snowmelt), and levels (topology). Second is an analysis of vulnerability, which is based on the properties within these regions.

Property insurance firms require the ability to combine such information about natural structures with high geographic resolution. Historic information is of limited value due to limited detail of existing records (in both a spatial and temporal sense) and the rarity of flood events. Additionally climate change causes local change in weather patterns and surface modifications lead to altering water run-off pathways.

Despite a multitude of data sources, there has been a drought in actionable information and as a result flood modeling has been both uncertain and ambiguous. The uncertainty and ambiguity has tremendous financial implications as in the year 2013 alone, water-hazard-related losses amounted to nearly one trillion USD 1.

For an evidence-based risk modeling to be a reality, it is essential to have access to high-resolution data collected over the entire area of insured properties on a regular basis. With this dynamic information, it would be possible to couple the models for distribution of the flood severity with the vulnerability of the insurance portfolio and arrive transparently at a concrete, traceable value for total loss prediction. The modeled information can further be used in various analytical areas of general insurance firms such as in pricing, reserving, and capital modeling for solvency purposes.

2 Solution: Next Generation Satellite Imaging with IQAE

In parallel, the latest satellite developments from NASA, Google, Airbus Defense EADS and others provide frequent (weekly) high resolution (10m) imaging of the entire planet. These measurements result in petabytes of data generation.

Overview

2.1 Image Query and Analysis Engine

Camera to 4QL

Our tools enable large, complicated satellite imaging datasets to be processed easily and scaled across hundreds of machines.

2.2 Example

2.2.1 Locate bodies of water in the image

Here we locate and segment the water for a single day in the following images.

SELECT feature FROM (
  SELECT SEGMENT_WATER(image.blue) FROM SatelliteImages 
  ) WHERE AREA > 1000m^2 AND DATE IS 4-8-2015
Overview \(\rightarrow\) Overview

The data can also be processed over months even years worth of old images to characterize the standard changes in water levels.

SELECT DATE,AREA FROM (
  SELECT SEGMENT_WATER(image.blue) FROM SatelliteImages 
  ) WHERE AREA > 1000m^2 AND 
    DATE BETWEEN 4-8-2000 AND 4-8-2015

This information can then be summarized by year to get a better idea of the most extreme cases from a large pool

SELECT DATE.YEAR,MIN(AREA),MAX(AREA) FROM (
  SELECT SEGMENT_WATER(image.blue) FROM SatelliteImages 
  ) WHERE AREA > 1000m^2 AND 
    DATE BETWEEN 4-8-2000 AND 4-8-2015
    GROUP BY DATE.YEAR

year min.area max.area
2000 0.0006822 0.0017186
2001 0.0002175 0.0017342
2002 0.0003068 0.0015424
2003 0.0002740 0.0018010
2004 0.0001725 0.0018281
2005 0.0001795 0.0016319
2006 0.0001851 0.0018769
2007 0.0003189 0.0019106
2008 0.0007270 0.0022661
2009 0.0006729 0.0021103
2010 0.0007058 0.0020922
2011 0.0006443 0.0021316
2012 0.0008001 0.0020073
2013 0.0006278 0.0021103
2014 0.0008639 0.0022816
2015 0.0005949 0.0018625

2.2.2 Identify houses in image

SELECT feature FROM (
  SELECT SEGMENT_BUILDING(image) FROM SatelliteImages 
  ) WHERE SIZE > 100m^2 AND DATE IS 4-8-2014
Overview \(\rightarrow\) Overview

2.2.3 Identify the houses distance from the river

SELECT bd.pos,rv.distance FROM Buildings AS bd JOIN Distance(River) AS rv ON bd.pos = rv.pos
Overview \(+\) Overview

Or an interactive map for more exact examination:

Furthermore the data can be analyzed using standard tools like Excel, R, or Matlab for further quantifying and modeling of the risk.

A more detailed distribution which allows us to see that the larger more expensive buildings (the category furthest on the right) are preferentially located close to the river. This could be a major concern for future risks.

3 How

There are two interfaces available the data consumer and the data scientist.

Interfaces

3.1 Distributing the Analysis

The first question is how the data can be processed. The basic work is done by a simple workflow on top of our Spark Image Layer. This abstracts away the complexities of cloud computing and distributed analysis. You focus only on the core task of image processing.

Beyond a single train, our system scales linearly to multiple datasources and petabytes of information across hundreds of computers to keep the computation real-time.

With cloud-integration and Big Data-based frameworks, even handling an entire satellite network with 100s of satellites continuously collecting data is an easy task without worrying about networks, topology, or fault-tolerance. Below is an example for 30 different map sources where the tasks are seamlessly, evenly divided among 50 different machines.

3.2 Processing the Data

Once the cluster has been comissioned and you have the SparkContext called sc (automatically provided in Databricks Cloud or Zeppelin), the data can be loaded using the Spark Image Layer. Since for this case we load the images from a private repository stored on Amazon’s S3 block storage (also compatible with local filesystems, Google Compute, IBM SoftLayer)

val fullSatImage = 
    sc.readTiledImage[Double]("s3n://geo-images/esri-satimg/*/*.png",256,256).cache

Although we execute the command on one machine, the analysis will be distributed over the entire set of cluster resources available to sc. To further process the images, we can take advantage of the rich set of functionality built into Spark Image Layer. Watch our webinar to learn more.

3.3 Learn More

To find out more about the technical aspects of our solution, check out our presentation at the Spark Summit or watch the video.

Check out our other demos to see how 4Quant can help you

4 Acknowledgements

Analysis powered by Spark Image Layer from 4Quant, Visualizations, Document Generation, and Maps provided by:

To cite ggplot2 in publications, please use:

H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York, 2009.

A BibTeX entry for LaTeX users is

@Book{, author = {Hadley Wickham}, title = {ggplot2: elegant graphics for data analysis}, publisher = {Springer New York}, year = {2009}, isbn = {978-0-387-98140-6}, url = {http://had.co.nz/ggplot2/book}, }

To cite package ‘leaflet’ in publications use:

Joe Cheng and Yihui Xie (2014). leaflet: Create Interactive Web Maps with the JavaScript LeafLet Library. R package version 0.0.11. https://github.com/rstudio/leaflet

A BibTeX entry for LaTeX users is

@Manual{, title = {leaflet: Create Interactive Web Maps with the JavaScript LeafLet Library}, author = {Joe Cheng and Yihui Xie}, year = {2014}, note = {R package version 0.0.11}, url = {https://github.com/rstudio/leaflet}, }

To cite plyr in publications use:

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. URL http://www.jstatsoft.org/v40/i01/.

A BibTeX entry for LaTeX users is

@Article{, title = {The Split-Apply-Combine Strategy for Data Analysis}, author = {Hadley Wickham}, journal = {Journal of Statistical Software}, year = {2011}, volume = {40}, number = {1}, pages = {1–29}, url = {http://www.jstatsoft.org/v40/i01/}, }

To cite the ‘knitr’ package in publications use:

Yihui Xie (2015). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.10.

Yihui Xie (2013) Dynamic Documents with R and knitr. Chapman and Hall/CRC. ISBN 978-1482203530

Yihui Xie (2014) knitr: A Comprehensive Tool for Reproducible Research in R. In Victoria Stodden, Friedrich Leisch and Roger D. Peng, editors, Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595

To cite package ‘rmarkdown’ in publications use:

JJ Allaire, Joe Cheng, Yihui Xie, Jonathan McPherson, Winston Chang, Jeff Allen, Hadley Wickham and Rob Hyndman (2015). rmarkdown: Dynamic Documents for R. R package version 0.7. http://CRAN.R-project.org/package=rmarkdown

A BibTeX entry for LaTeX users is

@Manual{, title = {rmarkdown: Dynamic Documents for R}, author = {JJ Allaire and Joe Cheng and Yihui Xie and Jonathan McPherson and Winston Chang and Jeff Allen and Hadley Wickham and Rob Hyndman}, year = {2015}, note = {R package version 0.7}, url = {http://CRAN.R-project.org/package=rmarkdown}, }

To cite package ‘DiagrammeR’ in publications use:

Knut Sveidqvist, Mike Bostock, Chris Pettitt, Mike Daines, Andrei Kashcha and Richard Iannone (2015). DiagrammeR: Create Graph Diagrams and Flowcharts Using R. R package version 0.7.

A BibTeX entry for LaTeX users is

@Manual{, title = {DiagrammeR: Create Graph Diagrams and Flowcharts Using R}, author = {Knut Sveidqvist and Mike Bostock and Chris Pettitt and Mike Daines and Andrei Kashcha and Richard Iannone}, year = {2015}, note = {R package version 0.7}, }