For insurance companies to accurately model flood risk, two main sources of information are needed. The first is information about flood severity, which can be derived from current and historic water locations (rivers, coastlines, lakes), stores (glaciers, snowmelt), and levels (topography). The second is an analysis of vulnerability, which is based on the properties within these regions.
Property insurance firms need to combine such information about natural structures at high geographic resolution. Historic records are of limited value because of their limited spatial and temporal detail and the rarity of flood events. In addition, climate change alters local weather patterns, and surface modifications change water run-off pathways.
Despite a multitude of data sources, there has been a drought of actionable information, and as a result flood modeling has been both uncertain and ambiguous. This uncertainty and ambiguity has tremendous financial implications: in the year 2013 alone, water-hazard-related losses amounted to nearly one trillion USD [1].
For evidence-based risk modeling to become a reality, it is essential to have access to high-resolution data collected over the entire area of insured properties on a regular basis. With this dynamic information, it would be possible to couple models for the distribution of flood severity with the vulnerability of the insurance portfolio and arrive transparently at a concrete, traceable value for the total loss prediction. The modeled information can further be used in other analytical areas of general insurance firms, such as pricing, reserving, and capital modeling for solvency purposes.
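To make the coupling concrete, here is a minimal sketch (all names, probabilities, and property values are hypothetical) of how a flood-severity distribution could be combined with a portfolio's vulnerability to produce an expected total loss:

```scala
// Hypothetical sketch: couple a flood-severity distribution with portfolio
// vulnerability to obtain an expected total loss.
case class Property(value: Double, vulnerability: Double => Double)

// Flood scenarios as (probability, severity in meters of water); illustrative only.
val severityDist = Seq((0.90, 0.0), (0.07, 0.5), (0.02, 1.5), (0.01, 3.0))

// Illustrative portfolio: vulnerability maps severity to a damage fraction.
val portfolio = Seq(
  Property(500000, s => math.min(1.0, 0.2 * s)),
  Property(750000, s => math.min(1.0, 0.3 * s))
)

// Expected loss = sum over scenarios and properties of
//   P(scenario) * damageFraction(severity) * propertyValue
val expectedLoss = (for {
  (prob, severity) <- severityDist
  prop             <- portfolio
} yield prob * prop.vulnerability(severity) * prop.value).sum
```

Because every term in the sum is explicit, the resulting figure is traceable back to individual scenarios and properties.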
In parallel, the latest satellite developments from NASA, Google, Airbus Defence and Space (formerly EADS), and others provide frequent (weekly), high-resolution (10 m) imaging of the entire planet. These measurements generate petabytes of data.
Our tools enable large, complex satellite imaging datasets to be processed easily and scaled across hundreds of machines.
Here we locate and segment the water for a single day:
```sql
SELECT feature FROM (
  SELECT SEGMENT_WATER(image.blue) FROM SatelliteImages
) WHERE AREA > 1000m^2 AND DATE IS 4-8-2015
```
The data can also be processed over months, or even years, of archived images to characterize the typical changes in water levels.
```sql
SELECT DATE, AREA FROM (
  SELECT SEGMENT_WATER(image.blue) FROM SatelliteImages
) WHERE AREA > 1000m^2 AND
  DATE BETWEEN 4-8-2000 AND 4-8-2015
```
This information can then be summarized by year to better identify the most extreme cases from a large pool:
```sql
SELECT DATE.YEAR, MIN(AREA), MAX(AREA) FROM (
  SELECT SEGMENT_WATER(image.blue) FROM SatelliteImages
) WHERE AREA > 1000m^2 AND
  DATE BETWEEN 4-8-2000 AND 4-8-2015
GROUP BY DATE.YEAR
```
| year | min. area | max. area |
|------|-----------|-----------|
| 2000 | 0.0006822 | 0.0017186 |
| 2001 | 0.0002175 | 0.0017342 |
| 2002 | 0.0003068 | 0.0015424 |
| 2003 | 0.0002740 | 0.0018010 |
| 2004 | 0.0001725 | 0.0018281 |
| 2005 | 0.0001795 | 0.0016319 |
| 2006 | 0.0001851 | 0.0018769 |
| 2007 | 0.0003189 | 0.0019106 |
| 2008 | 0.0007270 | 0.0022661 |
| 2009 | 0.0006729 | 0.0021103 |
| 2010 | 0.0007058 | 0.0020922 |
| 2011 | 0.0006443 | 0.0021316 |
| 2012 | 0.0008001 | 0.0020073 |
| 2013 | 0.0006278 | 0.0021103 |
| 2014 | 0.0008639 | 0.0022816 |
| 2015 | 0.0005949 | 0.0018625 |
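The same yearly aggregation could also be expressed directly in Spark code; a sketch assuming the segmented water regions have been collected into a DataFrame named `waterRegions` (both the name and the schema are assumptions) with `date` and `area` columns:

```scala
import org.apache.spark.sql.functions.{col, year, min, max}

// waterRegions: DataFrame with "date" (DateType) and "area" (DoubleType),
// e.g. collected from the SEGMENT_WATER results above.
val yearlyExtremes = waterRegions
  .filter(col("area") > 1000)          // same 1000 m^2 threshold as the query
  .groupBy(year(col("date")).as("year"))
  .agg(min(col("area")).as("min_area"), max(col("area")).as("max_area"))
  .orderBy("year")

yearlyExtremes.show()
```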
The same analysis applies to man-made structures; here we locate and segment the buildings for a single day:

```sql
SELECT feature FROM (
  SELECT SEGMENT_BUILDING(image) FROM SatelliteImages
) WHERE SIZE > 100m^2 AND DATE IS 4-8-2014
```
The segmented buildings can then be joined with their distance to the nearest river:

```sql
SELECT bd.pos, rv.distance FROM Buildings AS bd
  JOIN Distance(River) AS rv ON bd.pos = rv.pos
```
The results can also be shown on an interactive map for more exact examination.
Furthermore, the data can be analyzed using standard tools like Excel, R, or MATLAB for further quantification and modeling of the risk.
A more detailed distribution allows us to see that the larger, more expensive buildings (the rightmost category) are preferentially located close to the river. This could be a major concern for future risks.
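A sketch of how such a distribution could be derived from the joined results (the DataFrame `buildingsWithDistance` and its column names are illustrative):

```scala
import org.apache.spark.sql.functions.{col, floor, avg, count}

// buildingsWithDistance: segmented buildings joined with distance-to-river,
// with "size" in m^2 and "distance" in meters (illustrative schema).
val sizeVsDistance = buildingsWithDistance
  .withColumn("size_bucket", floor(col("size") / 100) * 100) // 100 m^2 size bins
  .groupBy("size_bucket")
  .agg(avg(col("distance")).as("mean_river_distance"), count("*").as("n"))
  .orderBy("size_bucket")
```

If the mean distance falls as the size bucket grows, that is exactly the pattern described above.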
There are two interfaces available: one for the data consumer and one for the data scientist.
The first question is how the data can be processed. The basic work is done by a simple workflow on top of our Spark Image Layer, which abstracts away the complexities of cloud computing and distributed analysis so that you can focus only on the core task of image processing.
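As an illustration of what such a core task might look like (a self-contained sketch; this is not the actual Spark Image Layer API), a naive water segmentation could simply threshold the blue channel of each tile:

```scala
// Naive sketch: mark a pixel as water when its blue intensity exceeds a
// fixed threshold; real pipelines would use calibrated spectral indices.
def segmentWater(blueChannel: Array[Array[Double]],
                 threshold: Double = 0.6): Array[Array[Boolean]] =
  blueChannel.map(_.map(_ > threshold))

// Fraction of a tile covered by water, the basis of AREA-style filters.
def waterFraction(mask: Array[Array[Boolean]]): Double = {
  val pixels = mask.map(_.length).sum
  if (pixels == 0) 0.0 else mask.map(_.count(identity)).sum.toDouble / pixels
}
```

The framework then takes care of applying such a function to every tile on every machine.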
Beyond a single data stream, our system scales linearly to multiple data sources and petabytes of information across hundreds of computers to keep the computation real-time.
With cloud integration and Big Data frameworks, even handling an entire network of hundreds of satellites continuously collecting data is an easy task, without worrying about networking, topology, or fault-tolerance. Below is an example with 30 different map sources where the tasks are seamlessly and evenly divided among 50 different machines.
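In plain Spark terms (a sketch; the bucket names are hypothetical), combining many sources and spreading the work evenly might look like:

```scala
// Hypothetical sketch: union 30 map sources and spread the tiles evenly
// across the cluster; Spark handles scheduling and fault-tolerance.
val sources  = (1 to 30).map(i => s"s3n://geo-images/source-$i/*.png")
val allTiles = sc.union(sources.map(path => sc.binaryFiles(path)))
val balanced = allTiles.repartition(50 * 4) // ~4 partitions per machine on 50 machines
```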
Once the cluster has been commissioned and you have the SparkContext called `sc` (automatically provided in Databricks Cloud or Zeppelin), the data can be loaded using the Spark Image Layer. In this case we load the images from a private repository stored on Amazon's S3 object storage (local filesystems, Google Compute Engine, and IBM SoftLayer are also supported):
```scala
// Load all matching tiles from S3 as 256 x 256 pixel chunks and cache
// them in cluster memory for repeated analysis.
val fullSatImage =
  sc.readTiledImage[Double]("s3n://geo-images/esri-satimg/*/*.png", 256, 256).cache
```
Although we execute the command on one machine, the analysis will be distributed over the entire set of cluster resources available to `sc`. To further process the images, we can take advantage of the rich set of functionality built into the Spark Image Layer; watch our webinar to learn more.
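For instance, assuming the loaded tiles behave like a standard Spark collection of (position, tile) pairs (the exact Spark Image Layer types are not shown here), the thresholding sketch from above could be mapped over every tile:

```scala
// Assumption: fullSatImage exposes standard (key, value) Spark semantics;
// segmentWater is the illustrative thresholding function sketched earlier.
val waterMasks = fullSatImage.mapValues(tile => segmentWater(tile))
```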
To find out more about the technical aspects of our solution, check out our presentation at the Spark Summit or watch the video.
Check out our other demos to see how 4Quant can help you.
Analysis powered by the Spark Image Layer from 4Quant; visualizations, document generation, and maps provided by:
- H. Wickham (2009). ggplot2: Elegant Graphics for Data Analysis. Springer New York. ISBN 978-0-387-98140-6. http://had.co.nz/ggplot2/book
- Joe Cheng and Yihui Xie (2014). leaflet: Create Interactive Web Maps with the JavaScript Leaflet Library. R package version 0.0.11. https://github.com/rstudio/leaflet
- Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. http://www.jstatsoft.org/v40/i01/
- Yihui Xie (2015). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.10.
- JJ Allaire, Joe Cheng, Yihui Xie, Jonathan McPherson, Winston Chang, Jeff Allen, Hadley Wickham and Rob Hyndman (2015). rmarkdown: Dynamic Documents for R. R package version 0.7. http://CRAN.R-project.org/package=rmarkdown
- Knut Sveidqvist, Mike Bostock, Chris Pettitt, Mike Daines, Andrei Kashcha and Richard Iannone (2015). DiagrammeR: Create Graph Diagrams and Flowcharts Using R. R package version 0.7.