Solving Business Problems with Geospatial Analytics and Data Visualization
Shan He
Senior Director of Engineering
Demystifying Business Problems with Geospatial Analytics and Data Visualization
Welcome to our blog! It is a thrill to have Shan He, Director of Engineering at Foursquare, explain how geospatial analytics and data visualization help address business problems. Shan's expertise lies at the intersection of design and software engineering, particularly in data visualization.
From Architect to Engineer
Shan spent seven years studying architectural design, but her growing interest in software engineering led her down a different path: data visualization. After five years at Uber working on data visualization tools, Shan open-sourced her own library, Kepler GL, which focuses on geospatial visualization. Building on Kepler GL, she founded her own company, Unfolded, which was later acquired by Foursquare.
Geospatial Data Visualization: A Challenge Worth Taking On
Geospatial analytics and visualization give visible form to abstract numbers, making them more powerful and more personal. Plotting a large dataset on a map, such as 125 million buildings in the US, shows at a glance where people reside and where density is highest, because the dense areas light up. It is this ability to turn abstract numbers into something far more consumable that excites Shan.
Working with Foursquare
Foursquare provides location data and technology that help businesses understand and grow their practice. One of the tools Shan built, first at Unfolded and now at Foursquare, is Foursquare Studio, a Photoshop-like tool for geospatial data. It analyzes and visualizes large-scale geospatial data in the browser. The goal of such a tool is not only to create appealing maps but also to help people make sense of the data and make actionable decisions.
Embracing Challenges of Geospatial Data
Geospatial data presents unique challenges. Much of the data is planetary in scale and has a time dimension attached. Because collection methods differ, from mobile devices to satellite imagery, the data also comes in various formats. These obstacles demand new ways of analyzing data that is precise to the meter and to the second and that captures how people move through the built environment.
Solving Geospatial Issues with a Unified Grid System
One solution to these challenges is to perform geospatial analytics in a unified grid system. This is where the H3 indexing system shines: it provides a global standard grid whose cells are mostly hexagons, with a few pentagons. The grid can be used to partition data and to optimize storage, processing, and analysis. Once points of interest, navigation data, and boundary-based data such as census tracts have been converted into the same grid, moving between data points and correlating across data formats becomes straightforward.
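To make the idea of a unified grid concrete, here is a minimal sketch of binning points into hexagonal cells. It is not H3: H3 indexes cells on a sphere and returns its own cell identifiers, whereas this stand-in uses a flat plane and axial (q, r) coordinates. The function names `point_to_hex` and `bin_points` and the `size` parameter are assumptions made for illustration.

```python
import math
from collections import Counter


def _hex_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon."""
    # Work in cube coordinates, where x + y + z == 0 for every hexagon.
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    # Recompute the component with the largest rounding error
    # so that the x + y + z == 0 constraint holds again.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return (rx, rz)


def point_to_hex(x, y, size):
    """Return the axial (q, r) id of the pointy-top hexagon containing (x, y).

    `size` is the distance from a hexagon's center to any of its corners.
    """
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return _hex_round(q, r)


def bin_points(points, size):
    """Count how many (x, y) points fall into each hexagonal cell."""
    return Counter(point_to_hex(x, y, size) for x, y in points)


if __name__ == "__main__":
    # Invented sample points on a flat plane, for illustration only.
    sample = [(0.0, 0.0), (0.1, 0.1), (1.732, 0.0)]
    for cell, count in sorted(bin_points(sample, size=1.0).items()):
        print(cell, count)
```

Once every dataset is reduced to a table keyed by cell id, datasets that started as points, lines, or polygons can be compared cell by cell.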
Introducing Hexagon Tiles
A unified grid solves the analysis problem, but the question of handling planetary-scale data remains. When a dataset contains billions of points, aggregating it into H3 cells can still produce millions or billions of rows, depending on the resolution. This is where hexagon tiles, or Hex Tiles, come into the picture. Hex Tiles is a tiling system built on top of H3: data is indexed into hexagons at several resolutions, and only the cells relevant to the current analysis area are loaded. Because H3 is hierarchical, a zoomed-out view loads coarse cells and a zoomed-in view loads fine ones, so the browser is never overwhelmed.
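The on-demand loading behind hexagon tiles can be sketched roughly as follows. This is an illustration under stated assumptions, not Foursquare's implementation: the zoom-to-resolution mapping, the `store` layout, and the function names are all hypothetical. The resolution range of 7 to 14 mirrors the coarsest and finest resolutions mentioned for the New York example in the talk.

```python
def resolution_for_zoom(zoom, min_res=7, max_res=14):
    """Pick a grid resolution for a map zoom level.

    Assumption: resolution = zoom - 3, clamped to the stored range.
    A real system would tune this mapping.
    """
    return max(min_res, min(max_res, int(zoom) - 3))


def cells_in_view(store, zoom, bbox):
    """Return only the cells at the chosen resolution that fall inside bbox.

    `store` maps resolution -> {cell_id: {"lon": ..., "lat": ..., "value": ...}}
    `bbox` is (west, south, east, north) in degrees.
    """
    west, south, east, north = bbox
    cells = store.get(resolution_for_zoom(zoom), {})
    return {
        cell_id: cell
        for cell_id, cell in cells.items()
        if west <= cell["lon"] <= east and south <= cell["lat"] <= north
    }


if __name__ == "__main__":
    # Invented pre-aggregated data: one coarse cell, two fine cells.
    store = {
        7: {"coarse-1": {"lon": -74.0, "lat": 40.7, "value": 900}},
        14: {
            "fine-1": {"lon": -74.001, "lat": 40.701, "value": 3},
            "fine-2": {"lon": -73.5, "lat": 40.9, "value": 5},
        },
    }
    print(cells_in_view(store, zoom=5, bbox=(-75, 40, -73, 41)))
    print(cells_in_view(store, zoom=18, bbox=(-74.01, 40.69, -73.99, 40.71)))
```

The point of the design is that the number of cells returned depends on the viewport and zoom, not on the size of the raw dataset.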
Transforming Geospatial Data into Actionable Insights
Hexagon tiles need hexagon data, and that is the job of Geo-Transform. It converts raster, line, point, and boundary (polygon) data into hexagons for analytics. In the examples from the talk, elevation from a raster, population density from administrative boundaries, and noise complaints from raw GPS points in New York City all end up in the same hexagon cells. This transformation makes unified analysis possible and allows one to answer questions such as how many people live at low elevations that might be affected by flooding, or whether people living at higher elevations are affected by traffic noise.
A Practical Case Study
To understand the practical application of this technology, let's look at a case study. Shan used Foursquare Studio to analyze how rising sea levels might affect the world's population. She loaded global population density data and global elevation data, both in the form of Hex Tiles, and joined the two datasets so that every hexagon cell carried both a population and an elevation value. She then created summary charts of total population and maximum elevation, filtered the hexagon cells by elevation, and read off how many people live below a specified elevation.
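The join-then-filter workflow in the case study can be sketched in a few lines. This is a toy illustration, not Foursquare Studio's API: the cell ids, population counts, and elevations below are invented, and `join_by_cell` and `population_at_or_below` are hypothetical helper names.

```python
def join_by_cell(population, elevation):
    """Inner-join two {cell_id: value} tables on the shared cell id."""
    shared = population.keys() & elevation.keys()
    return {
        cell_id: {
            "population": population[cell_id],
            "elevation_m": elevation[cell_id],
        }
        for cell_id in shared
    }


def population_at_or_below(joined, max_elevation_m):
    """Sum the population of all cells at or below the given elevation."""
    return sum(
        row["population"]
        for row in joined.values()
        if row["elevation_m"] <= max_elevation_m
    )


if __name__ == "__main__":
    # Invented sample values keyed by hypothetical cell ids.
    population = {"cell-a": 100, "cell-b": 250, "cell-c": 40}
    elevation = {"cell-a": -2.0, "cell-b": 3.5, "cell-c": 120.0, "cell-d": 1.0}

    joined = join_by_cell(population, elevation)
    print(population_at_or_below(joined, 0))  # cells below sea level
    print(population_at_or_below(joined, 5))  # cells within 5 m of sea level
```

Because both datasets share the same cell ids, the join is an ordinary table join and needs no geometric intersection at query time.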
Diverse Application of Geospatial Data Analytics
As the case study shows, geospatial data analytics can be instrumental in planning for future events. Governments can use this technology to plan infrastructure, study demographics, and prepare emergency responses, among many other purposes. Foursquare is not the only platform offering this kind of functionality. The core technology, Kepler GL, is open source and has been used by governments in and outside the US, and other platforms such as Carto also build on open-source technology.
In Summary
It's safe to conclude that geospatial data analytics and visualization play a crucial role in transforming raw data into actionable insights. As exemplified by Shan's journey, from being an architect to a software engineer and her work at Foursquare, it is clear that the world of geospatial data analytics is a fascinating one.
Video Transcription
OK, let's start. Hello everyone. My name is Shan He, and I'm a Director of Engineering at Foursquare. Today I will give a presentation about how we solve business problems with geospatial analytics and data visualization. A little bit about my background. I spent seven years studying architectural design, but my interest has always been in the joint field of design and software engineering. So I developed my career along the line of data visualization. I spent five years at Uber working on data visualization tools.
And then I open sourced my own library called Kepler GL, specifically focusing on geospatial visualization. After that, I started my own company called Unfolded, based on Kepler GL. It is geospatial visualization tooling. Everything is geospatial, and after a while we got acquired by Foursquare. So now we're at Foursquare and continuing to work on the geospatial tools. So why am I so excited about geospatial data analytics and visualization? You know, my background is in design.
Every time I'm looking at numbers, I always want to understand them, and what excites me is actually putting these numbers onto the map and representing each one with a small dot. So you can see, when we turn abstract numbers into something visible, the number becomes a lot more powerful than it is. Like, this is showing 125 million buildings in the US. We can see where people reside, see the highlights of all the densities, and the light spots lighting up this map make the 125 million number a lot more personal to us. That's why I'm always very invested in geospatial visualization.
It is something that turns abstract numbers into much more consumable insights. So, geospatial data visualization brought me to Foursquare. Foursquare is a company that invests in providing location data and technology to help businesses understand and better grow their practice. So Foursquare has geospatial data products like Places and Visits. It also has the tools that businesses can use to understand and analyze large-scale geospatial data. One of the tools that I built, previously at Unfolded and now at Foursquare, is called Foursquare Studio. I built this, you know, kind of Photoshop-like tool for geospatial data. It is for analyzing and visualizing large-scale geospatial data in your browser. The goal of this is definitely not just to draw a pretty map. The goal is actually to help people make sense of what's behind the pretty pixels, help them make sense of the data, and help them make actionable decisions. Here are some screenshots. With Foursquare Studio, we are usually able to handle data in many different formats.
One of the formats that has become relevant in recent years is GPS data points, because of the omnipresence of smartphones and tracking devices. Nowadays, a lot of the data we are collecting and trying to make sense of is in the magnitude of terabytes and at global scale. And because the data we collect these days has a timestamp and movement to it, the traditional analytics platforms, which are usually designed to deal with data that does not change within sub-seconds, have become obsolete. So we need a new way to analyze this type of data, which is precise to the meter and precise to the second. A lot of this data is actually about humans, about how humans move around and navigate through the built environment. An example of the data we work with at Foursquare: this is millions of places in the UK. Now, places is not what you and I might understand. It's not just about addresses. Places is actually about the restaurants, the little corner shop, the coffee shop I like to go to. Places data helps us understand how we develop our cities and how we should move into a new area of development.
So places data also changes, right? Because coffee shops close and new restaurants open. This type of data oftentimes needs constant refresh, and that's where the time dimension comes in when we have to deal with data like this. This is another type of data we work with. This is built with population data of the entire world. Because data sets like this have billions of data points in them, we need a way to aggregate them and show density in an aggregated manner. So that becomes another type of challenge we always have to deal with. In summary, we use geospatial data and visualization to understand how humans or other animals move and navigate around the environment. Besides the data I just showed you, there are many other analyses we usually do with geospatial data. We use travel data to understand connectivity between different city areas. We use building footprints to understand infrastructure. We use satellite imagery to conduct earth science.
And GPS data helps us understand how people, and animals, navigate, or helps us better navigate the city, and population data helps us understand the density of our cities. So oftentimes understanding this data presents great challenges. Like I showed before, a lot of the data is planetary scale. A lot of the data has a time dimension to it. And because the way we collect the data differs, either through a mobile phone device or through satellite imagery, a lot of the data also comes in many different formats. So these are the challenges presented to us when we are working with geospatial data. Obviously there are many different solutions that we came up with to tackle these areas individually. One of the solutions I want to talk about today is performing geospatial analytics in a unified grid system. This has become a pretty hot topic in recent years, I think, just because everybody has to deal with billions of data points, and looking through each individual data point doesn't really give you a straightforward analysis tool. Usually we want a way to unify these data points into a geospatial grid. So that's where H3 becomes relevant. H3 is the indexing system. It gives us a global standard grid system in the shape of mostly hexagons, sometimes pentagons.
So we can use H3, the indexing system, to partition our data and optimize the storage, processing and analysis. With the help of H3, we can then convert the different types of data that we usually get, be it points of interest, or maps and navigation, or census data, which usually comes in geofences or boundaries. We can convert all of these different data formats into a unified grid.
And by doing that, we are able to smoothly navigate through different data points and correlate across different data formats. So with H3, we can solve the problem of unifying analysis, but we still need a way to solve the problem of how we handle data at planetary scale. Usually when data comes in billions of data points, even aggregating into different H3 scales, depending on your resolution, can sometimes still result in millions or billions of rows. So that's when Hex Tiles becomes the next solution that we build on top of H3.
So Hex Tiles is basically, as it sounds, a tiling system based on hexagons. We use H3 to index our data into different resolutions of hexagons and use tiles to load them, so that we only load the data points, the cells, that are relevant to our current analysis area.
And because H3 has its hierarchical nature, at different resolutions we can load different tiles. So we never really overflow our browser or analytics platform. So with tiling, what's important to understand is that tiling is hierarchical. If you zoom all the way out to a country level, you probably only need to load tiles at miles of radius. But if you want to study a cityscape, you probably want to load tiles at, let's say, 100 m. So with tiles we are able to hierarchically break this data into different resolutions and only load the chunks that are relevant to our analysis. But why hexagon tiles, why not other shapes of tiles? You know, there are tiles based on squares, there are tiles based on triangles. The hexagon is specifically designed so that the shape of the hexagon is unified.
It's good for unifying big data and it's good for performing aggregation across different resolutions, and because of its size, it's optimized for storing, processing, visualizing and sharing. That's why we pick hexagons. I mean, obviously there is a lot of literature around why hexagons.
So today I won't go into it, but it is very interesting, and I recommend, if you're interested, reading about H3 and hexagons. So how does it actually work in the end? So this is us looking at every single GPS data point on every single road in the New York area. Obviously, the data is billions of rows, but we are able to aggregate it at different resolutions into these hexagon cells.
So as you can see, when we zoom in, we load high-resolution data, but when we zoom out, we load coarse-resolution data. Doing that, we can avoid overflowing our browser, right? I think in this map the finest resolution is hex 14, which is only 6 m, but the coarsest resolution is at hex 7 or hex 8, which is in the kilometer range. So this is just to demonstrate how the hierarchical modeling of hexagons, or hex tiles, can help us break data into different resolutions and load them on demand. Now, after explaining hexagons, obviously the next thing we need to understand is how we actually convert these data points into hexagons. That's where Geo Transform comes in handy. Because our data comes in rasters, lines, points and polygons, we need a way to break them into hexagons or aggregate them into hexagons. And at Foursquare, we came up with many different algorithms to work with different shapes. Be it polylines, rasters, points of interest or admin boundaries, we always have a way to convert them into hexagons for analytics. And why does it matter, right?
So for example, when we are looking at data that comes from rasters, comes from a boundary, comes from GPS, this is how we can convert them into this unified cell shape. So from a raster, this data set contains elevation data, and we convert it into these 100 m radius hexagons. From an admin boundary, which contains population density data, we also convert it into this unified shape of hexagons. And then another data set contains all the noise complaints in New York City. This data comes in raw GPS points, and we convert them into hexagons. All in all, we are able to convert everything into hexagons. So a use case looking at this: this is the US census demographic data set. It comes in census tracts, which are these very un-unified shapes. You know, census tracts are drawn by different administrations, so they are completely different shapes. So if we want to make analysis on this, combining this shape of data with other data sets or GPS, it's going to make the analysis uneven, because the boundaries will skew the actual numbers. When we convert this data set into hex tiles, it's going to look like this, right? We no longer see boundaries, and we can see the data points a lot more clearly. This is a census data set. We converted it into hexagons.
And at a higher level, you can see more clearly where the actual high-density or low-density spots are in the entire United States, right? Most of the population is at higher density along the east coast or the west coast. And in the central area, you can see the population density actually follows along the highways. You can see the highlights of high data points. And then once you zoom into this map, you can see how the cells become higher resolution as you zoom in, because we only load the area you're currently looking at. As you zoom out, you can see the cells become coarser resolution, but it's still clearly visible to your eyes where the higher density or the lower density is. I like to look at the central part of this map, because you can see how the higher density always has this kind of connected-dot pattern.
This is all the small towns we developed along the highways, because America has, you know, the world's most advanced highway system, and our population is actually kind of naturally spread out at the different highway intersections. A very interesting map I like to show. And in the end, after being able to break the data down into hex tiles, after being able to unify it into the same shape, we want to actually draw analysis across these data sets in this unified shape, right? Once we convert noise, population and elevation data all into one analysis unit, we can start answering questions like, you know, how many people actually live at lower elevations that might get affected by flooding, or do people who live at higher elevations get affected by traffic noise?
So finally, I want to show a case study of how this all ties together. Within our tool, I want to answer the question of how rising sea levels will affect the world's population. So this is using Foursquare Studio, a tool we developed. I'm loading two sets of hex tiles into the session. On the right side, I'm loading the global population density data set, built as hex tiles. On the left side, I'm loading the global elevation data set, also in hex tiles. So the next thing I'm trying to do: if I want to understand how population will get affected by rising sea levels, I need to be able to join these two data sets together. So I'm using the join function here. It's similar to a basic table join. On the left, I want to join population. On the right, I will pull in my elevation. After I hit join, I have this new data set that joins these two together. So I can have my individual data sets and look at this joined data set. In this joined data set, I now have both population and elevation columns embedded in each cell. So now we can start, you know, playing around with some statistics.
So the first thing I did: if I want to understand the population affected by rising sea levels, I need to create a summary chart of how much population I'm actually seeing in my current map. So I create a number based on my joined data set, showing the total population, and then create another chart showing the current max elevation. The next thing I did is that I started to add a filter. I started filtering my hexagon cells by elevation. So if I turn my filter down to zero, I'm only looking at populations living below zero elevation, right? It's very interesting. You will see the Caspian Sea [unclear], because those areas are actually below sea level. And I can scroll around to see, once I raise my maximum elevation, more areas being lit up. In the end, I changed my number to five, because I want to understand, if the sea level rises by 5 m, how many people will get affected. The number is 7 million, and I can move around to see where those 7 million people actually live. So, yeah, that's my demo and my talk. I don't want to end on a very desperate note, saying that, you know, if sea level rises by 5 m, 10 million people will be affected.
But I hope this talk gives you a very high-level overview of what geospatial analytics is all about and how we're using a tiling system to solve this problem. So thanks everyone, and I hope this is interesting for you all. And if you need to contact me, my name is Shan and this is my contact info. So we're running over time. I do see two questions. One being: I can imagine that governments will benefit from this information. Do you know any government that is already using this kind of technology, and for what purpose? Actually, let me talk about two things. I've been open sourcing Kepler GL for many years. It's the technology that we built Foursquare Studio on top of. And because it's open source, I know Kepler GL has been used by governments, even outside the US, in Europe. Because, you know, geospatial data science is not a new domain. Governments can use this to plot any type of geospatial data they collect, census being one, like the map I just showed. And there are obviously many purposes in understanding geospatial data. One being, you know, understanding where people live, with census data, where the higher population is. I don't know exactly. There are many purposes. I just cannot narrow it down to one.
Is Foursquare the only platform that can handle this kind of visualization? Who else is working with this technology? Like I said, the technology is open source. We open sourced Kepler GL many years ago. Foursquare Studio is definitely the commercial version that we built on top of the open-source technology. I know there are other platforms, like Carto, that are also using open-source technology. So, you know, Foursquare is definitely not the only one, but we're the ones who actually open sourced the core technology, as the people that have been working on this. Yep. Sorry for running longer than time. And if there are no more questions, I'm going to end the meeting for everyone. And again, if you are interested, contact me at [unclear] at foursquare dot com, or my Twitter handle is Yan underscore C er I. All right, thanks everyone.