You work in real estate and you need to evaluate the population in a custom area around your apartment?
You analyze the transport network of a city and want to estimate the population living below a specific distance?
You are a growth hacker and need to spot the more crowded spot to implement your new store?
Generally speaking, you want the population of any custom areas which are not administrative boundaries with an available census?
I have good news for you: this can be easily and quickly done for free using BigQuery GIS and Facebook-High-Resolution-Population-Density-Map data. By the end of this article, you will be able to compute the population on any custom shapes, in a scalable way.
For this tutorial, I will expose this method with two use cases :
- Counting the population in a 50km radius around the beautiful city of “Rennes”.
- Compute the population and the density of the districts of Paris.
Let’s compute some population!
Deep dive into Facebook High-Resolution Population Density Maps
Presentation of High-Resolution Population Density Map
Meta (previously Facebook) proposes High-Resolution Population Density Maps for a lot of countries in the world. These maps give a precise population estimation of a 30m x 30m resolution.
If you are interested, here is an explanation of how Facebook creates these High-Resolution Population Density Maps from census data and satellite images.
These Facebook High-Resolution Population Density Maps can be found on the Humanitarian Data Exchange website which proposes a bunch of open data.
For this tutorial, we will work with the French High-Resolution Population Density Map available here. Facebook proposes several maps for different population densities: women, men, children, etc. Let’s keep it simple and take the overall population density map.
The data is stored in a CSV format, in which each line corresponds to a 30m x 30 m square and its population. You will find three columns :
- latitude: The latitude of the center of the square
- longitude: The longitude of the center of the square
- square_population: The population estimation inside the square
In the rest of the article, for simplicity, we will call these 30m x 30m squares ‘points’ since we use their center points.
Integrate High-Resolution Population Density Map inside the BigQuery environment
To perform operations on this High-Resolution Population Density Map of France, we will work in the BigQuery environment which the advantages of :
- Being super efficient
- Being super scalable
- Having an integrated GIS (geographic information system)
- Being part of the Google Cloud Platform ecosystem
Another good news: if you don’t have a GCP account you can use the free version to follow this tutorial.
Let’s see how we can upload this High-Resolution Population Density Map in BigQuery.
Upload the CSV to Google Cloud Storage
Since the uncompressed CSV is pretty big (~2 Go) we have to store it first on Google Cloud Storage, which allows us to store files and use them in the other services of GCP.
Inside Cloud Store, create a new bucket by choosing a name and the region of your choice. You can let all the others parameters with default values.
In your bucket, upload the France High-Resolution Population Density Map CSV.
Create the table in BigQuery
Now go to the Big Query service UI and open your project. Create a dataset by choosing a name and the same region as your bucket:
Within your dataset, you can now create a table to upload the France High-Resolution Population Density Map CSV. To do so, choose Google Cloud Storage as a source: you should be able to find your bucket and your CSV inside. Let’s call it ‘FR_population’.
Then set the schema like this :
And finally, in the Advanced options, set the ‘Header rows to skip’ parameters to 1.
Create your table and check its preview, you should have something like that :
Congratulation, you are done now with the upload of the High-Resolution Population Density Map in BigQuery!
We can move now to the interesting part: the uses cases.
Use case 1: Compute the population of the 50km circle around Rennes.
The objective of this first use case is to estimate the number of people who live less than 50km away from the city center of Rennes.
BigQuery GIS quick introduction
To manipulate our FR_population table we need to see some of the BigQuery GIS functions. If you are interested in exploring the possibilities of BigQuery you can read its documentation. The GIS allows you to work with geographical objects such as points, lines, and polygons. They are stored in a type called GEOGRAPHY by BigQuery.
The functions that we will use are :
- ST_GEOGPOINT: Creates a point with the GEOGRAPHY type from longitude and latitude.
- ST_DISTANCE: Computes the distance in meters between two GEOGRAPHY objects.
Computation strategy and query
To compute the population in a 50km radius around Rennes we have to :
- Create a point for the center of Rennes which is (latitude: 48.117266, longitude: -1.6777926)
- Transform our latitude, longitude columns to GEOGRAPHY points
- Keep the points from our FR_population table which are distant less than 50km from the center of Rennes
- Sum the population of the kept points to get our total population.
The corresponding query is :
SELECT SUM(point_population) as population_50km
FROM `speedy-octane-332614.population_tutorial.FR_population`
WHERE
ST_DISTANCE(ST_GEOGPOINT(longitude, latitude),ST_GEOGPOINT(-1.6777926, 48.117266)) < 50000
And give us an output consistent with the 1 million population of the ‘Ille-et-Vilaine’ administrative region.
Bravo 👏 , you know now how to get the population within any range from a specific point! Now let’s step up and work with custom polygons...
Use case 2: count the population and the density of Paris districts
This second use case will illustrate how you can compute population on custom polygons. We will take the example of the Paris districts.
Retrieve Paris districts and upload it on BigQuery
The first step is to create a table of all Paris districts polygons. The source file is available on the opendata.paris website, in a CSV format.
You can then create your table in a similar way to the Facebook High-Resolution Population Density map upload but :
- You can directly upload the CSV from your local computer because the CSV is pretty light
- You can use the Auto detect option for the Schema
- ⚠️ The CSV Field delimiter is “;”
In the end, you should be able to have a table, that we will call paris_districts, with this schema :
Whose Geometry column has this preview :
The polygons are stored as strings in a geojson format. Fortunately, BigQuery GIS can support this format and we are going to see how.
The GIS functions we will need
To be able to count the population on our custom shapes we will need to use:
- ST_GEOGFROMGEOJSON: This function will transform our polygon in GeoJson text to a GEOGRAPHY object
- ST_CONTAINS: This function returns true if the first GEOGRAPHY object contains the second one.
- ST_AREA: This function returns the area of a GEOGRAPHY object
The strategy and the query
To be able to compute the population and the population density for each of the Paris districts we will :
- Transform our GeoJson text polygons to GEOGRAPHY object with ST_GEOGFROMGEOJSON
- Transform our latitude, longitude columns from the FR_population table into GEOGRAPHY points
- Join FR_population with paris_districts to keep population points inside the correct Paris district using the ST_CONTAINS function
- Grouping our data by districts computing the population by summing the population_point column, and the density by dividing by the area of the corresponding district
The corresponding query is
SELECT
paris_districts.L_QU as district_name,
SUM(FR_population.point_population) as district_pop,
SUM(FR_population.point_population) / ST_AREA(ST_GEOGFROMGEOJSON(paris_districts.Geometry)) * 1000000 as district_density,
FROM `speedy-octane-332614.population_tutorial.FR_population` as FR_population
JOIN
`speedy-octane-332614.population_tutorial.paris_districts` as paris_districts ON
ST_CONTAINS(
ST_GEOGFROMGEOJSON(paris_districts.Geometry),
ST_GEOGPOINT(FR_population.longitude, FR_population.latitude)
)
GROUP BY paris_districts.L_QU, paris_districts.Geometry
ORDER BY district_pop DESC
That gives you the districts with their population and their density, ordered from the most populated to the less.
As an example, Saint-Lambert's result is compatible with the official figure.
Visualize it with BigQueryGeoViz
If you want to share your geospatial results beautifully, BigQueryGeoViz is an amazing tool.
You only have to connect with your Google account, select the project ID you work with, and then you will be able to run some queries directly in the interface.
For example, we are going to display the above colored map using the following query :
SELECT
paris_districts.L_QU as district_name,
ST_GEOGFROMGEOJSON(paris_districts.Geometry) as district_polygon,
SUM(FR_population.point_population) / ST_AREA(ST_GEOGFROMGEOJSON(paris_districts.Geometry)) as district_density,
FROM `speedy-octane-332614.population_tutorial.FR_population` as FR_populationJOIN
`speedy-octane-332614.population_tutorial.paris_districts` as paris_districts ON
ST_CONTAINS(
ST_GEOGFROMGEOJSON(paris_districts.Geometry),
ST_GEOGPOINT(FR_population.longitude, FR_population.latitude)
)
GROUP BY paris_districts.L_QU, paris_districts.Geometry
Once the query ran, you can customize your display style with the following panel :
Congratulation! You are now able to :
- Compute population on any custom shape
- Compute density on any custom shape
- Make quick visualization to check and share your work
Conclusion
I hope this tutorial has been useful for you and that you will find amazing usage of this fabulous data from Facebook High-Resolution Population Density Maps. Note that BigQuery GIS has many other functionalities that can help you to leverage your geospatial data in a very scalable way.
Moreover, BigQuery has several public datasets, directly available from the UI: so check this out!
Are you looking for Data Engineering Experts? Don't hesitate to contact us!
If you liked this article, you might be interested in monitoring cloud carbon footprint or make Mlops with terraform