tl;dr: This is a proof of concept for performing complex spatial operations (point-in-polygon aggregation) in the browser using WASM. On my laptop I can aggregate 13 million points to about 38,000 polygons in 21 seconds (your results may be faster or slower depending on your hardware).
Try it live here: https://stuartlynn.github.io/wasm_geo_agg/
Wasm Geo Agg is a proof of concept to explore performing complex geospatial operations in the browser using Rust and WebAssembly. As an initial test, we are focusing on point in polygon operations. Simply load in a CSV file with points and a GeoJSON file with polygons then click aggregate.
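The core operation here is the point-in-polygon test: deciding, for each point, which polygon (if any) contains it. As a minimal sketch of the idea (not the code the app actually uses, which builds on existing Rust geo crates), here is the classic ray-casting (even-odd) rule in plain Rust:

```rust
// Ray-casting point-in-polygon test: cast a horizontal ray from the point
// and count how many polygon edges it crosses. An odd count means inside.
// `ring` is the polygon's outer ring as (x, y) vertices.
fn point_in_polygon(px: f64, py: f64, ring: &[(f64, f64)]) -> bool {
    let mut inside = false;
    let n = ring.len();
    let mut j = n - 1;
    for i in 0..n {
        let (xi, yi) = ring[i];
        let (xj, yj) = ring[j];
        // Does the ray from (px, py) cross the edge between vertices j and i?
        if (yi > py) != (yj > py)
            && px < (xj - xi) * (py - yi) / (yj - yi) + xi
        {
            inside = !inside;
        }
        j = i;
    }
    inside
}

fn main() {
    // Unit square as a test polygon.
    let square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)];
    println!("{}", point_in_polygon(0.5, 0.5, &square)); // true
    println!("{}", point_in_polygon(1.5, 0.5, &square)); // false
}
```

Aggregation is then just running this test for every point against candidate polygons and accumulating a count (or sum) per polygon.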
If you don't happen to have a bunch of geospatial datasets lying around on your computer, here are a few suggestions of things to try:
- Point data: Street trees
- Polygon data: Census Blocks
- Point data: Taxi data for 2011-08
- Polygon data: Census Blocks
- Point data: 311 complaints
- Polygon data: Census Blocks
Suggest others in an issue and I will be happy to add them!
Currently, if you want to process geospatial data, you can:
- Spend a day or two installing a bunch of really amazing tools like GDAL, PostGIS, QGIS, etc., banging your head a few times as you try to get all their versions compatible with each other (not to mention trying not to blow up your Python installation as you go)
- Learn Python or R and use packages like geopandas
- Upload your data to services like ArcGIS or CARTO to be stored and processed in the cloud somewhere.
Options 1 and 2 let you process data locally but come with a steep learning curve. As someone who has been working in the geospatial world for 4+ years, I still lose half a day each time I need to install a new geospatial stack. While using something like Docker makes this a little easier, that too has a learning curve of its own.
Option 3 means handing over control of your data, to some extent, to a third party. If the data is sensitive and needs to remain local (as is true for a lot of non-profit or research data), or if you need a service that can be guaranteed to still be around in 5-10 years, this option might not be ideal either. Another consideration is that the cloud servers that process the data on these services are often less powerful than the laptop you are using to read this article, which increasingly seems insane to me.
So this is an experiment exploring a fourth option, asking: what if we had a PostGIS that ran entirely in your browser? A system that uses the web to deliver sophisticated software to your computer in the form of JavaScript and WASM with zero installation overhead, and then processes your data locally using the powerful CPU that happens to live in your machine.
In years gone by, JavaScript was the only option for processing anything in the browser. If that were still true, we would struggle to do mid- to large-scale geospatial processing in the browser. However, the arrival of WebAssembly means we can get near-native performance. In this POC we use Rust and the amazing toolchain of wasm-pack, wasm-bindgen, and Parcel to stitch together a number of Rust packages that perform the calculation, and then make them available to JavaScript.
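In this setup, JavaScript reads the files in chunks and passes each chunk across the WASM boundary as a string; the Rust side only needs plain functions that wasm-bindgen can expose. Here is an illustrative sketch of that kind of chunk parser (the function name and signature are hypothetical, not the app's actual API; in the real code it would carry a `#[wasm_bindgen]` attribute):

```rust
// Hypothetical sketch of a Rust function that wasm-bindgen could expose
// to JavaScript. JS hands over a chunk of CSV text; Rust parses it into
// (lon, lat) pairs, silently skipping rows that don't parse as numbers.
fn parse_point_chunk(chunk: &str) -> Vec<(f64, f64)> {
    chunk
        .lines()
        .filter_map(|line| {
            let mut cols = line.split(',');
            let lon = cols.next()?.trim().parse::<f64>().ok()?;
            let lat = cols.next()?.trim().parse::<f64>().ok()?;
            Some((lon, lat))
        })
        .collect()
}

fn main() {
    let chunk = "-73.97,40.78\n-73.99,40.72\nnot,a,point\n";
    let points = parse_point_chunk(chunk);
    println!("{:?}", points); // [(-73.97, 40.78), (-73.99, 40.72)]
}
```

wasm-bindgen takes care of marshalling the string across the JS/WASM boundary, so the Rust code can stay free of any browser-specific logic.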
As mentioned, on my personal laptop it takes about 21 seconds to aggregate 13 million points to 38,000 polygons, which is pretty mind-blowing considering this is all happening in the browser.
Beyond adding the R-Tree index, I haven't done much to optimize the code, so I suspect it could be made even faster. Improvements that I suspect are low-hanging fruit:
- Making loading faster. The way I am chunking up the files to pass to Rust is probably not optimal, and loading the data currently takes a while. I am sure this could be improved.
- Rendering. The rendering code is super simple and mainly exists so I could verify that the data was loading and aggregating properly. The point rendering takes some inspiration from the Python Datashader library, while the polygon visualizer just naively draws paths on a 2D canvas. Both could probably be improved with some WebGL. It would also be interesting to see whether this could easily be integrated with kepler.gl.
- Numerical aggregations. Currently the way non-spatial data is stored and accessed is pretty dumb. I think moving to something like Apache Arrow could speed this up.
- Parallelizing the code. I'm not sure how mature it is yet, but WebAssembly should soon have multi-threaded support through web workers. It's possible this could be used to speed up multiple parts of the POC.
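On the R-Tree index mentioned above: the point of any spatial index is to avoid testing every point against every polygon. A real R-Tree is what the code uses (via a crate); as a simpler stand-in that shows the same principle, here is a toy uniform-grid index (all names here are illustrative, not part of the actual codebase):

```rust
use std::collections::HashMap;

// A toy uniform-grid spatial index. Points are bucketed by grid cell, so a
// bounding-box query only scans the cells the box overlaps instead of
// scanning every point. (Illustrative only; the real code uses an R-Tree.)
struct GridIndex {
    cell_size: f64,
    cells: HashMap<(i64, i64), Vec<(f64, f64)>>,
}

impl GridIndex {
    fn new(cell_size: f64) -> Self {
        GridIndex { cell_size, cells: HashMap::new() }
    }

    fn cell_of(&self, x: f64, y: f64) -> (i64, i64) {
        (
            (x / self.cell_size).floor() as i64,
            (y / self.cell_size).floor() as i64,
        )
    }

    fn insert(&mut self, x: f64, y: f64) {
        self.cells
            .entry(self.cell_of(x, y))
            .or_insert_with(Vec::new)
            .push((x, y));
    }

    // Return points inside the bounding box, visiting only overlapping cells.
    fn query(&self, min_x: f64, min_y: f64, max_x: f64, max_y: f64) -> Vec<(f64, f64)> {
        let (cx0, cy0) = self.cell_of(min_x, min_y);
        let (cx1, cy1) = self.cell_of(max_x, max_y);
        let mut out = Vec::new();
        for cx in cx0..=cx1 {
            for cy in cy0..=cy1 {
                if let Some(points) = self.cells.get(&(cx, cy)) {
                    for &(x, y) in points {
                        if x >= min_x && x <= max_x && y >= min_y && y <= max_y {
                            out.push((x, y));
                        }
                    }
                }
            }
        }
        out
    }
}

fn main() {
    let mut index = GridIndex::new(1.0);
    index.insert(0.5, 0.5);
    index.insert(5.5, 5.5);
    println!("{:?}", index.query(0.0, 0.0, 1.0, 1.0)); // [(0.5, 0.5)]
}
```

During aggregation, each polygon's bounding box is used to pull candidate points from the index, and only those candidates get the expensive point-in-polygon test.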
Yeah, I was very much still learning Rust and WASM while writing this. The plan is to refactor and make the code much more general in the next iteration. As a POC, though, I think it's worth putting this out there for others to see, and as a call for anyone interested in developing geospatial tools this way to join efforts!
Also, just think how much faster and better this can be when someone more competent attempts the same thing.
- Install rustup: https://rustup.rs/
- Install wasm-pack: https://rustwasm.github.io/wasm-pack/installer/
- Install yarn: https://yarnpkg.com/lang/en/docs/install/#debian-stable
- Clone this repo:

```bash
git clone https://github.com/stuartlynn/wasm_geo_agg.git
```

- Install dependencies:

```bash
cd wasm_geo_agg
yarn
```

- Run:

```bash
yarn start
```
Let me know in the issues if you have problems getting it running.
Pull requests and issues are more than welcome. You can also ping me on Twitter if you're interested in thinking this kind of approach through more generally.
The code is split into two main directories:
- src: Contains the JavaScript code, written in React. This drives the UI and calls out to the WASM functions through wasm-bindgen.
- wasm: Contains the Rust code, compiled to WebAssembly, that does the heavy lifting.