Skip to content

feat: Switch geodata providers #7393

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 24 commits into
base: master
Choose a base branch
from

Conversation

camdecoster
Copy link
Contributor

@camdecoster camdecoster commented Mar 20, 2025

Description

Adds scripts to build topojson from UN sourced data and updates build process to run these scripts.

Closes #7334

Changes

  • Adds script for downloading shapefiles
  • Adds script for converting shapefiles into topojson
  • Updates topojson path references

Testing

  • Run npm run build_topojson and make sure the script completes successfully
  • Try loading the maps in ./dist/topojson in Mapshaper
  • Open the test dashboard
  • Try loading some of the geo plots and look for errors

Notes

  • The new maps are built from this UN data source for countries, coastlines, and lands layers. The oceans, lakes, rivers, and subunits layers are built from Natural Earth Data.
  • The current maps come from sane-topojson, which is entirely derived from Natural Earth Data
  • I haven't been able to verify the resolution of the data from the UN. There's a reference here to the "UNGeodata 25 million scale", but I haven't been able to confirm what that means. As such, the labels '50m' and '110m' probably aren't accurate. But they need to remain as they are to not break references.
  • Mapshaper can't handle antimeridian cutting, so features that cross the antimeridian can look weird (source)
  • I want to credit sane-topojson for some of the design of these scripts. I'm attempting to create a (mostly) drop in replacement, so it made sense to follow those conventions.

TODO

  • [ ] Clean up shapefiles after converting to topojson Saving everything into the build folder makes this unnecessary
  • Fix Antarctica maps (or just remove them)
  • Commit new maps to ./dist/topojson folder
  • Fix broken tests (I'm sure there will be some)
  • Remove log statements in process_geodata.mjs (or hide them behind flag)
  • Add the 'usa' map (though it will be pretty bland without states)
  • Remove sane-topojson package
  • Verify resolutions
  • Only include actual coastlines in coastlines
  • Make sure 110m maps don't have any unintended artifacts (like South America)
  • Add UN geodata archive to repo and only attempt download if file can't be found
  • Add markdown log file in draftlogs: 7393_feat.md
  • Remove unnecessary properties info from final maps

@gvwilson gvwilson added feature something new P1 needed for current cycle labels Mar 20, 2025
@camdecoster camdecoster changed the title Switch geodata providers feat: Switch geodata providers Mar 20, 2025
import config from './config.json' assert { type: 'json' };

try {
// Download data from UN
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As these origin URLs may change (link rot), I worry that this piece of the script could become out of date quickly. I'm not sure there's too much we can do here but I might recommend committing the current source file (but not adding it to the dist/. Maybe someone else has a better idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems reasonable. The script could be updated to look for the file first and only download if it doesn't exist.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marthacryan do you have an opinion or ideas on this? What's the ideal dev UX here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, in 3b47c67 I added the archive from the UN.

@ndrezn ndrezn requested a review from emilykl March 20, 2025 17:03
@gvwilson
Copy link
Contributor

  1. Sync with this branch.
  2. npm i to install/update packages.
  3. npm run build_topojson to build JSON files.
  4. Go to mapshaper.org.
  5. Load europe_50m.json.
  6. Initially looks like the lakes are misplaced (for example, the Caspian Sea seems to overlap the Russian border in a weird way in the screenshot).
  7. On closer inspection, maybe this is valid: I'm unable to find a map showing how national borders cross the Caspian Sea, and Russia might very well have a claim to the northwest section.
  8. The three lakes in the north check out: one is in Sweden, the other two are near St. Petersburg.
    Screenshot 2025-03-25 at 9 58 13 AM

@gvwilson
Copy link
Contributor

Q for @camdecoster : did the old usa map have the states? as you say, it's not much use without that…

@gvwilson
Copy link
Contributor

Trying to get npm run test-jasmine to work with the new topojson files, it looks like I'm tripping over a configuration issue.

  1. Original error message complained about not being able to find ../../../dist/topojsonworld_110m.json (note the missing '/' between 'topojson' and 'world').
  2. I modified getTopojsonPath in src/lib/topojson_utils.js as follows to insert the '/' if required:
 topojsonUtils.getTopojsonPath = function(topojsonURL, topojsonName) {
-    return topojsonURL + topojsonName + '.json';
+    if (topojsonURL.endsWith('/')) {
+       return topojsonURL + topojsonName + '.json';
+    } else {
+       return topojsonURL + '/' + topojsonName + '.json';
+    }
 };
  1. I also modified test/jasmine/karma.conf.js as follows because we're getting the JSON files from the local dist directory not the installed sane-topojson package:
-var pathToSaneTopojsonDist = path.join(__dirname, '..', '..', 'node_modules', 'sane-topojson', 'dist');
+var pathToTopojsonDist = path.join(__dirname, '..', '..', 'dist', 'topojson');

// ... and further down, edit to match the new variable name

-        {pattern: pathToSaneTopojsonDist + '/**', included: false, watched: false, served: true}
+        {pattern: pathToTopojsonDist + '/**', included: false, watched: false, served: true}

This still produces error messages like the one shown below saying that the JSON files can't be found:

	Failed: plotly.js could not find topojson file at ../../../dist/topojson/world_110m.json. Make sure the *topojsonURL* plot config option is set properly.
	Error: plotly.js could not find topojson file at ../../../dist/topojson/world_110m.json. Make sure the *topojsonURL* plot config option is set properly.
	    at Object.<anonymous> (/Users/gvwilson/plotly/plotly.js/src/plots/geo/geo.js:137:35 <- /private/var/folders/w2/l51fjbjd25n9zbwkz9fw9jp00000gn/T/0c95a0b81672d3a2e3f59fc0534657ef-bundle.js:147433:31)
	    at Object.event (/Users/gvwilson/plotly/plotly.js/node_modules/@plotly/d3/d3.js:504:42 <- /private/var/folders/w2/l51fjbjd25n9zbwkz9fw9jp00000gn/T/0c95a0b81672d3a2e3f59fc0534657ef-bundle.js:4875:48)
	    at XMLHttpRequest.respond (/Users/gvwilson/plotly/plotly.js/node_modules/@plotly/d3/d3.js:1951:24 <- /private/var/folders/w2/l51fjbjd25n9zbwkz9fw9jp00000gn/T/0c95a0b81672d3a2e3f59fc0534657ef-bundle.js:6326:30)
25 03 2025 10:40:03.170:WARN [web-server]: 404: /dist/topojson/world_110m.json
[note: the message above is then repeated several times]

I've tried variations on the path in test/jasmine/tests/geo_test.js specified by this line:

Plotly.setPlotConfig({ topojsonURL: '../../../dist/topojson' });

Adding an extra .. or removing one of the ones that's there doesn't affect the outcome.

@camdecoster
Copy link
Contributor Author

@gvwilson yes the old maps had 'subunits' layers for a few regions (USA and Brazil did for sure). The UN maps don't include that information. Thanks for looking into the tests. I'm working on an update to get the tests working but I haven't pushed that yet. I'll try to get that out later today.

@ndrezn
Copy link
Member

ndrezn commented Mar 25, 2025

@etpinard when you have a chance could you give this PR a look? Would love your eyes. Thank you!

In the short term what was the source for US states/Canadian provinces in the old dataset?

Refactor to simplify scripts

Switch to UN/NE geodata hybrid

Save UN geodata to archive

Remove extra info from topojson

Add centroids to geojson

Use 'simplify' to create 110m maps
@camdecoster camdecoster marked this pull request as ready for review April 8, 2025 18:28
@emilykl
Copy link
Contributor

emilykl commented Apr 22, 2025

@camdecoster One small suggestion: Either add tasks/topojson/un_geodata_simplified.zip to .gitignore, OR delete the file as part of cleanup from the get_geodata task.

Screen Shot 2025-04-22 at 11 59 42 AM

@camdecoster
Copy link
Contributor Author

@emilykl thanks for pointing that out. I actually meant to ask if that should be saved in the repo. Maybe that's not necessary, but if the files change in the future then the build process could break. Granted, we're not saving the Natural Earth shapefiles, so that is another potential point for things to break. Ultimately the question is, should we save the input files in the repo or download them each time when building the library?

@emilykl
Copy link
Contributor

emilykl commented Apr 22, 2025

@camdecoster It's a good question... curious what @gvwilson @ndrezn think.

Presumably both the UN data and the Natural Earth data will be updated periodically as the world changes (is my assumption correct? or are we referencing a static URL that will never change?) and we will want to pull in the updated data.

On the other hand that's not necessarily a step that we want or need to be doing on EVERY build necessarily.

Are there any steps in the build process that you think are particularly sensitive to changes in the data? How much work would it be to fix the build process if/when the data changes?

@gvwilson
Copy link
Contributor

Decision for now is to download the data afresh each time the build runs; we'll come back in a week or two and modify the build to cache the data in the repo. cc @emilykl @camdecoster

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature something new P1 needed for current cycle
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use UN GeoJSON data for geo charts rather than Natural Earth
4 participants