Releases: microsoft/torchgeo
v0.6.1
TorchGeo 0.6.1 Release Notes
This is a bugfix release. There are no new features or API changes with respect to the 0.6.0 release.
This release fixes an important security vulnerability and properly documents a lack of support for rasterio 1.4. All users are recommended to update to TorchGeo 0.6.1 if they are using `torchgeo.models.get_weight`.
Dependencies
- rasterio: 1.4 not yet supported (#2327)
Datamodules
- Datamodule: use persistent workers for parallel data loading (#2291)
- OSCD: update normalization statistics (#2282)
Datasets
- Datasets: add support for `os.PathLike` (#2273)
- GeoDataset: allow a mix of `str` and `pathlib` paths (#2270)
Models
- API: avoid use of `eval` in `get_weight` (#2323)
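To illustrate the general technique behind the `get_weight` fix above, here is a minimal sketch of resolving a dotted name such as `ResNet18_Weights.SENTINEL2_RGB_MOCO` without `eval`. It is not TorchGeo's actual implementation: the enum, its placeholder values, the `WEIGHT_ENUMS` registry, and the `resolve_weight` helper are all hypothetical stand-ins.

```python
from enum import Enum


class ResNet18_Weights(Enum):
    """Hypothetical stand-in for a weights enum; values are placeholders."""

    SENTINEL2_ALL_MOCO = "resnet18_sentinel2_all_moco"
    SENTINEL2_RGB_MOCO = "resnet18_sentinel2_rgb_moco"


# Explicit allow-list of enums that may be looked up by name.
WEIGHT_ENUMS = {"ResNet18_Weights": ResNet18_Weights}


def resolve_weight(name: str) -> Enum:
    """Resolve 'EnumName.MEMBER' without evaluating arbitrary code."""
    enum_name, sep, member_name = name.partition(".")
    if not sep or enum_name not in WEIGHT_ENUMS:
        raise ValueError(f"Unknown weight enum: {name!r}")
    try:
        # Enum item access only returns declared members.
        return WEIGHT_ENUMS[enum_name][member_name]
    except KeyError:
        raise ValueError(f"Unknown weight member: {name!r}") from None
```

Because the input string is only ever used as a dictionary key and an enum member name, a malicious string such as `__import__('os').system(...)` is rejected with a `ValueError` instead of being executed.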
Tests
- CD: set up continuous deployment to PyPI (#2342)
- CI: install tensorboard to speed up notebooks (#2315)
- CI: install TorchGeo from checked out repo (#2306)
- dependabot: only update npm lockfile (#2277)
- prettier: ignore cache directories (#2278)
- prettier: prefer single quotes (#2280)
- pytest: set default `--cov` and `--cov-report` (#2275)
- pytest: set matplotlib backend locally too (#2326)
- pytest: silence numpy 2 warnings in PyTorch (#2302)
- ruff: remove NPY tests now that we test numpy 2 in CI (#2287)
Documentation
- Alternatives: add scikit-eo to list of TorchGeo alternatives (#2340)
- Contributing: installation-agnostic prettier usage (#2279)
- Datasets: move dataset CSV to subdirectory (#2281, #2304)
- Datasets: update NAIP resolution (#2325)
- Tutorials: fix NAIP downloads by signing URL (#2343)
- Tutorials: update recommended strategy for raster datasets containing images and masks (#2293)
Contributors
This release is thanks to the following contributors:
@adamjstewart
@calebrob6
@MathiasBaumgartinger
@Nowosad
@sfalkena
v0.6.0
TorchGeo 0.6.0 Release Notes
TorchGeo 0.6 adds 18 new datasets, 15 new datamodules, and 27 new pre-trained models, encompassing 11 months of hard work by 23 contributors from around the world.
Highlights of this release
Multimodal foundation models
There are thousands of Earth observation satellites orbiting the Earth at any given time. Historically, in order to use one of these satellites in a deep learning pipeline, you would first need to collect millions of manually-labeled images from this sensor in order to train a model. Self-supervised learning enabled label-free pre-training, but still required millions of diverse sensor-specific images, making it difficult to use newly launched or expensive commercial satellites.
TorchGeo 0.6 adds multiple new multimodal foundation models capable of being used with imagery from any satellite/sensor, even ones the model was not explicitly trained on. While GASSL and Scale-MAE only support RGB images, DOFA supports RGB, SAR, MSI, and HSI with any number of spectral bands. It uses a novel wavelength-based encoder to map the spectral wavelength of each band to a known range of wavelengths seen during training.
The following table describes the dynamic spatial (resolution), temporal (time span), and/or spectral (wavelength) support, either via their training data (implicit) or via their model architecture (explicit), offered by each of these models:
Model | Spatial | Temporal | Spectral |
---|---|---|---|
DOFA | implicit | - | explicit |
GASSL | implicit | - | - |
Scale-MAE | explicit | - | - |
TorchGeo 0.6 also adds multiple new unimodal foundation models, including DeCUR and SatlasPretrain.
Source Cooperative migration
TorchGeo contains a number of datasets from the recently defunct Radiant MLHub:
- AgriFieldNet Competition Dataset
- Smallholder Cashew Plantations in Benin
- Sentinel-2 Cloud Cover Segmentation Dataset
- CV4A Kenya Crop Type Competition
- Tropical Cyclone Wind Estimation Competition
- Marine Debris Dataset for Object Detection in Planetscope Imagery
- Rwanda Field Boundary Competition Dataset
- South Africa Crop Type Competition
- SpaceNet Datasets
- Western USA Live Fuel Moisture
These datasets were recently migrated to Source Cooperative (and AWS in the case of SpaceNet), but with a completely different file format and directory structure. It took a lot of effort, but we have finally ported all of these datasets to the new download location and file hierarchy. As an added bonus, the new data loader code is significantly simpler, allowing us to remove 2.5K lines of code in the process!
OSGeo community project
TorchGeo is now officially a member of the OSGeo community! OSGeo is a not-for-profit foundation for open source geospatial software, providing financial, organizational, and legal support. We are in good company, with other OSGeo projects including GDAL, PROJ, GEOS, QGIS, and PostGIS. Membership in OSGeo promotes advertising of TorchGeo to the community, and also ensures that we follow best practices for the stability, health, and interoperability of the open source geospatial ecosystem.
All TorchGeo users are encouraged to join us on Slack, join our Hugging Face organization, and join us in OSGeo using any of the following badges in our README:
Lightning Studios support
TorchGeo has always had a close collaboration with Lightning AI, including active contributions to PyTorch Lightning and TorchMetrics. In this release, we added buttons allowing users to launch our tutorial notebooks in the new Lightning Studios platform. Lightning Studios is a more powerful version of Google Colab, with reproducible software and data environments allowing you to pick up where you left off, VS Code and terminal support, and the ability to quickly scale up to a large number of GPUs. All TorchGeo tutorials have been confirmed to work in both Lightning Studios and Google Colab, allowing users to get started with TorchGeo without having to invest in their own hardware.
Backwards-incompatible changes
- All Radiant MLHub datasets have been ported to the Source Cooperative file hierarchy (#1830)
- GeoDataset: the bbox sample key was renamed to bounds in order to support Kornia (#2199)
- Chesapeake7 and Chesapeake13: datasets were removed when updating to the 2022 edition (#2214)
- Benin Cashews and Rwanda Field Boundary: remove `os.path.expanduser` for consistency (#1705)
- LEVIR-CD and OSCD: `images` key was split into `image1` and `image2` for change detection (#1684, #1696)
- EuroSAT: `B08A` was renamed to `B8A` to match Sentinel-2 (#1646)
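Downstream code that still reads a renamed sample key (for example `bbox`, now `bounds`) can be migrated gradually with a small compatibility shim. The sketch below assumes samples are plain dictionaries. The `rename_sample_key` helper and the placeholder sample values are hypothetical and not part of TorchGeo.

```python
from typing import Any


def rename_sample_key(sample: dict[str, Any], old: str, new: str) -> dict[str, Any]:
    """Return a copy of sample with key `old` renamed to `new`, if present."""
    renamed = dict(sample)
    if old in renamed and new not in renamed:
        renamed[new] = renamed.pop(old)
    return renamed


# Placeholder values; real samples hold tensors and bounding box objects.
legacy_sample = {"image": "tensor", "bbox": (0, 1, 0, 1, 0, 1)}
sample = rename_sample_key(legacy_sample, "bbox", "bounds")
```

The helper copies the dictionary rather than mutating it, so the original sample stays intact.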
Dependencies
New (optional) dependencies
- aws-cli: to download datasets from AWS (#2203)
- azcopy: to download datasets from Azure (#2064)
- prettier: for YAML file formatting (#2018)
- ruff: for code style and documentation testing (#1994)
Removed (optional) dependencies
- radiant-mlhub: website no longer exists (#1830)
- rarfile: datasets rehosted as zip files (#2210)
- zipfile-deflate: no longer needed for newer Chesapeake data (#2214)
- black: replaced by ruff (#1994)
- flake8: replaced by ruff (#1994)
- isort: replaced by ruff (#1994)
- pydocstyle: replaced by ruff (#1994)
- pyupgrade: replaced by ruff (#1994)
Changes to existing dependencies
- python: 3.10+ required following SPEC 0 (#1966)
- fiona: 1.8.21+ required (#1966)
- kornia: 0.7.3+ required (#1979, #2144)
- lightly: 1.4.5+ required (#2196)
- lightning: 2.3 not supported due to bug (#2155, #2211)
- matplotlib: 3.5+ required (#1966)
- numpy: 1.21.2+ required (#1966), numpy 2 support added (#2151)
- pandas: 1.3.3+ required (#1966)
- pillow: 3.3+ required (#1966), jpeg2000 support required (#2209)
- pyproj: 3.3+ required (#1966)
- rasterio: 1.3+ required (#1966)
- shapely: 1.8+ required (#1966)
- torch: 1.13+ required (#1358)
- torchvision: 0.14+ required (#1358)
- h5py: 3.6+ required (#1966)
- opencv: 4.5.4+ required (#1966)
- pycocotools: 2.0.7+ required (#1966)
- scikit-image: 0.19+ required (#1966)
- scipy: 1.7.2+ required (#1966)
Datamodules
New datamodules
- AgriFieldNet (#1873)
- CaBuAr (#2235)
- ChaBuD (#1259)
- Digital Typhoon (#1748)
- EuroSAT Spatial (#2074)
- GeoNRW (#2209)
- I/O Bench (#1972)
- LEVIR-CD (#1770)
- LEVIR-CD+ (#1707)
- QuakeSet (#1997)
- Sentinel-2 + CDL (#1889)
- Sentinel-2 + EuroCrops (#1869)
- Sentinel-2 + NCCM (#1950)
- Sentinel-2 + South America Soybean (#1959)
- South Africa Crop Type (#1970)
- VHR-10 (#1082)
Changes to existing datamodules
- Remove torchgeo.datamodules.utils.dataset_split (#2005)
- EuroSAT: make sure normalization is actually applied (#2176)
Changes to existing base classes
- Fix plotting in datamodules when dataset is a subset (#2003)
Datasets
New datasets
- AgriFieldNet (#1459)
- Airphen (#1803)
- CaBuAr (#2235)
- ChaBuD (#1259)
- CropHarvest (#1677)
- Digital Typhoon (#1748)
- EuroCrops (#1813)
- EuroSAT Spatial (#2074)
- GeoNRW (#2209)
- I/O Bench (#1972)
- LEVIR-CD (#1770)
- Northeast China Crop Map (#1666)
- PRISMA (#1743)
- QuakeSet (#1997)
- SkyScript (#2253)
- South Africa Crop Type (#1840)
- South America Soybean (#1668)
- SpaceNet 8 (#2203)
Changes to existing datasets
- Benin Cashews: migrate to Source Cooperative (#2116)
- Benin Cashews: remove `os.path.expanduser` for consistency (#1705)
- BigEarthNet: fix broken download link (#2174)
- CDL: add 2023 checksum (#1844)
- Chesapeake: update to 2022 edition (#2214)
- ChesapeakeCVPR: reuse NLCD colormap (#1690)
- Cloud Cover: migrate to Source Cooperative (#2117)
- CV4A Kenya Crop Type: migrate to Source Cooperative (#2090)
- EuroSAT: rename `B08A` to `B8A` to match Sentinel-2 (#1646)
- FireRisk: redistribute on Hugging Face (#2000)
- GlobBiomass: add min/max timestamp ...
v0.5.2
TorchGeo 0.5.2 Release Notes
This is a bugfix release. There are no new features or API changes with respect to the 0.5.1 release.
This release contains a number of important fixes to reproducibility and determinism. All users are recommended to upgrade to 0.5.2 if they want to ensure the reproducibility of their work.
TorchGeo has always supported Python 3.12, but this is now officially tested!
Dependencies
- Test TorchGeo support for Python 3.12 (#1837)
- lightly 1.4.26 is incompatible with smp (#1824, #1825)
- Add dev container to support GitHub Codespaces development (#1085)
Datamodules
- L7 Irish previously used a nondeterministic train/val/test split. This is now fixed (#1899, #1908)
- L8 Biome previously used a nondeterministic train/val/test split. This is now fixed (#1899, #1908)
- Tropical Cyclone previously used a nondeterministic train/val/test split. This is now fixed (#1839)
- SEN12MS previously used a nondeterministic train/val/test split. This is now fixed (#1839)
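Determinism fixes like the ones above generally come down to driving the split with a seeded random number generator. The following library-agnostic sketch shows the idea; the `split_indices` helper, the fractions, and the seed are illustrative assumptions and not TorchGeo code.

```python
import random


def split_indices(n: int, fractions: tuple[float, ...], seed: int = 0) -> list[list[int]]:
    """Shuffle range(n) with a seeded RNG and cut it into disjoint splits."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # private RNG, unaffected by global state

    splits, start = [], 0
    for i, fraction in enumerate(fractions):
        # The last split takes the remainder so every index is used exactly once.
        stop = n if i == len(fractions) - 1 else start + int(n * fraction)
        splits.append(indices[start:stop])
        start = stop
    return splits


train, val, test = split_indices(100, (0.8, 0.1, 0.1), seed=42)
```

Calling the function again with the same seed returns identical splits, and the three splits never share an index.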
Datasets
- RasterDataset: clarify documentation of is_image and dtype (#1811)
- GeoDataset previously used a nondeterministic train/val/test split. This is now fixed (#1899, #1908)
- xView2 previously used a nondeterministic order. This is now fixed (#1918)
- HuggingFace: use stable download URLs (#1916)
- GitLab: use stable download URLs (#1917)
- Deep Globe Land Cover: document download steps (#1797, #1921)
- PASTIS: fix default folds (#1810)
- SustainBench Crop Yield: fix download support (#1753, #1755)
- SustainBench Crop Yield: eager data loading (#1754, #1756)
Models
Samplers
- RandomGeoSampler: optional length is optional (#1907)
Trainers
- Remove unnecessary argmax before call to torchmetrics (#1777)
- Better document default trainer metrics (#1874, #1914, #1923, #1924)
- ObjectDetectionTask: increase test coverage (#1739)
Scripts
- SSL4EO download: skip downloading missing coordinates (#1821)
- Ensure that all files have the license header at the top (#1787)
Tests
- Notebooks: use stable dependency versions (#1838)
- Don't cast warnings to errors (#1793)
- Fix lightning-utilities deprecation warning (#1733)
- Fix pre-commit dependency versions (#1781)
Documentation
- RasterDataset: clarify documentation of is_image and dtype (#1811)
- RtD: use stable dependency versions (#1827)
- Document TorchGeo alternatives (#1742)
- Tutorials: load_state_dict does not return the model (#1503)
- README: fix VHR-10 example (#1686, #1920)
- README: add TorchGeo podcast episodes (#1806)
- README: add PyTorch badge (#1882)
- README: add OSGeo badge (#1880)
- README: add color lexing of bibtex (#1820)
- README: fix Spack link (#1804)
Contributors
This release is thanks to the following contributors:
@adamjstewart
@ashnair1
@calebrob6
@DimitrisMantas
@dmeaux
@isaaccorley
@jdilger
@julien-blanchon
@konstantinklemmer
@nilsleh
@tatsubori
v0.5.1
TorchGeo 0.5.1 Release Notes
This is a bugfix release. There are no new features or API changes with respect to the 0.5.0 release.
Datamodules
Datasets
- AGB Live Woody Biomass: update download link for dataset (#1679, #1713)
- EuroSAT: remove classes attribute and instead rely on `ImageFolder` classes (#1648, #1650)
- OSCD: change image datatype to be float instead of int (#1652, #1656)
- RESISC45: remove classes attribute and instead rely on `ImageFolder` classes (#1648, #1650)
- UC Merced: fix plotting which expects images from dataset to be normalized already (#1712)
- UC Merced: remove classes attribute and instead rely on `ImageFolder` classes (#1648, #1650)
- GeoDataset: check if the path points to a Virtual File System, to prevent error of looking and not finding the paths locally (#1605, #1612)
- GeoDatasets: consistent use of `paths` argument instead of `root` in `RuntimeError` of several datasets (#1704, #1717)
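As a rough illustration of the Virtual File System check mentioned above, the sketch below flags paths that should not be looked up on the local disk. It assumes that such paths either carry a URL-style scheme (for example `s3://`) or a GDAL `/vsi` prefix (for example `/vsizip/`). The `looks_like_vsi` name and the heuristic itself are assumptions, not TorchGeo's implementation.

```python
def looks_like_vsi(path: str) -> bool:
    """Heuristic: True for URL-style paths or GDAL /vsi... prefixes."""
    return "://" in path or path.startswith("/vsi")
```

A dataset could use such a check to skip local existence tests and pass the path straight to GDAL.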
Trainers
- During logging, trainers were expecting a datamodule with plot functionality, which was preventing trainers from being used with custom PyTorch DataLoaders (#1703)
- Remove default callback configurations of trainers and leave it to user instead (#1640, #1641, #1642, #1645, #1647)
- Skip weights and augmentations when saving hparams, allowing these parameters to be changed (#1622, #1639, #1670)
Scripts
Tests
- Greatly reduce memory footprint of CI which was causing PR tests to fail (#1658)
- Copy testing csv file instead of downloading it for MapInWild dataset test (#1657)
- Fix `choco install unrar` in CI by using `7zip` instead of `unrar` (#1697)
- CI: use unique names for release caches (#1723)
Documentation
- README: update `SemanticSegmentationTask` example with arguments introduced in 0.5 (#1608)
- README: add section on LightningCLI usage with torchgeo (#1626, #1628)
- README: add section on availability of pretrained weights in torchgeo (#1716)
- BioMassters: fix typo in docs' overview table of non-geo datasets (#1718)
- SSL4EO-L Benchmark: add dataset information to documentation (#1719)
Contributors
This release is thanks to the following contributors (in alphabetical order):
@adamjstewart
@ashnair1
@dylanrstewart
@kaybe20
@menglutao
@nilsleh
@pioneerHitesh
@robmarkcole
v0.5.0
TorchGeo 0.5.0 Release Notes
0.5.0 encompasses over 8 months of hard work and new features contributed by 20 users from around the world. Below, we detail specific features worth highlighting.
Highlights of this release
New command-line interface
TorchGeo has always had tight integration with PyTorch Lightning, including datamodules for common benchmark datasets and trainers for most computer vision tasks. TorchGeo 0.5.0 introduces a new command-line interface for model training based on LightningCLI. It can be invoked in two ways:
# If torchgeo has been installed
torchgeo
# If torchgeo has been installed, or if it has been cloned to the current directory
python3 -m torchgeo
It supports command-line configuration or YAML/JSON config files. Valid options can be found from the help messages:
# See valid stages
torchgeo --help
# See valid trainer options
torchgeo fit --help
# See valid model options
torchgeo fit --model.help ClassificationTask
# See valid data options
torchgeo fit --data.help EuroSAT100DataModule
Using the following config file:
trainer:
  max_epochs: 20
model:
  class_path: ClassificationTask
  init_args:
    model: "resnet18"
    in_channels: 13
    num_classes: 10
data:
  class_path: EuroSAT100DataModule
  init_args:
    batch_size: 8
  dict_kwargs:
    download: true
we can see the script in action:
# Train and validate a model
torchgeo fit --config config.yaml
# Validate-only
torchgeo validate --config config.yaml
# Calculate and report test accuracy
torchgeo test --config config.yaml
It can also be imported and used in a Python script if you need to extend it to add new features:
from torchgeo.main import main
main(["fit", "--config", "config.yaml"])
See the Lightning documentation for more details.
Self-supervised learning and Landsat
Self-supervised learning has become a dominant technique for model pre-training, especially in domains (like remote sensing) that are rich in data but lacking in large labeled datasets. The 0.5.0 release adds powerful trainers for several SSL techniques, large unlabeled datasets for multiple satellite platforms, and the first ever models pre-trained on Landsat imagery. See our SSL4EO-L paper for more details.
Utilities for splitting GeoDatasets
In prior releases, the only way to create train/val/test splits of GeoDatasets was to use a Sampler `roi`. This limited the types of splits you could perform, and was unintuitive for users coming from PyTorch where the dataset can be split into multiple datasets. TorchGeo 0.5.0 introduces new splitting utilities for GeoDatasets in `torchgeo.datasets`, including:
- `random_bbox_assignment`: randomly assigns each scene to a different split
- `random_bbox_splitting`: randomly split each scene and assign each half to a different split
- `random_grid_cell_assignment`: overlay a grid and randomly assign each grid cell to a different split
- `roi_split`: split using a `roi` just like with Sampler
- `time_series_split`: split along the time axis
Splitting with a Sampler `roi` is not yet deprecated, but users are encouraged to adopt the new dataset splitting utility functions.
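To make the grid-based strategy concrete, here is a conceptual sketch of overlaying a grid on a bounding box and randomly assigning each cell to a split. TorchGeo's real utilities operate on GeoDataset objects. This sketch uses plain tuples, and the `BBox` alias, `grid_cells`, and `assign_cells` are hypothetical names chosen for illustration only.

```python
import random

BBox = tuple[float, float, float, float]  # (minx, maxx, miny, maxy)


def grid_cells(bounds: BBox, size: int) -> list[BBox]:
    """Divide bounds into a size x size grid of equally sized cells."""
    minx, maxx, miny, maxy = bounds
    dx, dy = (maxx - minx) / size, (maxy - miny) / size
    return [
        (minx + i * dx, minx + (i + 1) * dx, miny + j * dy, miny + (j + 1) * dy)
        for i in range(size)
        for j in range(size)
    ]


def assign_cells(cells: list[BBox], fractions: list[float], seed: int = 0) -> list[list[BBox]]:
    """Shuffle cells with a seeded RNG and deal them out according to fractions."""
    shuffled = cells[:]
    random.Random(seed).shuffle(shuffled)
    splits, start = [], 0
    for k, fraction in enumerate(fractions):
        # The last split takes whatever remains so no cell is dropped.
        last = k == len(fractions) - 1
        stop = len(shuffled) if last else start + round(len(shuffled) * fraction)
        splits.append(shuffled[start:stop])
        start = stop
    return splits


cells = grid_cells((0.0, 100.0, 0.0, 100.0), size=6)
train, val, test = assign_cells(cells, [0.8, 0.1, 0.1])
```

Assigning whole cells rather than individual pixels keeps each split spatially contiguous at the cell level, which reduces leakage between neighboring train and test samples.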
GeoDatasets now accept lists as input
Previously, each GeoDataset accepted a single root directory as input. Now, users can pass one or more directories, or a list of files they want to include. At first glance, this doesn't seem like a big deal, but it actually opens a lot of possibilities for how users can construct GeoDatasets. For example, users can use custom filters:
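import glob

from torchgeo.datasets import Landsat8

files = []
for file in glob.glob("*.tif"):
    # Placeholder: read cloud cover (%) from the pixel QA band or metadata file.
    # get_cloud_cover is a hypothetical user-defined helper, not a TorchGeo function.
    cloud_cover = get_cloud_cover(file)
    if cloud_cover < 20:  # select images with minimal cloud cover
        files.append(file)
ds = Landsat8(files)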
or use remote files from S3 buckets or Azure blob storage. Basically, as long as GDAL knows how to read the file, TorchGeo supports it, wherever the file lives.
Note that some datasets may not support a list of files if you also want to automatically download the dataset because we need to know the directory to download to.
Building a community
With over 50 contributors from around the world, we needed a better way to discuss ideas and share announcements. TorchGeo now has a public Slack channel! Join us and say hello 👋
Now that the majority of the features we've needed have been implemented, one of our goals for the next release is to improve our documentation and tutorials. Expect to see TorchGeo tutorials at all the popular ML/RS conferences next year! We're excited to meet our users in person and learn more about their unique use cases and needs.
Backwards-incompatible changes
- GeoDataset: first parameter renamed from `root` to `paths` (#1442, #1597)
- Trainers: many parameters renamed (#1541)
- FAIR1M datamodule: `*_split_pct` parameters removed (#1275)
- Inria datamodule: `*_split_pct` parameters removed (#1540)
- SemanticSegmentationTask: changes to `weights` parameter (#1046)
Dependencies
- Drop Python 3.7 and 3.8 support following NEP 29 (#1058, #1246)
- Dependencies now listed in `pyproject.toml` (#1446)
- Drop upper bounds on dependencies (#1480)
- Lightly: new required dependency (#1252, #1285)
- Lightning: extra dependencies now required (#1559)
- Omegaconf: no longer a dependency (#1559)
- Pandas: now supports v2.1 (#1537)
- Pandas: new required dependency (#1586)
- Scikit-Learn: no longer a dependency (#1063)
- TorchMetrics: now supports v1 (#1465)
Datamodules
New datamodules:
- EuroSAT 100 (#1130)
- FireRisk (#1265)
- L7 Irish (#1197)
- L8 Biome (#1200)
- SeCo (#1168)
- SKIPP'D (#1267)
- SSL4EO-L (#1332)
- SSL4EO-L Benchmark (#1338)
- SSL4EO-S12 (#1151)
- SustainBench (#1253)
Changes to existing datamodules:
- FAIR1M: add val/test splits, drop split parameters (#1275)
- Inria: add val split, drop split parameters (#654, #1540)
- RESISC45: better normalization (#1349)
- So2Sat: support RGB-only mode (#1283)
- So2Sat: control size of validation dataset (#1283)
New base classes:
- BaseDataModule (#1260)
Changes to existing base classes:
- GeoDataModule: automatically infer epoch length (#1257)
- BaseDataModule: better error messages (#1307, #1441)
Datasets
New datasets:
- BioMassters (#1560)
- EuroSAT 100 (#1130)
- FireRisk (#1265)
- L7 Irish (#1197)
- L8 Biome (#1200)
- LandCover.ai Geo (#1126)
- MapInWild (#1096, #1131)
- NLCD (#1244)
- PASTIS (#315)
- Rwanda Field Boundary (#1574)
- SeasoNet (#1466)
- SKIPP'D (#1267, #1548)
- SSL4EO-L (#1332, #1424)
- SSL4EO-L Benchmark (#1338, #1431)
- SSL4EO-S12 (#1151)
- SustainBench (#1253)
- Western USA Live Fuel Moisture (#1262)
Changes to existing datasets:
- CDL: add years parameter (#1337)
- CDL: add classes parameter (#1392)
- CDL: map class labels to ordinal numbers (#1364, #1368)
- CDL: return figure (#1369)
- CMS Mangrove Canopy: return figure (#1369)
- DFC2022: avoid interpolation in colormap (#1372)
- FAIR1M: add val/test splits (#1275)
- FAIR1M: add download support (#1275)
- Inria: add validation split (#654, #1540)
- SeCo: add seasons parameter (#1168)
- SeCo: faster initialization (#1168)
- SeCo: support new directory structure (#1235)
- So2Sat: add version 3 (#1086, #1283)
- UCMerced: fix image shape bug (#1238)
- USAVars: return lat/lon of centroid (#1240)
- USAVars: convert image to float32 (#1433)
- USAVars: download from Hugging Face (#1453)
Changes to existing base classes:
- GeoDataset: accept list of files or directories (#1427, #1442, #1597)
- GeoDataset: add files property (#1442, #1597)
- Intersection/UnionDataset: fix crs/res propagation (#1341, #1344)
- RasterDataset: add dtype attribute (#1149)
- RasterDataset: allow sampling outside bounds of image (#1329, #1344)
New utility functions:
Models
Changes to existing models:
- RCF: add empirical sampling mode (#1339)
New pre-trained model weights:
Changes to existing pre-trained model weights:
Samplers
Changes to existing samplers:
Trainers
New trainers:
Changes to existing trainers:
- Add ability to freeze backbones and decoders (#1290)
- Fix support for datasets without a plot method (#1551, #1585)
- BYOL: add random season contrast (#1168)
- Classification: add class weights for cross entropy loss (#1592)
- Semantic Segmentation: add class weights for cross entropy loss (#1221)
- Semantic Segmentation: add ...
v0.4.1
TorchGeo 0.4.1 Release Notes
This is a bugfix release. There are no new features or API changes with respect to the 0.4.0 release.
Dependencies
Some dependencies have changed:
- nbmake: 1.3.3+ required now (#1124)
- omegaconf: now optional (#1214)
- pytorch-lightning: replaced with lightning (#1178, #1179)
- sphinx: 6+ not yet supported (#1144)
- tensorboard: now optional (#1214)
- `pip install torchgeo[all]` added, installs all optional dependencies (#1095)
Other dependencies now support newer versions:
- black: add 23 support (#1080)
- kornia: add 0.6.10 support (#1123)
- mypy: add 1 support (#1089)
- nbsphinx: add 0.9 support (#1173)
- pandas: add 2 support (#1216)
- pyvista: add 0.38 support (#1083)
- radiant-mlhub: add 0.5 support (#1102)
- scikit-image: add 0.20 support (#1153)
- setuptools: add 67 support (#1066)
- torch: add 2 support (#1177)
- torchvision: add 0.15 support (#1177)
Datamodules
- SeCo: fix transforms (#1166)
Datasets
Fixes for benchmark datasets:
- BigEarthNet: fix order of class labels (#1127)
- CDL: add checksum for 2022 mask (#1201)
- EuroSAT: fix SSL issue, redistribute on Hugging Face (#1065, #1072)
- FAIR1M: fix directory name (#1098, #1099)
- Landsat: better default bands (#1169)
- UC Merced: redistribute on Hugging Face (#1076)
- USAVars: fix class labels (#1138)
Fixes for base classes:
- RasterDataset: fix support for datasets where `all_bands` does not actually contain all bands (e.g., Landsat) (#1134, #1135)
- RasterDataset: fix support for datasets where `all_bands` is not defined and `separate_files` is False (#1135)
- RasterDataset: fix bug when `separate_files` and no date in `filename_regex` (#1191)
- RasterDataset: remove unnecessary glob (#1219)
- RasterDataset: better error message when no data found (#1193)
- IntersectionDataset: better error message when no overlap (#1192)
Models
There are several improvements to our new pre-trained weights:
Trainers
- BYOL: Fix image size to match ViT patch size (#1084)
- Fix support for loading ViT weights (#1049, #1084)
- Fix support for non-TensorBoardLogger (#1143, #1145)
Tests
A lot of work in this patch release went towards improving CI:
- Constrain dependencies to avoid CI hang (#1062)
- Codecov: use repository upload token (#1077)
- Cache pip installs (#1057)
- Cancel in-progress jobs on new commit (#1094) but not the labeler tasks (#1187)
- Test notebooks when they are modified (#1097)
- Speed up object detection tests (#1148)
- Fix tests on macOS arm64 (MPS support) (#1188)
- Properly test pre-trained model transforms (#1166)
- Speed up notebook tests (#665, #1124)
Documentation
- Update the example embedded in the README (#1211)
- Fix broken URLs throughout the documentation (#1125)
- Tutorial downloads are now much smaller and faster (#1124)
- Replace CSV with TensorBoard in Trainer tutorial (#1163, #1189)
- Fix version selection button (#1144)
Contributors
This release is thanks to the following contributors:
@adamjstewart
@ashnair1
@bugraaldal
@calebrob6
@isaaccorley
@julien-blanchon
@lucastao
@nilsleh
@SpontaneousDuck
@TolgaAktas
v0.4.0
TorchGeo 0.4.0 Release Notes
This is our biggest release yet, with improved support for pre-trained models, faster datamodules and transforms, and more powerful trainers. See the following sections for specific changes to each module:
- Backwards-incompatible changes
- Dependencies
- Datamodules
- Datasets
- Models
- Samplers
- Trainers
- Transforms
- Documentation
As always, thanks to our many contributors!
Backwards-incompatible changes
- Datasets: So2Sat bands were renamed (#735)
- Datasets: TropicalCycloneWindEstimation was renamed to TropicalCyclone (#815, #846)
- Datasets: VisionDataset and VisionClassificationDataset (deprecated in 0.3) have been removed (#627)
- Datamodules: many arguments have been renamed or reordered (#666, #730, #992)
- Datamodules: CycloneDataModule was renamed to TropicalCycloneDataModule (#815, #846)
- Models: resnet50 has a new multi-weight API (#917)
- Trainers: many arguments have been renamed (#916, #917, #918, #919, #920)
- Transforms: now take a single image as input instead of a sample dict (#999)
Dependencies
- Open3D replaced by PyVista (#663)
- Remove packaging dependency (#1019)
- Support einops 0.6 (#896)
- Support flake8 6 (#910)
- Support mypy 0.991 (#900)
- Support pytest-cov 4 (#801)
- Support pyupgrade 3 (#817)
- Support setuptools 66 (#1017)
- Support shapely 2 (#949)
- Support sphinx 6 (#990)
- Support timm 0.6 (#1002)
- Support torchmetrics 0.11 (#925)
- Support torchvision 0.14 (#875)
Datamodules
Our existing datamodules worked well, but suffered from several performance issues. For the average dataset with 3 splits (train/val/test), we were instantiating the dataset 10 times! All data augmentation was done on the CPU, one sample at a time. A multiprocessing bug prevented parallel data loading on macOS and Windows. And a serious bug was discovered in some of our datamodules that allowed training images to leak into the test set (only affected datamodules using `torchgeo.datamodules.utils.dataset_split`). All of these bugs have been fixed, and performance has been drastically improved. Datasets are only instantiated 3 times (once for each split). All data augmentation happens on the GPU, an entire batch at a time. And multiprocessing is now supported on all platforms. By refactoring our datamodules and adding new base classes, we were able to remove 1.6K lines of duplicated code in the process!
New datamodules:
Changes to existing datamodules:
- Only instantiate dataset in prepare_data if download is requested (#967, #974)
- Only instantiate datasets needed for a given stage (#992)
- Use Kornia for all data augmentation (#992)
- Faster data augmentation (CPU → GPU, sample → batch) (#992)
- Fix macOS/Windows multiprocessing bug (#886, #992)
- Fix bug with train images leaking into test set (#992)
- Add plot method to all datamodules (#814, #992)
- `torchgeo.datamodules.utils.dataset_split` is deprecated, use `torch.utils.data.random_split` instead (#992)
- Pass kwargs directly to datasets (#666, #730)
- Add random cropping to several datamodules (#851, #853, #855, #876, #929)
- Inria Aerial Image Labeling: fix predict dimensions (#975)
- LandCover.ai: fix mIoU calculation and plotting (#959)
- Tropical Cyclone: CycloneDataModule was renamed to TropicalCycloneDataModule (#815, #846)
New base classes:
- Add GeoDataModule and NonGeoDataModule base classes (#992)
Datasets
This release adds a new Sentinel-1 dataset. Here is a scene taken over the Big Island of Hawai'i:
Additionally, all image datasets now have a `plot` method.
New datasets:
Changes to existing datasets:
- Add default root argument to all datasets (#802)
- Consistent capitalization of band names (#778)
- Many datasets now return float images and int labels (#992)
- Chesapeake CVPR: add plot method (#820)
- ETCI 2021: fix data loading (#861)
- NASA Marine Debris: fix plot warning when model outputs no prediction boxes (#988)
- OSCD: images are now stacked channel-wise (#992)
- SEN12MS: mask is only single channel (#992)
- Sentinel-2: use 10,000 as scale factor (#1027)
- So2Sat: rename bands (#735)
- Tropical Cyclone: renamed from TropicalCycloneWindEstimation to TropicalCyclone (#815, #846)
- Tropical Cyclone: images are RGB, not grayscale (#992)
- VHR-10: add plot method (#847)
- xView2: remove labels folder (#787)
Changes to existing base classes:
- RasterDataset supports band indexing now (#687)
- UnionDataset actually works now (#769, #786)
- UnionDataset and IntersectionDataset support transforms (#867, #870)
- VectorDataset supports multi-label datasets (#862)
Models
Due to the nature of satellite imagery (different number of spectral bands for every satellite), it is impossible to have a single set of pre-trained weights for each model. TorchGeo has always had multi-weight support:
model = resnet50(sensor="sentinel2", bands="all", pretrained=True)
However, this is difficult to extend if you want more fine-grained control over model weights. More recently, torchvision introduced a new multi-weight support API:
- Introducing TorchVision's New Multi-Weight Support API
- Easily List and Initialize Models With New APIs in TorchVision
With the 0.4.0 release, TorchGeo has now adopted the same API:
model = resnet50(weights=ResNet50_Weights.SENTINEL2_ALL_MOCO)
We also support PyTorch Hub now:
>>> import torch
>>> from torchgeo.models import ResNet18_Weights
>>> torch.hub.list("microsoft/torchgeo", trust_repo=True)
Downloading: "https://github.com/microsoft/torchgeo/zipball/models/weights" to ~/.cache/torch/hub/models_weights.zip
['resnet18', 'resnet50', 'vit_small_patch16_224']
>>> model = torch.hub.load("microsoft/torchgeo", "resnet18")
Using cache found in ~/.cache/torch/hub/microsoft_torchgeo_models_weights
>>> model = torch.hub.load("microsoft/torchgeo", "resnet18", weights=ResNet18_Weights.SENTINEL2_RGB_MOCO)
Using cache found in ~/.cache/torch/hub/microsoft_torchgeo_models_weights
In our previous release, we had 1 model pre-trained on 1 satellite with 1 training procedure. We now have 3 models (ResNet-18, ResNet-50, ViT) trained on both Sentinel-1 and Sentinel-2 for all bands and RGB-only bands with 3 SSL techniques (MoCo, DINO, SeCo), and plans to expand this in the future. Shoutout to Zhu Lab and ServiceNow for publishing these weights!
New models:
- Add ResNet-18 and ViT models (#917)
Changes to existing models:
New utility functions:
- Functions to list, query, and initialize models and weights (#917)
Samplers
Changes to existing samplers:
- All random samplers now have a default value for length (#755)
New utility functions:
- get_random_bounding_box and tile_to_chips are now public functions (#755)
Trainers
This release introduces a new trainer for object detection, one of our most highly requested features. All trainers now support prediction. Previously, our trainers only supported ResNet backbones. They now support the 600+ backbones provided by the timm library, as well as all of the new pre-trained models mentioned above.
New trainers:
- Object Detection: add trainer, add Faster R-CNN (#442, #758)
- Object Detection: add RetinaNet and FCOS (#984)
Changes to existing trainers:
- Add support for all timm backbones (#854, #918)
- Add support for more pretrained models (#917)
- Change model argument names (#916, #918, #919, #920)
- Support prediction (#790, #792, #813, #818, #819, #939)
- Fix plotting file handle leak (#825, #826)
- Multi-label Classification: replace softmax with sigmoid (#791)
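The softmax-to-sigmoid change for multi-label classification can be illustrated with a small stdlib-only sketch. The logits below are made up for illustration, and this is not the trainer's actual code. Softmax makes the classes compete for a total probability of 1, so two labels that are both present can each fall below a 0.5 threshold, while sigmoid scores each class independently.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Normalize logits into probabilities that sum to 1 (one label per sample)."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits: list[float]) -> list[float]:
    """Score each class independently (any number of labels per sample)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


# Hypothetical logits for a sample where the first two labels are both present.
logits = [4.0, 4.0, -3.0]
softmax_probs = softmax(logits)
sigmoid_probs = sigmoid(logits)

# Labels predicted at a 0.5 threshold under each activation.
softmax_labels = [p > 0.5 for p in softmax_probs]
sigmoid_labels = [p > 0.5 for p in sigmoid_probs]
```

With these logits, thresholding the softmax output selects no labels at all, while thresholding the sigmoid output selects the first two.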
Transforms
Whenever possible, we try to avoid reinventing the wheel. For data augmentation transforms that aren't specific to geospatial data or satellite imagery, we use existing implementations in popular libraries like:
- torchvision (PIL and PyTorch backends)
- albumentations (OpenCV backend)
- kornia (PyTorch backend)
Until now, we've been fairly agnostic towards data augmentation libraries. However, neither PIL nor OpenCV supports multispectral imagery. Because of this, we've decided to use Kornia for all transforms.
Changes to existing transforms:
v0.3.1
TorchGeo 0.3.1 Release Notes
This is a bugfix release. There are no new features or API changes with respect to the 0.3.0 release.
Dependencies
- pytorch-lightning: add 1.9 support (#697, #771)
- radiant-mlhub: 0.5 not yet supported (#711)
- segmentation-models-pytorch: add 0.3 support (#692)
- setuptools: add 65 support (#715, #753)
- torchvision: fix 0.12 pretrained model support (#761)
DataModules
Datasets
- Fix rounding bugs leading to inconsistent image shapes in vector datasets (#674, #675, #679, #736)
- IDTReeS: fix (x, y) coordinate swap in boxes (#683, #684)
- IDTReeS: clip boxes to bounds of image (#684, #760)
- Sentinel-2: add support for files downloaded from USGS EarthExplorer (#505, #754)
- Sentinel-2: prevent dataset from loading bands at different resolutions (#754)
- Sentinel-2: support loading even when band B02 is not present (#754)
Samplers
Transforms
Documentation
API docs:
- USAVars is a regression dataset (#699)
Tutorials:
- Use IntersectionDataset in sampler (#707)
- Custom Raster Datasets: complete overhaul with real data (#766, #772)
- Trainers: optional datasets required (#759)
- Transforms: replace cell magic with shell command (#756)
- Transforms: fix GPU usage (#763, #767)
- Clean up file names, execution counts, and output (#770)
Contributors
This release is thanks to the following contributors:
v0.3.0
TorchGeo 0.3.0 Release Notes
This release contains a number of new features, and brings increased stability to installations and testing.
In previous releases, not all dependencies had a minimum supported version listed, causing issues if users had old versions lying around. Old releases would also install the latest version of all dependencies even if they had never been tested before. TorchGeo now lists a minimum and maximum supported version for all dependencies. Moreover, we now test the minimum supported versions of all dependencies. Dependencies are automatically updated using dependabot to prevent unrelated CI failures from sneaking into PRs. We hope this makes it even easier to contribute to TorchGeo, and ensures that old releases will continue to work even if our dependencies make backwards-incompatible changes.
Backwards-incompatible changes
- VisionDataset and VisionClassificationDataset have been renamed to NonGeoDataset and NonGeoClassificationDataset (#627)
- Sample size now defaults to pixel units, use units=Units.CRS for old behavior (#294)
- RasterDataset no longer has a plot method, subclasses have their own plot methods (#476)
- Plot methods of RasterDataset subclasses now take sample dicts, not image tensors (#476)
- Removed FCEF model, use segmentation_models_pytorch.Unet instead (#345)
- SemanticSegmentationTrainer: ignore_zeros renamed to ignore_index (#444, #644)
Dependencies
- Python 3.7+ is now required (#413, #482, #486)
- Add lower version bounds to all dependencies based on testing (#574)
- Add upper version bounds to all dependencies based on semver (#544, #557)
- Fix Conda environment installation (#527, #528, #529, #545)
Datamodules
New datamodules:
Changes to existing datamodules:
- Improved consistency between datamodules (#657)
Datasets
New datasets:
- Aboveground Live Woody Biomass Density (#425)
- Aster GDEM (#404)
- CMS Global Mangrove Canopy (#391, #427)
- DeepGlobe (#578)
- DFC 2022 (#354)
- EDDMapS (#533)
- EnviroAtlas (#364)
- Esri 2020 Land Cover (#390, #405)
- EU-DEM (#426)
- Forest Damage (#461, #499)
- GBIF (#507)
- GlobBiomass (#395)
- iNaturalist (#532)
- Inria Aerial Image Labeling (#355)
- Million-AID (#455)
- OpenBuildings (#68, #402)
- ReforesTree (#582)
- SpaceNet 3 (#480)
- USAVars (#363)
Changes to existing datasets:
- Benin Small Holder Cashews: return geospatial metadata (#377)
- BigEarthNet: fix checksum (#550)
- CBF: add plot method (#410)
- CDL: add 2021 download (#418)
- CDL: add plot method (#415)
- Chesapeake: add plot method (#417)
- EuroSAT: new bands parameter (#396, #397)
- LandCover.ai: update download URL (#559, #579)
- Landsat: add support for all Level-1 and Level-2 products (#492, #504)
- Landsat: add plot method (#661)
- NAIP: add plot method (#407)
- Seasonal Contrast: ensure that all images are square (#658)
- Sentinel: add plot method (#416, #493)
- SEN12MS: avoid casting float to int (#500, #502)
- So2Sat: new bands parameter (#394)
Base classes and utilities:
- VisionDataset and VisionClassificationDataset have been renamed to NonGeoDataset and NonGeoClassificationDataset (#627)
- RasterDataset no longer has a plot method, subclasses have their own plot methods (#476)
- Plot methods of RasterDataset subclasses now take sample dicts, not image tensors (#476)
- BoundingBox has new area and volume attributes (#375)
- Don't subtract microsecond from mint (#506)
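As a rough picture of what the new BoundingBox area and volume attributes represent, here is a stdlib-only sketch. `ToyBoundingBox` is a hypothetical stand-in, not the library class, and the formulas are an assumption: area as the spatial footprint, volume as that footprint multiplied by the time span.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyBoundingBox:
    """Spatiotemporal box: x/y in CRS units, t as a timestamp (hypothetical stand-in)."""

    minx: float
    maxx: float
    miny: float
    maxy: float
    mint: float
    maxt: float

    @property
    def area(self) -> float:
        """Spatial footprint: width times height."""
        return (self.maxx - self.minx) * (self.maxy - self.miny)

    @property
    def volume(self) -> float:
        """Footprint multiplied by the time span."""
        return self.area * (self.maxt - self.mint)


box = ToyBoundingBox(minx=0, maxx=10, miny=0, maxy=5, mint=0, maxt=2)
```

Here `box.area` is 50 and `box.volume` is 100.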
Models
Changes to existing models:
- Removed FCEF model, use segmentation_models_pytorch.Unet instead (#345)
- FCSiamConc and FCSiamDiff now inherit from segmentation_models_pytorch.Unet, making it easy to load pretrained weights (#345)
Samplers
New samplers:
- PreChippedGeoSampler (#479)
Changes to existing samplers:
- Allow for point sampling (#477)
- Allow for sampling of entire scene (#477)
- RandomGeoSampler no longer suffers from area bias (#408, #477)
- Sample size now defaults to pixel units, use units=Units.CRS for old behavior (#294)
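The practical effect of the pixel-units default can be sketched as follows. This is a stdlib-only illustration, not the sampler's code: `ToyUnits` and `size_in_crs` are hypothetical, and the sketch assumes the dataset resolution is expressed in CRS units per pixel.

```python
from enum import Enum, auto


class ToyUnits(Enum):
    """Stand-in for a units flag; the names mirror the release note, not the library code."""

    PIXELS = auto()
    CRS = auto()


def size_in_crs(size: float, res: float, units: ToyUnits = ToyUnits.PIXELS) -> float:
    """Return the patch side length in CRS units."""
    if units is ToyUnits.PIXELS:
        return size * res  # pixels times CRS units per pixel
    return size  # already expressed in CRS units


# With 10 m pixels, size=256 means 2560 m by default, but 256 m under the old behavior.
new_default = size_in_crs(256, res=10.0)
old_behavior = size_in_crs(256, res=10.0, units=ToyUnits.CRS)
```

Code that relied on the old interpretation would therefore sample patches `res` times larger on each side after upgrading, unless it opts back into CRS units.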
Trainers
Changes to existing trainers:
- BYOLTask: fix in_channels handling (#522)
- BYOLTask: fix loading of encoder weights (#524)
- SemanticSegmentationTask: ignore_zeros renamed to ignore_index (#444, #644)
Transforms
New spectral indices:
New base classes:
- AppendTriBandNormalizedDifferenceIndex (#414)
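The release note does not state the formula behind the tri-band base class. The sketch below assumes the common three-band normalized difference form, (a - (b + c)) / (a + (b + c)), with a small epsilon added to the denominator. The function name and the reflectance values are hypothetical.

```python
def tri_band_normalized_difference(
    a: float, b: float, c: float, eps: float = 1e-10
) -> float:
    """Compute (a - (b + c)) / (a + (b + c)); eps guards against division by zero."""
    return (a - (b + c)) / (a + (b + c) + eps)


# Hypothetical reflectances: near-infrared, green, blue.
index = tri_band_normalized_difference(0.6, 0.1, 0.1)
```

For these inputs the index is approximately 0.5.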
Documentation
- Improved README (#589, #626)
- Add dataset tables (#435, #478, #649)
- Shorter dataset/datamodule/model names (#569, #571)
- Spectral indices now display mathematical equations (#400)
- Fix NAIP download in tutorials (#526, #531)
- Add issue templates on GitHub (#584, #590)
- Clarify Windows conda installation (#581)
- Public type hints (#508)
Tests
- Test on Python 3.10 (#457)
- Use dependabot to manage dependencies (#488, #551, #647)
- Test minimum version of dependencies (#574)
- Resolve and test for deprecation warnings (#567)
- FCSiam tests no longer require internet access (#495, #497)
Contributors
This release is thanks to the following contributors:
v0.2.1
TorchGeo 0.2.1 Release Notes
This is a bugfix release. There are no new features or API changes with respect to the 0.2.0 release.
Dependencies
- Fix minimum supported kornia version (#350)
- Support older pytorch-lightning (#347, #351)
- Add support for torchmetrics 0.8+ (#361, #382)
DataModules
- RESISC45: fix normalization statistics (#440)
Datasets
Fixes for dataset base classes:
- GeoDataset: fix len() of empty dataset (#374)
- RasterDataset: add support for float dtype (#379, #384)
- RasterDataset: don't override custom cmap (#421, #422)
- VectorDataset: fix issue with empty query (#399, #454, #467)
Fixes for specific datasets:
- CDL: update checksums due to new file formats (#423, #424, #428)
- Chesapeake: support extraction of deflate64-compressed zip files (#59, #282)
- Chesapeake: allow multiple datasets to share same root (#419, #420)
- ChesapeakeCVPR: update prior extension data to version 1.1 (#359)
- IDTReeS: fix citation (#389)
- LandCover.ai: support already-downloaded dataset (#383)
- Sentinel-2: fix regex to support band 8A (#393)
- SpaceNet 2: update checksum due to data format consistency fix (#469)
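The Sentinel-2 band 8A fix above is a typical regex pitfall: a pattern that expects two digits after "B" silently skips "B8A". The patterns and file names below are hypothetical illustrations, not the dataset's actual regex.

```python
import re

# Hypothetical patterns, not the dataset's actual regex.
digits_only = re.compile(r"_(?P<band>B\d{2})\.jp2$")
with_8a = re.compile(r"_(?P<band>B(?:\d{2}|8A))\.jp2$")

# Illustrative Sentinel-2 style file names.
files = [
    "T41XNE_20200829T083611_B08.jp2",
    "T41XNE_20200829T083611_B8A.jp2",
]

# The digits-only pattern drops the B8A file; the extended pattern keeps both.
matched_before = [f for f in files if digits_only.search(f)]
matched_after = [with_8a.search(f).group("band") for f in files]
```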
Samplers
Tutorials
- Fix variable name in trainer notebook (#434)
Tests
Contributors
This release is thanks to the following contributors: