Describing ML Models with the Geospatial Machine Learning Model Catalog (GMLMC).
During the height of the COVID-19 pandemic, the government of Togo launched a program to “boost national food production in response to the COVID-19 crisis by distributing aid to farmers”1. To accomplish this, the government needed accurate information about the distribution of smallholder farmers throughout the country. This kind of cropland map did not exist for the country, so they worked with NASA Harvest to rapidly develop a cropland map using AI. Finding enough high-resolution labeled training data to train the machine learning model was also a significant challenge, so the team combined global and local crowdsourced labels collected using the Geo-Wiki platform2 with hand-labeled imagery in targeted areas to train a new model for predicting crop areas.
Now featuring 250+ organizations that focus on machine learning applications with satellite data
The latest interactive Machine Learning for Earth Observation Market Map, a curated list of organizations working on different machine learning aspects of the satellite data pipeline, is available for download. This release adds 100+ organizations, thanks to a crowdsourcing effort on social media. Earlier in September, we asked our followers on Twitter and LinkedIn to identify organizations that we had missed in the earlier version of the market map or that had been established since then. The large number of contributions in such a short period speaks to the vibrancy of this niche area of machine learning (ML) for Earth observation (EO). The entries hint at the remarkable capacity of organizations to harness these innovative technologies and expand them in the service of humanity.
Using STAC to catalog machine learning training data.
Researchers and data scientists are increasingly combining Earth observation (EO) with ground truth data from a variety of sources to build faster, more accurate machine learning (ML) models to gain valuable insights in domains ranging from agriculture to autonomous navigation to ecosystem health monitoring. These models are integrated into analytic pipelines that generate on-the-fly predictions at scale. The accuracy of these inferences is then evaluated using well-defined validation metrics, and the results are used to improve the performance of the original model in a continuous feedback loop.
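The evaluate-and-retrain loop described above can be sketched in a few lines. This is a generic illustration, not any specific pipeline: the function names and the 0.9 accuracy threshold are assumptions chosen for the example.

```python
# Minimal sketch of a validation-driven feedback loop: score the model's
# predictions against held-out ground truth and flag it for retraining
# when a chosen metric falls below a threshold. The threshold (0.9) and
# function names are illustrative assumptions.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    assert len(y_true) == len(y_pred), "prediction/label length mismatch"
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def needs_retraining(y_true, y_pred, threshold=0.9):
    """Flag the model for retraining when validation accuracy drops."""
    return accuracy(y_true, y_pred) < threshold

# Example: 3 of 4 predictions correct -> accuracy 0.75, below threshold
print(needs_retraining([1, 0, 1, 1], [1, 0, 0, 1]))  # True
```

In a real pipeline the metric would typically be something richer (F1, IoU, per-class recall) and the "retrain" decision would trigger a new training run on the expanded labeled dataset.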
If this sounds like a complex process, that’s because it is! Ad-hoc techniques for handling these workflows may work well within a single organization, but can lead to a bewildering array of algorithms and data for end-users.
Submissions to the new Datasets and Benchmarks track require data documentation and availability on an open repository.
Organizers of the NeurIPS 2021 conference recently announced a new Datasets and Benchmarks track. It is a significant development for a major machine learning (ML) conference to highlight the importance of data in developing algorithms for real-world problems. We at Radiant Earth Foundation welcome this initiative and applaud the organizers for establishing the new track.
In recent years, there have been many discussions about how to incentivize ML researchers to work on real-world problems. One such incentive is the opportunity to publish a paper at a peer-reviewed conference and gain recognition for working on these problems. The new track at NeurIPS is a necessary step toward realizing these incentives.
Generating a global training dataset while supporting social initiatives and sustainable practices.
Labeling satellite imagery is the process of applying tags to scenes to provide context or confirm information. These labeled training datasets form the basis for machine learning (ML) algorithms. In many cases, labeling requires humans to meticulously and manually assign labels to the data, allowing a model to learn patterns and generalize them to new observations.
For a wide range of Earth observation applications, training data labels can be generated by annotating satellite imagery. A label can classify an entire image as a single class (e.g., water body) or mark specific objects within the satellite image. However, annotation tasks can only identify features observable in the imagery. For example, with Sentinel-2 imagery at 10-meter spatial resolution, one cannot detect finer features of interest, such as crop types, but one can distinguish large croplands from other land cover classes.
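The resolution constraint above comes down to simple arithmetic: a feature must span several pixels before an annotator can reliably distinguish it. The 3-pixel minimum used below is an illustrative rule of thumb we are assuming, not a formal detectability criterion.

```python
# Back-of-the-envelope check for whether a feature is annotatable at a
# given ground sample distance (GSD). Assumes a feature must span at
# least ~3 pixels to be distinguishable -- an illustrative heuristic.

def min_detectable_size(gsd_m, min_pixels=3):
    """Smallest feature width (meters) plausibly resolvable at this GSD."""
    return gsd_m * min_pixels

# Sentinel-2 at 10 m GSD: features narrower than ~30 m (e.g., the small
# field boundaries needed to separate crop types) are hard to annotate,
# while croplands spanning hundreds of meters remain clearly visible.
print(min_detectable_size(10))  # 30
```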
A conversation with the First Place winner of the Radiant Earth Tropical Cyclone Wind Estimation Data Competition
We recently announced the winners of the Radiant Earth Tropical Cyclone Wind Estimation Data Competition, a contest designed to build a machine learning (ML) model that improves NASA IMPACT’s Deep Learning-based Hurricane Intensity Estimator. Seven hundred thirty-three participants leveraged NOAA’s Geostationary Operational Environmental Satellites (GOES) imagery to estimate the wind speeds of storms at different points in time, using satellite images captured throughout a storm’s life cycle. In this Q&A, we sat down with Igor Ivanov from Ukraine, winner of the first-place Development Seed Award, to talk about his journey to becoming a data scientist and his winning entry.
Using the Python client to discover and download training datasets without managing API requests.
We are excited to announce the first beta release of the radiant_mlhub library, a Python client for working with the Radiant MLHub API! With this release, users can work with Radiant MLHub datasets through an intuitive Python interface without having to worry about constructing API requests and managing authentication.
The library is still in the early stages of development, but we encourage you to try it out and give us feedback on how well it addresses your use cases. This article walks you through installing and configuring the library, navigating datasets and their collections, and downloading training datasets. For more detail, please see the official documentation for the Python library. A basic knowledge of Python programming is recommended.
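For contrast, here is roughly the kind of request construction the client spares you from. The base URL, the `key` query parameter, and the dataset ID are all assumptions based on our reading of the v1 API, shown only to illustrate what "managing API requests" means; with the radiant_mlhub client you never build these strings by hand.

```python
from urllib.parse import urlencode

# Hypothetical sketch of a hand-built, authenticated request URL for
# the Radiant MLHub API. API_BASE, the `key` parameter, and the dataset
# ID below are illustrative assumptions, not guaranteed endpoints.
API_BASE = "https://api.radiant.earth/mlhub/v1"

def dataset_url(dataset_id, api_key):
    """Build an authenticated URL for fetching one dataset's metadata."""
    return f"{API_BASE}/datasets/{dataset_id}?" + urlencode({"key": api_key})

url = dataset_url("ref_landcovernet_v1", "MY_API_KEY")
print(url)
```

The client wraps this plumbing (plus pagination and authentication handling) behind Python objects, so you work with datasets and collections rather than raw URLs.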
Meet the rising stars: women around the world at the forefront of machine learning for Earth observation.
Happy International Women’s Day!
Today, we celebrate the women who break barriers and expand the frontiers of machine learning for Earth observation. This essential field can help us understand the planet’s ecosystem, its different elements, interactions, and changes.
These 15 leading women were selected from 56 outstanding nominations from the ML4EO community. The Radiant Earth Foundation selection committee created a set of criteria to rank the nominees.
A little over a year ago, we launched the first iteration of Radiant MLHub in the form of a STAC-compliant API, which allows you to browse our training data collections and to list and download individual assets from the items within those collections. Today, we’re announcing the ability to download an archived version of a training dataset with a single click. In this post, we describe the process for downloading datasets and the structure of the archived datasets, and provide some tips for effectively traversing the downloaded data.
We now offer three different methods for downloading our datasets. The easiest method, downloading from our registry, can be accessed by navigating to a dataset page and clicking the “Download” link for each collection you would like to download. Clicking this link directs you to our dashboard, which asks you to log in if you are not already authenticated and then begins the download process for that collection.
In December 2019, we publicly launched Radiant MLHub, the first open-access cloud-based repository for geospatial training datasets. Since then, we have continuously published new datasets and expanded the ecosystem around Radiant MLHub.
The idea of Radiant MLHub was born in Spring 2018 after several discussions with, and feedback from, members of the community and funders. We had started a new project, called LandCoverNet, to develop a global and geographically diverse land cover training dataset using human verification. Soon after the launch of LandCoverNet, we identified a gap in the ecosystem around facilitating the publication and uptake of training datasets in our community. That gap in the data value chain led us to the design and implementation of Radiant MLHub.