Training Data Errors

Quantifying Error in Training Data for Mapping and Monitoring the Earth System Workshop Proceedings

A Workshop on “Quantifying Error in Training Data for Mapping and Monitoring the Earth System” was held on January 8-9, 2019 at Clark University, with support from Omidyar Network’s Property Rights Initiative, now PlaceFund. The goals of the workshop were to:

  1. Summarize the current state of knowledge on the quantification of training data errors and their impacts on machine learning-based methods for generating Earth Observation maps;
  2. Identify potential sources of error in new training data streams;
  3. Use case studies to quantify how training data errors impact the usability of downstream maps; 
  4. Define best practices for quantifying and reporting i) training data error and ii) its contribution to overall map error. 

The primary workshop outcome will be a peer-reviewed paper (see pre-print here). This site provides additional links to presentations and other resources resulting from the workshop.


NASA ML4EO Workshop Proceedings

Radiant Earth Foundation hosted an international expert workshop to discuss how best to apply machine learning (ML) techniques to NASA’s Earth Observation (EO) data in order to address environmental challenges. In particular, participants discussed the generation and use of training datasets for ML applications based on EO data. They evaluated recent advancements, identified existing obstacles, and developed a best-practices guideline to encourage adoption of these techniques. This workshop was sponsored by the NASA Earth Science Data Systems (ESDS) program.


Analysis Ready Data Interoperability

The ARD and STAC interoperability workshop, which took place on August 13-15 at the USGS Menlo Park Campus, was dedicated to discussing interoperability between commercial imagery sources and public datasets. Participants presented different approaches to data harmonization, with an emphasis on identifying standard approaches and practical recommendations on standards for Analysis Ready Data (ARD).


SpatioTemporal Asset Catalog

STAC helps make geospatial assets openly searchable and crawlable. Essentially, STAC lowers technology friction, allowing users to search, acquire, and analyze imagery across multiple archives, which in turn can significantly enhance program outcomes.
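
To illustrate what searching across multiple archives can look like, here is a minimal sketch in Python. It assumes each archive exposes its imagery as STAC-Item-like records (GeoJSON Features with an id, a bbox, a datetime property, and asset links). The helper names (make_item, bbox_intersects, search), the item ids, and the example.com URLs are all hypothetical and exist only for this illustration; a real deployment would more likely query a STAC API than filter records in memory.

```python
from datetime import datetime, timezone


def make_item(item_id, bbox, when, href):
    """Build a minimal STAC-Item-like dict (hypothetical helper)."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": list(bbox),
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": when.isoformat()},
        "links": [],
        "assets": {"image": {"href": href}},
    }


def bbox_intersects(a, b):
    """True if two [west, south, east, north] boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(items, bbox, start, end):
    """Return items that intersect bbox and fall within [start, end]."""
    hits = []
    for item in items:
        when = datetime.fromisoformat(item["properties"]["datetime"])
        if bbox_intersects(item["bbox"], bbox) and start <= when <= end:
            hits.append(item)
    return hits


utc = timezone.utc

# Two made-up archives, e.g. one public and one commercial.
archive_a = [
    make_item("public-001", (-72.2, 42.1, -71.7, 42.4),
              datetime(2019, 1, 8, tzinfo=utc),
              "https://example.com/public-001.tif"),
]
archive_b = [
    make_item("commercial-001", (-71.9, 42.2, -71.6, 42.6),
              datetime(2019, 8, 13, tzinfo=utc),
              "https://example.com/commercial-001.tif"),
    make_item("commercial-002", (10.0, 50.0, 11.0, 51.0),
              datetime(2019, 8, 14, tzinfo=utc),
              "https://example.com/commercial-002.tif"),
]

# Because both archives share one record layout, a single query covers both.
hits = search(
    archive_a + archive_b,
    bbox=(-72.0, 42.0, -71.5, 42.5),
    start=datetime(2019, 1, 1, tzinfo=utc),
    end=datetime(2019, 12, 31, tzinfo=utc),
)
hit_ids = [item["id"] for item in hits]
```

The point of the sketch is the shared record layout: once every archive describes its assets the same way, one search function works for all of them without archive-specific code.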