...

The main goal is to build a synthetic data creation tool that takes the glyph of a legend label and generates a pattern from it, which can then be combined with background templates to create a synthetic training set. The secondary goal is to measure the performance of various types of models on this problem and to try to create a better-performing custom model.
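As a rough illustration of the main goal, the sketch below tiles a small glyph into a pattern and draws it onto a background template, producing an image and its ground-truth label. It is only a minimal sketch: the function names (`tile_glyph`, `composite`), the binary-mask representation of the glyph, and the plain nested-list images are all assumptions made for illustration, not the design of the actual tool.

```python
# Hypothetical sketch: all names and the 0/1 mask representation are
# assumptions for illustration, not the project's actual tool.

def tile_glyph(glyph, height, width):
    """Repeat a small 2D glyph mask until it fills a height x width area."""
    gh, gw = len(glyph), len(glyph[0])
    return [[glyph[r % gh][c % gw] for c in range(width)] for r in range(height)]


def composite(background, pattern, region, ink=0):
    """Draw the pattern onto a copy of the background, inside region only.

    background: 2D list of grayscale pixel values
    pattern:    2D list of 0/1 values, same size as background
    region:     2D list of 0/1 values marking where the pattern should appear
    Returns (image, label); label is a copy of region, the ground-truth mask.
    """
    image = [row[:] for row in background]
    label = [row[:] for row in region]
    for r, row in enumerate(region):
        for c, inside in enumerate(row):
            if inside and pattern[r][c]:
                image[r][c] = ink
    return image, label


# Example: a 2x2 diagonal glyph tiled over the left half of a 4x4 white template.
glyph = [[1, 0],
         [0, 1]]
background = [[255] * 4 for _ in range(4)]
region = [[1, 1, 0, 0] for _ in range(4)]

pattern = tile_glyph(glyph, 4, 4)
image, label = composite(background, pattern, region)
```

Each (image, label) pair produced this way would be one synthetic training example. A real tool would presumably work on raster images and vary the region shape, scale, and background.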

| Milestone | Quality Metric | Date |
| Show that a synthetic training set can be used to train the base model to a level equal to or better than the original training set. | F-score of the model trained on synthetic data is >= the F-score of the model trained on the original dataset | 3-4 weeks after start |
| Show proof that the initial concept holds weight. The scores from the competition were mostly in the <10% range, with the highest being above 30%. | F-score is >50% | 2 months after start |
|  | F-score is >95% | Project Conclusion |
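The quality metrics above all rest on the F-score. The source does not say which variant is meant, so the sketch below assumes a pixel-wise F-score over binary masks, with beta = 1 (the F1 score) as the default. The function name `f_score` and the nested-list mask format are assumptions for illustration.

```python
# Hypothetical sketch: assumes a pixel-wise F-score on 0/1 masks;
# the project may define its metric differently.

def f_score(predicted, truth, beta=1.0):
    """Return the F-beta score of a predicted mask against a truth mask."""
    tp = fp = fn = 0
    for p_row, t_row in zip(predicted, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1      # true positive
            elif p:
                fp += 1      # false positive
            elif t:
                fn += 1      # false negative
    b2 = beta * beta
    denominator = (1 + b2) * tp + b2 * fn + fp
    # With no positives in either mask the score is undefined; return 0.0.
    return (1 + b2) * tp / denominator if denominator else 0.0


# Example: one pixel correct, one spurious, one missed.
predicted = [[1, 1],
             [0, 0]]
truth = [[1, 0],
         [1, 0]]
score = f_score(predicted, truth)  # 2*1 / (2*1 + 1 + 1) = 0.5
```

The first milestone would then amount to checking that the score of the model trained on synthetic data is at least the score of the model trained on the original dataset, both evaluated on the same held-out data.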

Project Products

  • Research paper describing the synthetic data creation tool.
  • Research paper contrasting the performance of various models on this problem with that of my custom model.
  • Tool to create synthetic data.
  • ML model trained on these glyphs.
  • Small talks on the research.

...

The primary benefit for NFI in this project is gaining the skills, experience, and tools needed to tackle harder problems with synthetic training imagery. Both the NGA and DOD have shown an interest in synthetic satellite data, giving out development grants to Orbital Insight and L3Harris respectively. Working with satellite data presents a more difficult challenge than the DARPA Project. If we want to take on these more difficult challenges, a foundation of knowledge and tools that we can adapt would be greatly beneficial.

Synthetic training data provides a way to quickly build new AI models for problems where there may not be enough data to tackle them traditionally. Even if the NGA and DOD do not end up being partners, understanding synthetic data workflows and problems would increase our value as ML consultants on other projects, as it is my view that synthetic training data will become more common in the future.

...