Title Pitches

  • "Solving unbalanced or small datasets with synthetic data"
  • "Identification of complex glyph patterns"

Background

This project would build on the work done for DARPA and would use their dataset for the research. Other usable datasets should be identified as well.

Project Description


Project Goals

Questions the project should answer.

  • What is an acceptable recall?
  • How many synthetic samples are needed to get to that acceptable recall?
  • Can we measure how well a synthetic pattern mimics the original pattern?
  • How do the precision and recall of the model on the training set relate to its generalization back to the original set?
  • Can the full process of creating a synthetic dataset be automated or does it require some human review?
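
Since several of these questions depend on recall and precision, here is a minimal sketch of how the two could be computed for a binary "glyph present / not present" labelling. The function name `precision_recall` and the example labels are assumptions for illustration only, not project results.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 means glyph present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # glyphs correctly found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # glyphs missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels purely for illustration.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1]
precision, recall = precision_recall(y_true, y_pred)
```

Tracking these two numbers while varying the number of synthetic samples would be one way to approach the "how many samples are needed" question.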

The main goal will be to build a synthetic data creation tool that can take a legend label of a glyph and create a pattern from it that can be combined with background templates to create a synthetic training set. The secondary goal will be to measure the performance of various types of models on this problem and to try to create a better-performing custom model.

  • Research paper describing the synthetic data creation tool
  • Research paper comparing the performance of various models and my custom one on this problem
  • Tool to create synthetic data
  • ML Model to train against these glyphs.
  • Small talks on research.
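
As a rough sketch of the core idea behind the synthetic data tool (placing a glyph pattern on a background template with a random rotation), the following uses plain nested lists in place of real images. Everything here is an assumption for illustration: the names `rotate90` and `make_sample`, the restriction to 90-degree rotations, and the treatment of zero cells as transparent. A real tool would likely use an imaging library and arbitrary rotation angles.

```python
import random


def rotate90(grid, turns):
    """Rotate a 2D list clockwise by 90 degrees, `turns` times."""
    for _ in range(turns % 4):
        grid = [list(row) for row in zip(*grid[::-1])]
    return grid


def make_sample(glyph, background, rng):
    """Paste a randomly rotated glyph onto a copy of the background.

    Returns the composed image and its bounding box (top, left, height, width).
    Zero cells in the glyph are treated as transparent (an assumption).
    """
    rotated = rotate90(glyph, rng.randrange(4))
    h, w = len(rotated), len(rotated[0])
    top = rng.randrange(len(background) - h + 1)
    left = rng.randrange(len(background[0]) - w + 1)
    image = [row[:] for row in background]  # leave the template untouched
    for r in range(h):
        for c in range(w):
            if rotated[r][c]:
                image[top + r][left + c] = rotated[r][c]
    return image, (top, left, h, w)


# Toy glyph and empty background, purely illustrative.
glyph = [[1, 1, 1],
         [0, 1, 0]]
background = [[0] * 8 for _ in range(6)]
rng = random.Random(0)  # seeded so a generated set can be reproduced
dataset = [make_sample(glyph, background, rng) for _ in range(5)]
```

Returning the bounding box alongside each image means the label comes for free, which is the main appeal of generating the training set synthetically.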

Project Tasks (11 Weeks Estimated, Should budget 4 months to be comfortable?)

  • Pre-Project Research (2-3 Days)
    • Research of prior work on the topic.
    • Research of supporting papers to give a foundation to the work.
  • Programming Tool (4-5 Weeks)
    • Programming the automatic detection of patterns from base image
      • Programming the identification of fixed patterns
      • Programming the identification of the color of patterns
      • Programming the identification of changing elements within patterns (e.g., numerals and letters)
    • Programming the user interface
      • Programming the user adjustment of created patterns (Changing bounding boxes or vertices)
    • Programming the creation of "clean" background data
    • Programming the creation of the synthetic dataset
      • Programming the random rotations of patterns
      • Programming the random changes in numerals
    • Putting everything together in one workflow and creating a non-GUI version that can generate synthetic data from pre-made template files or that keeps good enough accounting that can be downloaded
    • Documentation of the Tool
  • Model training and testing (3-4 Weeks)
    • Identify and build the models that will be used for testing.

    • Build a script to automate the testing of various models
    • Build a baseline of traditional training on the dataset with no synthetic data
    • Measure the performance of the models on the synthetic dataset
    • Build off of the most promising model and try to improve it.
    • Hopefully a few iterations of improvements
    • Generate final test results for paper 
    • Stretch goal to integrate model training into tool.
  • Writing of Research Products (3 Weeks?) 
    • Identify candidate journals for submission
    • First draft of synthetic data tool paper
    • Small talk / presentation on synthetic data tool
    • Final draft of synthetic data tool paper
    • First draft of model research paper
    • Small talk / presentation on model research paper 
    • Final draft of model research paper
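
For the "script to automate the testing of various models" task above, a minimal sketch could score every candidate on the same held-out samples and rank them by recall. The models here are hypothetical stand-ins (simple threshold functions) and the data is made up. The names `compare_models`, `model_a`, and `model_b` are assumptions, not part of any existing code.

```python
def recall(y_true, y_pred):
    """Fraction of true positives that the model found."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp / (tp + fn) if tp + fn else 0.0


def compare_models(models, samples, labels):
    """Score each model on the same samples and sort by recall, best first."""
    scores = {
        name: recall(labels, [predict(x) for x in samples])
        for name, predict in models.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical stand-in models: each maps a feature value to a 0/1 prediction.
models = {
    "model_a": lambda x: int(x > 0.8),
    "model_b": lambda x: int(x > 0.4),
}
samples = [0.9, 0.5, 0.3, 0.7, 0.1]  # made-up inputs
labels = [1, 1, 0, 1, 0]             # made-up ground truth
ranking = compare_models(models, samples, labels)
```

Keeping each model behind a plain "predict" callable would let the baseline (no synthetic data) and the synthetic-data runs share the same evaluation path.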