
NASA’s Semantically Aware Topic Model Revitalizes Proposal Content Tracking and Taxonomy Development


National Aeronautics and Space Administration (NASA) Small Business Innovation Research (SBIR) proposals are intricate documents, typically around 20 pages long. Each proposal aims to showcase the innovation and relevance of a highly ambitious research endeavor, and together the proposals span research undertaken across the agency. They cover at least 114 distinct research domains, which form the classification system for the high-priority research areas within NASA SBIR’s strategy and development.


REI has created a semantically aware topic model to monitor individual proposals, their content, and their associated research fields, which change annually. Using sequence learning methods (such as an LSTM part-of-speech (POS) tagger), REI breaks text down into its constituent parts of speech. The extracted noun phrase (NP) and verb phrase (VP) tokens are then passed to a classical generative model, Latent Dirichlet Allocation (LDA), to determine topic alignments within the proposals. The output of the LDA model is subsequently treated as a document embedding for further analysis and for applications such as taxonomy development and similarity detection.
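The LDA step can be illustrated with a minimal sketch. It is not REI's implementation: it assumes the NP and VP tokens have already been extracted by a tagger, and it uses a small collapsed Gibbs sampler written with the Python standard library. The function name `lda_embed`, the hyperparameters, and the toy proposals are all hypothetical.

```python
import random


def lda_embed(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling and return one topic-proportion
    vector per document (usable as a document embedding).

    docs: list of token lists, assumed to be pre-extracted NP/VP tokens.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables updated during sampling.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed per-document topic proportions; each vector sums to 1.
    embeddings = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        embeddings.append(
            [(doc_topic[d][k] + alpha) / denom for k in range(num_topics)]
        )
    return embeddings


# Hypothetical pre-tagged proposals (NP/VP tokens only).
proposals = [
    ["solar_sail", "deploy", "thin_film", "propulsion", "solar_sail"],
    ["propulsion", "thin_film", "deploy", "solar_sail"],
    ["radiation_shielding", "protect", "crew_habitat", "radiation_shielding"],
    ["crew_habitat", "protect", "radiation_shielding"],
]
embeddings = lda_embed(proposals, num_topics=2)
for vec in embeddings:
    print([round(p, 3) for p in vec])
```

Each row of the output is a probability distribution over topics. Treating that row as the document's embedding is what allows later steps, such as taxonomy development and similarity checks, to work with short fixed-length vectors instead of full proposal text.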


The system can automatically place a proposal in context within the defined research areas, using multiple inheritance to track how research topics evolve across solicitations. As an added advantage, the same system can evaluate proposals for similarity and is currently used to eliminate duplicates during the in-processing stage.
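One way to picture the duplicate check is as a pairwise comparison of topic embeddings. The sketch below is an illustration under stated assumptions, not the production system: cosine similarity is an assumed choice of metric, and the `find_duplicates` helper, the 0.98 threshold, and the proposal IDs are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_duplicates(embeddings, threshold=0.98):
    """Return ID pairs whose topic embeddings are at least `threshold` similar.

    embeddings: dict mapping a proposal ID to its topic-proportion vector.
    threshold: assumed cutoff; a real system would tune it on labeled data.
    """
    ids = sorted(embeddings)
    pairs = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            if cosine_similarity(embeddings[first], embeddings[second]) >= threshold:
                pairs.append((first, second))
    return pairs


# Hypothetical topic embeddings for three incoming proposals.
incoming = {
    "P-001": [0.90, 0.05, 0.05],
    "P-002": [0.88, 0.07, 0.05],
    "P-003": [0.05, 0.05, 0.90],
}
print(find_duplicates(incoming))  # prints [('P-001', 'P-002')]
```

Comparing every pair costs quadratic time in the number of proposals, which is acceptable for a sketch. A larger intake would likely need an indexing or nearest-neighbor step before the pairwise check.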

Capabilities Shown

  • Machine Learning
  • Sentiment Analysis
  • Natural Language Processing