NASA-IBM Collaboration Develops INDUS Large Language Models for Advanced Science Research


Five orange stars connected in a V-like shape with blue lines, like a diagram of the constellation of Indus. Each of the stars is labeled with one of the NASA Science Mission Directorate divisions: astrophysics, Earth science, heliophysics, planetary science, and biological and physical sciences.
Named for the southern sky constellation, INDUS (stylized in all caps) is a comprehensive suite of large language models supporting five science domains.
NASA

By Derek Koehl

Collaborations with private, non-federal partners through Space Act Agreements are a key component in the work done by NASA’s Interagency Implementation and Advanced Concepts Team (IMPACT). A collaboration with International Business Machines (IBM) has produced INDUS, a comprehensive suite of large language models (LLMs) tailored for the domains of Earth science, biological and physical sciences, heliophysics, planetary sciences, and astrophysics and trained using curated scientific corpora drawn from diverse data sources.

INDUS contains two types of models: encoders and sentence transformers. Encoders convert natural language text into numeric coding that can be processed by the LLM. The INDUS encoders were trained on a corpus of 60 billion tokens encompassing astrophysics, planetary science, Earth science, heliophysics, and biological and physical sciences data. The custom tokenizer developed by the IMPACT-IBM collaborative team improves on generic tokenizers by recognizing scientific terms like “biomarkers” and “phosphorylated.” Over half of the 50,000-word vocabulary contained in INDUS is unique to the specific scientific domains used for its training. The INDUS encoder models were used to fine-tune the sentence transformer models on approximately 268 million text pairs, including titles/abstracts and questions/answers.
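To illustrate why a domain-specific vocabulary matters, here is a minimal sketch of subword segmentation. It assumes a simple greedy longest-match scheme and two tiny invented vocabularies; it is not the INDUS tokenizer, whose algorithm and vocabulary are not described in this article. With only generic pieces available, a scientific term is broken into several fragments, while a vocabulary that contains the term keeps it whole.

```python
def greedy_tokenize(word, vocab):
    """Split `word` into the longest matching vocabulary pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate span until it matches a vocabulary entry.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # Nothing matches: fall back to a single character.
            end = start + 1
        pieces.append(word[start:end])
        start = end
    return pieces


# Hypothetical toy vocabularies, invented for illustration only.
GENERIC_VOCAB = {"phos", "ph", "or", "yl", "ated", "bio", "mark", "ers"}
DOMAIN_VOCAB = GENERIC_VOCAB | {"phosphorylated", "biomarkers"}

for word in ("biomarkers", "phosphorylated"):
    generic = greedy_tokenize(word, GENERIC_VOCAB)
    domain = greedy_tokenize(word, DOMAIN_VOCAB)
    print(f"{word}: generic={generic} domain={domain}")
```

Fewer pieces per term means the encoder sees a scientific word as one unit with its own learned representation, rather than having to reassemble its meaning from unrelated fragments.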

By providing INDUS with domain-specific vocabulary, the IMPACT-IBM team achieved superior performance over open, non-domain-specific LLMs on a benchmark for biomedical tasks, a scientific question-answering benchmark, and Earth science entity recognition tests. By designing for diverse linguistic tasks and retrieval-augmented generation, INDUS is able to process researcher questions, retrieve relevant documents, and generate answers to the questions. For latency-sensitive applications, the team developed smaller, faster versions of both the encoder and sentence transformer models.
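The retrieval step of retrieval-augmented generation can be sketched as ranking passages by their similarity to the question. The example below is only an illustration: the `embed` function is a toy bag-of-words stand-in for a sentence transformer, and the passages and question are invented. A real pipeline would replace `embed` with dense vectors produced by a sentence transformer model.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=1):
    """Return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


# Hypothetical passages, invented for illustration only.
PASSAGES = [
    "solar wind speed measured near earth",
    "gene expression changes in spaceflight mice",
    "sea surface temperature trends in the pacific",
]

top = retrieve("what is the solar wind speed", PASSAGES, k=1)
print(top)
```

In a full system, the retrieved passages would then be handed to a generative model as context for composing the answer.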

Validation tests demonstrate that INDUS excels in retrieving relevant passages from the science corpora in response to a NASA-curated test set of about 400 questions. IBM researcher Bishwaranjan Bhattacharjee commented on the overall approach: “We achieved superior performance by not only having a custom vocabulary but also a large specialized corpus for training the encoder model and a good training strategy. For the smaller, faster versions, we used neural architecture search to obtain a model architecture and knowledge distillation to train it with supervision of the larger model.”
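Knowledge distillation, as mentioned in the quote, trains a small "student" model to imitate a larger "teacher." The article does not specify the exact objective the team used, so the sketch below shows one common formulation as an assumption: the Kullback-Leibler divergence between temperature-softened output distributions of the two models. The function names and the logit values are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits for a three-class output, invented for illustration.
teacher = [4.0, 1.0, 0.5]
good_student = [3.8, 1.1, 0.4]
poor_student = [0.5, 4.0, 1.0]

print(distillation_loss(teacher, good_student))
print(distillation_loss(teacher, poor_student))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, so minimizing it pulls the smaller model toward the larger model's behavior. Some formulations also multiply this term by the squared temperature and combine it with a standard loss on labeled data.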

NASA Chief Scientist Kate Calvin gives remarks in a NASA employee town hall on how the agency is using and developing Artificial Intelligence (AI) tools to advance missions and research, Wednesday, May 22, 2024, at the NASA Headquarters Mary W. Jackson Building in Washington. The INDUS suite of models will help facilitate the agency’s AI goals.
NASA/Bill Ingalls

INDUS was also evaluated using data from NASA’s Biological and Physical Sciences (BPS) Division. Dr. Sylvain Costes, the NASA BPS project manager for Open Science, discussed the benefits of incorporating INDUS: “Integrating INDUS with the Open Science Data Repository (OSDR) Application Programming Interface (API) enabled us to develop and trial a chatbot that offers more intuitive search capabilities for navigating individual datasets. We are currently exploring ways to improve OSDR’s internal curation data system by leveraging INDUS to enhance our curation team’s productivity and reduce the manual effort required daily.”

At the NASA Goddard Earth Sciences Data and Information Services Center (GES-DISC), the INDUS model was fine-tuned using labeled data from domain experts to categorize publications specifically citing GES-DISC data into applied research areas. According to NASA principal data scientist Dr. Armin Mehrabian, this fine-tuning “significantly improves the identification and retrieval of publications that reference GES-DISC datasets, which aims to improve the user journey in finding their required datasets.” Furthermore, the INDUS encoder models are integrated into the GES-DISC knowledge graph, supporting a variety of other projects, including the dataset recommendation system and GES-DISC GraphRAG.

Kaylin Bugbee, team lead of NASA’s Science Discovery Engine (SDE), spoke to the benefit INDUS offers to existing applications: “Large language models are rapidly changing the search experience. The Science Discovery Engine, a unified, insightful search interface for all of NASA’s open science data and information, has prototyped integrating INDUS into its search engine. Initial results have shown that INDUS improved the accuracy and relevancy of the returned results.”

INDUS enhances scientific research by providing researchers with improved access to vast amounts of specialized knowledge. INDUS can understand complex scientific concepts and reveal new research directions based on existing data. It also enables researchers to extract relevant information from a wide array of sources, improving efficiency. Aligned with NASA and IBM’s commitment to open and transparent artificial intelligence, the INDUS models are openly available on Hugging Face. For the benefit of the scientific community, the team has released the developed models and will release the benchmark datasets that span named entity recognition for climate change, extractive QA for Earth science, and information retrieval for multiple domains. The INDUS encoder models are adaptable for science domain applications, and the INDUS retriever models support information retrieval in RAG applications.

A paper on INDUS, “INDUS: Effective and Efficient Language Models for Scientific Applications,” is available on arxiv.org.

Learn more about the Science Discovery Engine here.

Last Updated
Jun 24, 2024
