
NASA-IBM Collaboration Develops INDUS Large Language Models for Advanced Science Research



Five orange stars connected in a V-like shape with blue lines, like a diagram of the constellation of Indus. Each of the stars is labeled with one of the NASA Science Mission Directorate divisions: astrophysics, Earth science, heliophysics, planetary science, and biological and physical sciences.
Named for the southern sky constellation, INDUS (stylized in all caps) is a comprehensive suite of large language models supporting five science domains.
NASA

By Derek Koehl

Collaborations with private, non-federal partners through Space Act Agreements are a key component in the work done by NASA’s Interagency Implementation and Advanced Concepts Team (IMPACT). A collaboration with International Business Machines (IBM) has produced INDUS, a comprehensive suite of large language models (LLMs) tailored for the domains of Earth science, biological and physical sciences, heliophysics, planetary sciences, and astrophysics. The models were trained using curated scientific corpora drawn from diverse data sources.

INDUS contains two types of models: encoders and sentence transformers. Encoders convert natural language text into numeric coding that can be processed by the LLM. The INDUS encoders were trained on a corpus of 60 billion tokens encompassing astrophysics, planetary science, Earth science, heliophysics, and biological and physical sciences data. A custom tokenizer developed by the IMPACT-IBM collaborative team improves on generic tokenizers by recognizing scientific terms like “biomarkers” and “phosphorylated.” Over half of the 50,000-word vocabulary contained in INDUS is unique to the specific scientific domains used for its training. The sentence transformer models were produced by fine-tuning the INDUS encoder models on approximately 268 million text pairs, including titles/abstracts and questions/answers.
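To make the tokenizer point concrete, here is a minimal sketch of how a vocabulary that contains a scientific term keeps it whole, while a generic vocabulary breaks it into fragments. The article does not describe the tokenizer algorithm, so the greedy longest-match splitting below is a simplification chosen for illustration, and both vocabularies are invented; none of this is the INDUS tokenizer itself.

```python
def tokenize(word, vocab):
    """Greedy longest-match subword split (a simplified, illustrative scheme)."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate substring until it matches a vocabulary entry.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # Nothing matched, so fall back to a single character.
            end = start + 1
        pieces.append(word[start:end])
        start = end
    return pieces


# Hypothetical vocabularies, invented for this example only.
GENERIC_VOCAB = {"phos", "ph", "or", "yl", "ated", "bio", "mark", "ers"}
DOMAIN_VOCAB = GENERIC_VOCAB | {"phosphorylated", "biomarkers"}

generic_pieces = tokenize("phosphorylated", GENERIC_VOCAB)
domain_pieces = tokenize("phosphorylated", DOMAIN_VOCAB)

print(generic_pieces)  # several short fragments
print(domain_pieces)   # the whole term as one token
```

Fewer, more meaningful tokens per term give the encoder a single unit to learn a representation for, which is one plausible reason a domain vocabulary helps on scientific text.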

By providing INDUS with domain-specific vocabulary, the IMPACT-IBM team achieved superior performance over open, non-domain-specific LLMs on a benchmark for biomedical tasks, a scientific question-answering benchmark, and Earth science entity recognition tests. By designing for diverse linguistic tasks and retrieval-augmented generation, INDUS is able to process researcher questions, retrieve relevant documents, and generate answers to the questions. For latency-sensitive applications, the team developed smaller, faster versions of both the encoder and sentence transformer models.
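The retrieval step of retrieval-augmented generation can be sketched as ranking passages by their similarity to the question. In the sketch below, the `embed` function is a toy bag-of-words stand-in for a sentence transformer, and the passages and function names are hypothetical; a real system would use learned embeddings and pass the top passages to a generator.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a sentence transformer."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, top_k=2):
    """Return the top_k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:top_k]


# Invented passages for illustration only.
PASSAGES = [
    "solar wind speed measured near earth",
    "clay minerals detected in martian rock",
    "sea surface temperature trends from satellite data",
]

top = retrieve("what is the solar wind speed", PASSAGES, top_k=1)
print(top)
```

Swapping `embed` for a domain-trained sentence transformer changes only how the vectors are produced; the ranking logic stays the same.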

Validation tests demonstrate that INDUS excels in retrieving relevant passages from the science corpora in response to a NASA-curated test set of about 400 questions. IBM researcher Bishwaranjan Bhattacharjee commented on the overall approach: “We achieved superior performance by not only having a custom vocabulary but also a large specialized corpus for training the encoder model and a good training strategy. For the smaller, faster versions, we used neural architecture search to obtain a model architecture and knowledge distillation to train it with supervision of the larger model.”
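Knowledge distillation, as mentioned in the quote above, trains a small student model to match the output distribution of a larger teacher. The article does not give the objective the team used, so the sketch below assumes the common soft-target formulation: a KL divergence between temperature-softened teacher and student distributions. The function names and the temperature value are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher = [2.0, 0.5, -1.0]
matching_student = [2.0, 0.5, -1.0]
diverging_student = [-1.0, 0.5, 2.0]

print(distillation_loss(matching_student, teacher))   # zero when outputs agree
print(distillation_loss(diverging_student, teacher))  # positive when they differ
```

A higher temperature flattens both distributions, so the student also learns from the relative probabilities the teacher assigns to less likely outputs.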

NASA Chief Scientist Kate Calvin gives remarks in a NASA employee town hall on how the agency is using and developing Artificial Intelligence (AI) tools to advance missions and research, Wednesday, May 22, 2024, at the NASA Headquarters Mary W. Jackson Building in Washington. The INDUS suite of models will help facilitate the agency’s AI goals.
NASA/Bill Ingalls

INDUS was also evaluated using data from NASA’s Biological and Physical Sciences (BPS) Division. Dr. Sylvain Costes, the NASA BPS project manager for Open Science, discussed the benefits of incorporating INDUS: “Integrating INDUS with the Open Science Data Repository (OSDR) Application Programming Interface (API) enabled us to develop and trial a chatbot that offers more intuitive search capabilities for navigating individual datasets. We are currently exploring ways to improve OSDR’s internal curation data system by leveraging INDUS to enhance our curation team’s productivity and reduce the manual effort required daily.”

At the NASA Goddard Earth Sciences Data and Information Services Center (GES-DISC), the INDUS model was fine-tuned using labeled data from domain experts to categorize publications specifically citing GES-DISC data into applied research areas. According to NASA principal data scientist Dr. Armin Mehrabian, this fine-tuning “significantly improves the identification and retrieval of publications that reference GES-DISC datasets, which aims to improve the user journey in finding their required datasets.” Furthermore, the INDUS encoder models are integrated into the GES-DISC knowledge graph, supporting a variety of other projects, including the dataset recommendation system and GES-DISC GraphRAG.

Kaylin Bugbee, team lead of NASA’s Science Discovery Engine (SDE), spoke to the benefit INDUS offers to existing applications: “Large language models are rapidly changing the search experience. The Science Discovery Engine, a unified, insightful search interface for all of NASA’s open science data and information, has prototyped integrating INDUS into its search engine. Initial results have shown that INDUS improved the accuracy and relevancy of the returned results.”

INDUS enhances scientific research by providing researchers with improved access to vast amounts of specialized knowledge. INDUS can understand complex scientific concepts and reveal new research directions based on existing data. It also enables researchers to extract relevant information from a wide array of sources, improving efficiency. Aligned with NASA and IBM’s commitment to open and transparent artificial intelligence, the INDUS models are openly available on Hugging Face. For the benefit of the scientific community, the team has released the developed models and will release the benchmark datasets that span named entity recognition for climate change, extractive QA for Earth science, and information retrieval for multiple domains. The INDUS encoder models are adaptable for science domain applications, and the INDUS retriever models support information retrieval in RAG applications.

A paper on INDUS, “INDUS: Effective and Efficient Language Models for Scientific Applications,” is available on arxiv.org.

Learn more about the Science Discovery Engine here.

Last Updated: Jun 24, 2024
