NASA-IBM Collaboration Develops INDUS Large Language Models for Advanced Science Research


Five orange stars connected in a V-like shape with blue lines, like a diagram of the constellation of Indus. Each of the stars is labeled with one of the NASA Science Mission Directorate divisions: astrophysics, Earth science, heliophysics, planetary science, and biological and physical sciences.
Named for the southern sky constellation, INDUS (stylized in all caps) is a comprehensive suite of large language models supporting five science domains.
NASA

By Derek Koehl

Collaborations with private, non-federal partners through Space Act Agreements are a key component in the work done by NASA’s Interagency Implementation and Advanced Concepts Team (IMPACT). A collaboration with International Business Machines (IBM) has produced INDUS, a comprehensive suite of large language models (LLMs) tailored for the domains of Earth science, biological and physical sciences, heliophysics, planetary sciences, and astrophysics and trained using curated scientific corpora drawn from diverse data sources.

INDUS contains two types of models: encoders and sentence transformers. Encoders convert natural language text into a numeric encoding that can be processed by the LLM. The INDUS encoders were trained on a corpus of 60 billion tokens encompassing astrophysics, planetary science, Earth science, heliophysics, and biological and physical sciences data. The custom tokenizer developed by the IMPACT-IBM collaborative team improves on generic tokenizers by recognizing scientific terms like “biomarkers” and “phosphorylated.” Over half of the 50,000-word vocabulary contained in INDUS is unique to the specific scientific domains used for its training. The INDUS encoder models were used to fine-tune the sentence transformer models on approximately 268 million text pairs, including titles/abstracts and questions/answers.
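To illustrate why a domain-specific vocabulary matters, the following is a minimal sketch of greedy longest-match subword tokenization. It is not the INDUS tokenizer: the `tokenize` function and both toy vocabularies are hypothetical, chosen only to show how a term such as “phosphorylated” might be fragmented by a generic vocabulary but kept whole by a scientific one.

```python
def tokenize(word, vocab):
    """Split a word into the longest vocabulary entries, left to right."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate span until it matches a vocabulary entry.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No entry covers this position, so the word is unknown.
            return ["[UNK]"]
        pieces.append(word[start:end])
        start = end
    return pieces


# Hypothetical vocabularies, for illustration only.
GENERIC_VOCAB = {"phosph", "ory", "lated", "bio", "mark", "ers"}
DOMAIN_VOCAB = GENERIC_VOCAB | {"phosphorylated", "biomarkers"}

generic_pieces = tokenize("phosphorylated", GENERIC_VOCAB)
domain_pieces = tokenize("phosphorylated", DOMAIN_VOCAB)
print(generic_pieces)  # three fragments
print(domain_pieces)   # one whole-word token
```

Fewer fragments per term generally means the encoder sees scientific words as single units rather than having to reassemble their meaning from pieces.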

By providing INDUS with domain-specific vocabulary, the IMPACT-IBM team achieved superior performance over open, non-domain-specific LLMs on a benchmark for biomedical tasks, a scientific question-answering benchmark, and Earth science entity recognition tests. By designing for diverse linguistic tasks and retrieval-augmented generation, INDUS is able to process researcher questions, retrieve relevant documents, and generate answers to the questions. For latency-sensitive applications, the team developed smaller, faster versions of both the encoder and sentence transformer models.
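The retrieval step of retrieval-augmented generation can be sketched as ranking passages by the similarity of their embeddings to the question's embedding. The example below is an assumption-laden toy: `embed` is a bag-of-words stand-in for a sentence transformer, and the function names and passages are hypothetical rather than part of INDUS.

```python
import math
from collections import Counter


def embed(text):
    # Stand-in for a sentence-transformer embedding: word counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=1):
    # Return the k passages most similar to the question.
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


# Hypothetical passages, for illustration only.
PASSAGES = [
    "solar wind speed measured by heliophysics missions",
    "soil moisture retrieval from earth science satellites",
    "exoplanet atmospheres observed in infrared",
]

top = retrieve("how is soil moisture measured from satellites", PASSAGES)
print(top[0])
```

In a full pipeline, the retrieved passages would then be passed to a generative model along with the question; that step is omitted here.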

Validation tests demonstrate that INDUS excels in retrieving relevant passages from the science corpora in response to a NASA-curated test set of about 400 questions. IBM researcher Bishwaranjan Bhattacharjee commented on the overall approach: “We achieved superior performance by not only having a custom vocabulary but also a large specialized corpus for training the encoder model and a good training strategy. For the smaller, faster versions, we used neural architecture search to obtain a model architecture and knowledge distillation to train it with supervision of the larger model.”
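Knowledge distillation, as mentioned in the quote above, trains a small student model to match the output distribution of a larger teacher. The sketch below shows one common soft-target formulation, a temperature-scaled KL divergence; it is an assumption for illustration and not necessarily the objective the INDUS team used. The function names and logit values are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution; subtract the max for stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) over temperature-softened probabilities.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits, for illustration only.
teacher = [2.0, 1.0, 0.1]
matching_student = [2.0, 1.0, 0.1]
mismatched_student = [0.1, 1.0, 2.0]

print(distillation_loss(teacher, matching_student))
print(distillation_loss(teacher, mismatched_student))
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is the signal that drives the student's training.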

NASA Chief Scientist Kate Calvin gives remarks in a NASA employee town hall on how the agency is using and developing Artificial Intelligence (AI) tools to advance missions and research, Wednesday, May 22, 2024, at the NASA Headquarters Mary W. Jackson Building in Washington. The INDUS suite of models will help facilitate the agency’s AI goals.
NASA/Bill Ingalls

INDUS was also evaluated using data from NASA’s Biological and Physical Sciences (BPS) Division. Dr. Sylvain Costes, the NASA BPS project manager for Open Science, discussed the benefits of incorporating INDUS: “Integrating INDUS with the Open Science Data Repository (OSDR) Application Programming Interface (API) enabled us to develop and trial a chatbot that offers more intuitive search capabilities for navigating individual datasets. We are currently exploring ways to improve OSDR’s internal curation data system by leveraging INDUS to enhance our curation team’s productivity and reduce the manual effort required daily.”

At the NASA Goddard Earth Sciences Data and Information Services Center (GES-DISC), the INDUS model was fine-tuned using labeled data from domain experts to categorize publications specifically citing GES-DISC data into applied research areas. According to NASA principal data scientist Dr. Armin Mehrabian, this fine-tuning “significantly improves the identification and retrieval of publications that reference GES-DISC datasets, which aims to improve the user journey in finding their required datasets.” Furthermore, the INDUS encoder models are integrated into the GES-DISC knowledge graph, supporting a variety of other projects, including the dataset recommendation system and GES-DISC GraphRAG.

Kaylin Bugbee, team lead of NASA’s Science Discovery Engine (SDE), spoke to the benefit INDUS offers to existing applications: “Large language models are rapidly changing the search experience. The Science Discovery Engine, a unified, insightful search interface for all of NASA’s open science data and information, has prototyped integrating INDUS into its search engine. Initial results have shown that INDUS improved the accuracy and relevancy of the returned results.”

INDUS enhances scientific research by providing researchers with improved access to vast amounts of specialized knowledge. INDUS can understand complex scientific concepts and reveal new research directions based on existing data. It also enables researchers to extract relevant information from a wide array of sources, improving efficiency. Aligned with NASA and IBM’s commitment to open and transparent artificial intelligence, the INDUS models are openly available on Hugging Face. For the benefit of the scientific community, the team has released the developed models and will release the benchmark datasets that span named entity recognition for climate change, extractive QA for Earth science, and information retrieval for multiple domains. The INDUS encoder models are adaptable for science domain applications, and the INDUS retriever models support information retrieval in RAG applications.

A paper on INDUS, “INDUS: Effective and Efficient Language Models for Scientific Applications,” is available on arxiv.org.

Learn more about the Science Discovery Engine here.

Last Updated
Jun 24, 2024

