Mark SubbaRao Brings Data to Life Through Art
-
Similar Topics
-
By NASA
6 min read
Smarter Searching: NASA AI Makes Science Data Easier to Find
Image snapshot taken from NASA Worldview of NASA’s Global Precipitation Measurement (GPM) mission on March 15, 2025, showing heavy rain across the southeastern U.S. with an overlay of the GCMD Keyword Recommender for Earth Science, Atmosphere, Precipitation, Droplet Size. Credit: NASA Worldview
Imagine shopping for a new pair of running shoes online. If each seller described them differently (one calling them “sneakers,” another “trainers,” and someone else “footwear for exercise”), you’d quickly feel lost in a sea of mismatched terminology. Fortunately, most online stores use standardized categories and filters, so you can click through a simple path, Women’s > Shoes > Running Shoes, and quickly find what you need.
Now, scale that problem to scientific research. Instead of sneakers, think “aerosol optical depth” or “sea surface temperature.” Instead of a handful of retailers, it is thousands of researchers, instruments, and data providers. Without a common language for describing data, finding relevant Earth science datasets would be like trying to locate a needle in a haystack, blindfolded.
That’s why NASA created the Global Change Master Directory (GCMD), a standardized vocabulary that helps scientists tag their datasets in a consistent and searchable way. But as science evolves, so does the challenge of keeping metadata organized and discoverable.
To meet that challenge, NASA’s Office of Data Science and Informatics (ODSI) at the agency’s Marshall Space Flight Center (MSFC) in Huntsville, Alabama, developed the GCMD Keyword Recommender (GKR): a smart tool designed to help data providers and curators assign the right keywords, automatically.
Smarter Tagging, Accelerated Discovery
The upgraded GKR model isn’t just a technical improvement; it’s a leap forward in how we organize and access scientific knowledge. By automatically recommending precise, standardized keywords, the model reduces the burden on human curators while ensuring metadata quality remains high. This makes it easier for researchers, students, and the public to find exactly the datasets they need.
It also sets the stage for broader applications. The techniques used in GKR, like applying focal loss to rare-label classification problems and adapting pre-trained transformers to specialized domains, can benefit fields well beyond Earth science.
Metadata Matchmaker
The newly upgraded GKR model tackles a massive challenge in information science known as extreme multi-label classification. That’s a mouthful, but the concept is straightforward: Instead of predicting just one label, the model must choose many, sometimes dozens, from a set of thousands. Each dataset may need to be tagged with multiple, nuanced descriptors pulled from a controlled vocabulary.
Think of it like trying to identify all the animals in a photograph. If there’s just a dog, it’s easy. But if there’s a dog, a bird, a raccoon hiding behind a bush, and a unicorn that only shows up in 0.1% of your training photos, the task becomes far more difficult. That’s what GKR is up against: tagging complex datasets with precision, even when examples of some keywords are scarce.
And the problem is only growing. The new version of GKR now considers more than 3,200 keywords, up from about 430 in its earlier iteration. That’s a sevenfold increase in vocabulary complexity, and a major leap in what the model needs to learn and predict.
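As a rough illustration of the multi-label setup described above, the Python sketch below scores each keyword independently and keeps every keyword whose probability clears a threshold. The keyword names are taken from examples in this article, but the raw scores, the 0.5 threshold, and the function name recommend_keywords are assumptions made for illustration. This is not the GKR implementation, and a real vocabulary would hold thousands of keywords rather than four.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def recommend_keywords(logits: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return every keyword whose independent probability clears the threshold.

    Unlike single-label classification, nothing forces the probabilities to
    sum to 1, so zero, one, or many keywords can be selected.
    """
    scored = [(keyword, sigmoid(score)) for keyword, score in logits.items()]
    kept = [(keyword, prob) for keyword, prob in scored if prob >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)  # most confident first
    return [keyword for keyword, _ in kept]


# Hypothetical raw scores for one dataset description (illustrative values only).
example_logits = {
    "PRECIPITATION": 2.1,
    "DROPLET SIZE": 0.4,
    "SEA SURFACE TEMPERATURE": -3.0,
    "AEROSOL OPTICAL DEPTH": -1.2,
}

print(recommend_keywords(example_logits))
```

Raising the threshold trades recall for precision: fewer keywords are suggested, but each suggestion is one the model is more confident about.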
To handle this scale, the GKR team didn’t just add more data; they built a more capable model from the ground up. At the heart of the upgrade is INDUS, an advanced language model trained on a staggering 66 billion words drawn from scientific literature across disciplines—Earth science, biological sciences, astronomy, and more.
NASA ODSI’s GCMD Keyword Recommender AI model automatically tags scientific datasets with the help of INDUS, a large language model trained on NASA scientific publications across the disciplines of astrophysics, biological and physical sciences, Earth science, heliophysics, and planetary science. Credit: NASA
“We’re at the frontier of cutting-edge artificial intelligence and machine learning for science,” said Sajil Awale, a member of the NASA ODSI AI team at MSFC. “This problem domain is interesting, and challenging, because it’s an extreme classification problem where the model needs to differentiate even very similar keywords/tags based on small variations of context. It’s exciting to see how we have leveraged INDUS to build this GKR model because it is designed and trained for scientific domains. There are opportunities to improve INDUS for future uses.”
This means that the new GKR isn’t just guessing based on word similarities; it understands the context in which keywords appear. It’s the difference between knowing that “precipitation” might relate to weather and recognizing when it refers to a climate variable in satellite data.
And while the older model was trained on only 2,000 metadata records, the new version had access to a much richer dataset of more than 43,000 records from NASA’s Common Metadata Repository. That increased exposure helps the model make more accurate predictions.
The Common Metadata Repository is the backend behind the following data search and discovery services:
Earthdata Search
International Data Network
Learning to Love Rare Words
One of the biggest hurdles in a task like this is class imbalance. Some keywords appear frequently; others might show up just a handful of times. Standard training objectives such as cross-entropy loss, which was used initially to train the model, tend to favor the easy, common labels and neglect the rare ones.
To solve this, NASA’s team turned to focal loss, a loss function that down-weights examples the model already classifies confidently and shifts focus toward the harder, underrepresented cases.
The result? A model that performs better across the board, especially on the keywords that matter most to specialists searching for niche datasets.
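To see why focal loss helps with rare labels, the sketch below compares it with cross-entropy for a single label, using the commonly published form of focal loss, FL = -(1 - p_t)^gamma * log(p_t). The value gamma = 2 and the example probabilities are assumptions; the article does not state the settings the GKR team used.

```python
import math


def cross_entropy(p: float, y: int) -> float:
    """Binary cross-entropy for one label; p is the predicted probability of y == 1."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p: float, y: int, gamma: float = 2.0) -> float:
    """Focal loss: cross-entropy scaled by (1 - p_t) ** gamma.

    gamma = 2.0 is an assumed, commonly used default, not a value from the article.
    With gamma = 0 this reduces to plain cross-entropy.
    """
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Two hypothetical positive examples: one the model already gets right,
# and one (think of a rare keyword) that it gets badly wrong.
examples = {"easy, common label": 0.95, "hard, rare label": 0.10}

for name, p in examples.items():
    ce = cross_entropy(p, 1)
    fl = focal_loss(p, 1)
    print(f"{name}: cross-entropy={ce:.4f}, focal={fl:.6f}")
```

With these numbers, the confidently correct example has its loss cut by a factor of about 400, while the badly misclassified one keeps 81% of its loss, so the hard cases dominate the training signal.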
From Metadata to Mission
Ultimately, science depends not only on collecting data, but on making that data usable and discoverable. The updated GKR tool is a quiet but critical part of that mission. By bringing powerful AI to the task of metadata tagging, it helps ensure that the flood of Earth observation data pouring in from satellites and instruments around the globe doesn’t get lost in translation.
In a world awash with data, tools like GKR help researchers find the signal in the noise and turn information into insight.
Beyond powering GKR, the INDUS large language model is also enabling innovation across other NASA Science Mission Directorate (SMD) projects. For example, INDUS supports the Science Discovery Engine by helping automate metadata curation and improving the relevancy ranking of search results. The diverse applications reflect INDUS’s growing role as a foundational AI capability for SMD.
The INDUS large language model is funded by the Office of the Chief Science Data Officer within NASA’s Science Mission Directorate at NASA Headquarters in Washington. The Office of the Chief Science Data Officer advances scientific discovery through innovative applications and partnerships in data science, advanced analytics, and artificial intelligence.
Last Updated Jul 09, 2025
Related Terms: Science & Research, Artificial Intelligence (AI)
View the full article
-
By Space Force
The new facility is enabling Guardians and mission partners to seamlessly monitor space-based sensors and make rapid, data-driven decisions that enhance missile warning and threat responses for the joint force.
View the full article
-
By Space Force
Ahead of the movie's theatrical release, Disney/Pixar invited military families to special screenings across the country, including at an event hosted by the Motion Picture Association in Washington, D.C.
View the full article
-
By NASA
3 min read
Preparations for Next Moonwalk Simulations Underway (and Underwater)
The Jet Propulsion Laboratory perfected aerogel for the Stardust mission. Under Stardust, bricks of aerogel covered panels on a spacecraft that flew behind a comet, with the microporous material “soft catching” any particles that might strike it and preserving them for return to Earth. Credit: NASA
Consisting of 99% air, aerogel is the world’s lightest solid. This unique material has found purpose in several forms, from NASA missions to high fashion.
Driven by the desire to create a 3D cloud, Greek artist Ioannis Michaloudis learned to use aerogel as an artistic medium. His journey spanning more than 25 years took him to the Massachusetts Institute of Technology (MIT) in Cambridge; Shivaji University in Maharashtra, India; and NASA’s Jet Propulsion Laboratory in Southern California.
A researcher at MIT introduced Michaloudis to aerogel after hearing of his cloud-making ambition, and he was immediately intrigued. Aerogel is made by combining a polymer with a solvent to create a gel and flash-drying it under pressure, leaving a solid filled with microscopic pores.
Scientists at JPL chose aerogel in the mid-1990s to enable the Stardust mission, with the idea that a porous surface could capture particles while flying on a probe behind a comet. Aerogel worked in lab tests, but it was difficult to manufacture consistently and needed to be made space-worthy. NASA JPL hired materials scientist Steve Jones to develop a flight-ready aerogel, and he eventually got funding for an aerogel lab.
The aerogel AirSwipe bag Michaloudis created for Coperni’s 2024 fall collection debut appears almost luminous in its model’s hand. The bag immediately captured the world’s attention. Credit: Coperni
The Stardust mission succeeded, and when Michaloudis heard of it, he reached out to JPL, where Jones invited him to the lab. Now retired, Jones recalled, “I went through the primer on aerogel with him, the different kinds you could make and their different properties.” The size of Jones’ reactor, enabling it to make large objects, impressed Michaloudis. With tips on how to safely operate a large reactor, he outfitted his own lab with one.
In India, Michaloudis learned recipes for aerogels that can be molded into large objects and don’t crack or shrink during drying. His continued work with aerogels has created an extensive art portfolio.
Michaloudis has had more than a dozen solo exhibitions. All his artwork involves aerogel, drawing attention with its unusual qualities. An ethereal, translucent blue, it casts an orange shadow and can withstand molten metals.
In 2020, Michaloudis created a quartz-encapsulated aerogel pendant for the centerpiece of that year’s collection from French jewelry house Boucheron. Michaloudis also captured the fashion and design world’s attention with a handbag made of aerogel, unveiled at Coperni’s 2024 fall collection debut.
NASA was a crucial step along the way. “I am what I am, and we made what we made thanks to the Stardust project,” said Michaloudis.
Last Updated Jun 09, 2025
Related Terms: Technology Transfer & Spinoffs, Spinoffs, Technology Transfer
View the full article
-
By NASA
4 min read
A lot can change in a year for Earth’s forests and vegetation, as springtime and rainy seasons can bring new growth, while cooling temperatures and dry weather can bring a dieback of those green colors. And now, a novel type of NASA visualization illustrates those changes in a full complement of colors as seen from space.
Researchers have now gathered a complete year of PACE data to tell a story about the health of land vegetation by detecting slight variations in leaf colors. Previous missions allowed scientists to observe broad changes in chlorophyll, the pigment that gives plants their green color and also allows them to perform photosynthesis. But PACE now allows scientists to see three different pigments in vegetation: chlorophyll, anthocyanins, and carotenoids. The combination of these three pigments helps scientists pinpoint even more information about plant health. Credit: NASA’s Goddard Space Flight Center
NASA’s Plankton, Aerosol, Cloud, ocean Ecosystem (PACE) satellite is designed to view Earth’s microscopic ocean plants in a new lens, but researchers have proved its hyperspectral use over land, as well.
Previous missions measured broad changes in chlorophyll, the pigment that gives plants their green color and also allows them to perform photosynthesis. Now, for the first time, PACE measurements have allowed NASA scientists and visualizers to show a complete year of global vegetation data using three pigments: chlorophyll, anthocyanins, and carotenoids. That multicolor imagery tells a clearer story about the health of land vegetation by detecting the smallest of variations in leaf colors.
“Earth is amazing. It’s humbling, being able to see life pulsing in colors across the whole globe,” said Morgaine McKibben, PACE applications lead at NASA’s Goddard Space Flight Center in Greenbelt, Maryland. “It’s like the overview effect that astronauts describe when they look down at Earth, except we are looking through our technology and data.”
Anthocyanins, carotenoids, and chlorophyll data light up North America, highlighting vegetation and its health. Credit: NASA’s Scientific Visualization Studio
Anthocyanins are the red pigments in leaves, while carotenoids are the yellow pigments – both of which we see when autumn changes the colors of trees. Plants use these pigments to protect themselves from fluctuations in the weather, adapting to the environment through chemical changes in their leaves. For example, leaves can turn more yellow when they have too much sunlight but not enough of the other necessities, like water and nutrients. If they didn’t adjust their color, it would damage the mechanisms they have to perform photosynthesis.
In the visualization, the data is highlighted in bright colors: magenta represents anthocyanins, green represents chlorophyll, and cyan represents carotenoids. The brighter the colors are, the more leaves there are in that area. The movement of these colors across the land areas shows the seasonal changes over time.
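One way to picture the color scheme described above is as an additive blend: each pigment contributes its assigned color in proportion to its strength, and the result is capped at full brightness. The Python sketch below shows that idea only. The 0-to-1 pigment scale, the blending rule, and the function name composite are assumptions for illustration, not the method NASA's visualizers used.

```python
# RGB values for the three colors named in the visualization.
MAGENTA = (1.0, 0.0, 1.0)  # anthocyanins
GREEN = (0.0, 1.0, 0.0)    # chlorophyll
CYAN = (0.0, 1.0, 1.0)     # carotenoids


def composite(anthocyanin: float, chlorophyll: float, carotenoid: float) -> tuple:
    """Blend three pigment strengths (assumed to lie in 0..1) into one RGB pixel.

    Each pigment adds its color scaled by its strength; every channel is
    clipped at 1.0 so the pixel stays displayable.
    """
    channels = []
    for i in range(3):
        value = (
            anthocyanin * MAGENTA[i]
            + chlorophyll * GREEN[i]
            + carotenoid * CYAN[i]
        )
        channels.append(min(1.0, value))
    return tuple(channels)


# Hypothetical pixels: dense green canopy, sparse reddish foliage, bare ground.
print(composite(0.0, 1.0, 0.0))
print(composite(0.5, 0.0, 0.0))
print(composite(0.0, 0.0, 0.0))
```

Under this scheme a pixel with no detected pigment stays black, and stronger pigment signals give brighter pixels, matching the description that brighter colors mean more leaves.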
In areas like the evergreen forests of the Pacific Northwest, plants undergo less seasonal change. The data highlights this, showing comparatively steadier colors as the year progresses.
The combination of these three pigments helps scientists pinpoint even more information about plant health.
“Shifts in these pigments, as detected by PACE, give novel information that may better describe vegetation growth, or when vegetation changes from flourishing to stressed,” said McKibben. “It’s just one of many ways the mission will drive increased understanding of our home planet and enable innovative, practical solutions that serve society.”
The Ocean Color Instrument on PACE collects hyperspectral data, which means it observes the planet in 100 different wavelengths of visible and near infrared light. It is the only instrument – in space or elsewhere – that provides hyperspectral coverage around the globe every one to two days. The PACE mission builds on the legacy of earlier missions, such as Landsat, which gathers higher resolution data but observes a fraction of those wavelengths.
In a paper recently published in Remote Sensing Letters, scientists introduced the mission’s first terrestrial data products.
“This PACE data provides a new view of Earth that will improve our understanding of ecosystem dynamics and function,” said Fred Huemmrich, research professor at the University of Maryland, Baltimore County, member of the PACE science and applications team, and first author of the paper. “With the PACE data, it’s like we’re looking at a whole new world of color. It allows us to describe pigment characteristics at the leaf level that we weren’t able to do before.”
As scientists continue to work with these new data, available on the PACE website, they’ll be able to incorporate them into future science applications, which may include forest monitoring or early detection of drought effects.
By Erica McNamee
NASA’s Goddard Space Flight Center, Greenbelt, Md.
Last Updated Jun 05, 2025
Editor: Kate D. Ramsayer
Contact: Kate D. Ramsayer, kate.d.ramsayer@nasa.gov
Related Terms: Earth, Goddard Space Flight Center, PACE (Plankton, Aerosol, Cloud, Ocean Ecosystem)
View the full article