Ames Science Directorate’s Stars of the Month: July 2025
-
Similar Topics
-
By European Space Agency
Week in images: 07-11 July 2025
Discover our week through the lens
View the full article
-
By NASA
6 min read
Smarter Searching: NASA AI Makes Science Data Easier to Find
Image snapshot taken from NASA Worldview of NASA's Global Precipitation Measurement (GPM) mission on March 15, 2025, showing heavy rain across the southeastern U.S., with an overlay of the GCMD Keyword Recommender for Earth Science, Atmosphere, Precipitation, Droplet Size. Credits: NASA Worldview
Imagine shopping for a new pair of running shoes online. If each seller described them differently, one calling them "sneakers," another "trainers," and someone else "footwear for exercise," you'd quickly feel lost in a sea of mismatched terminology. Fortunately, most online stores use standardized categories and filters, so you can click through a simple path, Women's > Shoes > Running Shoes, and quickly find what you need.
Now, scale that problem to scientific research. Instead of sneakers, think “aerosol optical depth” or “sea surface temperature.” Instead of a handful of retailers, it is thousands of researchers, instruments, and data providers. Without a common language for describing data, finding relevant Earth science datasets would be like trying to locate a needle in a haystack, blindfolded.
That’s why NASA created the Global Change Master Directory (GCMD), a standardized vocabulary that helps scientists tag their datasets in a consistent and searchable way. But as science evolves, so does the challenge of keeping metadata organized and discoverable.
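GCMD science keywords are organized as hierarchical paths, much like the Women's > Shoes > Running Shoes filter above; the Worldview caption shows one such path, Earth Science > Atmosphere > Precipitation > Droplet Size. The sketch below is illustrative only. It is not part of GKR or any NASA API, the helper names (`parse_keyword`, `ancestors`, `matches`) are hypothetical, and it assumes keywords arrive as " > "-separated strings that are uppercased so comparisons ignore case.

```python
from typing import List, Tuple

# A keyword is a path from the broadest category to the most specific term.
KeywordPath = Tuple[str, ...]


def parse_keyword(path: str, sep: str = ">") -> KeywordPath:
    """Split an 'A > B > C' string into a tuple of levels, uppercased for case-insensitive matching."""
    return tuple(part.strip().upper() for part in path.split(sep) if part.strip())


def ancestors(keyword: KeywordPath) -> List[KeywordPath]:
    """Every prefix of the path, broadest first; tagging a leaf implies all of these."""
    return [keyword[: i + 1] for i in range(len(keyword))]


def matches(dataset_keywords: List[KeywordPath], query: KeywordPath) -> bool:
    """True if any of the dataset's keywords sits at or below the browsed category."""
    return any(kw[: len(query)] == query for kw in dataset_keywords)


if __name__ == "__main__":
    tag = parse_keyword("Earth Science > Atmosphere > Precipitation > Droplet Size")
    print(ancestors(tag)[-2])  # ('EARTH SCIENCE', 'ATMOSPHERE', 'PRECIPITATION')
    print(matches([tag], parse_keyword("earth science > atmosphere")))  # True
    print(matches([tag], parse_keyword("Earth Science > Oceans")))  # False
```

Because a browse query only has to match a prefix, a dataset tagged with a precise leaf keyword still turns up when someone browses a broader category.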
To meet that challenge, NASA’s Office of Data Science and Informatics (ODSI) at the agency’s Marshall Space Flight Center (MSFC) in Huntsville, Alabama, developed the GCMD Keyword Recommender (GKR): a smart tool designed to help data providers and curators assign the right keywords, automatically.
Smarter Tagging, Accelerated Discovery
The upgraded GKR model isn’t just a technical improvement; it’s a leap forward in how we organize and access scientific knowledge. By automatically recommending precise, standardized keywords, the model reduces the burden on human curators while ensuring metadata quality remains high. This makes it easier for researchers, students, and the public to find exactly the datasets they need.
It also sets the stage for broader applications. The techniques used in GKR, like applying focal loss to rare-label classification problems and adapting pre-trained transformers to specialized domains, can benefit fields well beyond Earth science.
Metadata Matchmaker
The newly upgraded GKR model tackles a massive challenge in information science known as extreme multi-label classification. That’s a mouthful, but the concept is straightforward: Instead of predicting just one label, the model must choose many, sometimes dozens, from a set of thousands. Each dataset may need to be tagged with multiple, nuanced descriptors pulled from a controlled vocabulary.
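A common way to frame multi-label classification is to give every keyword its own independent yes/no score, so that keywords do not have to compete for a single slot. The article does not describe GKR's output layer, so the following is a minimal sketch under that assumption. The function name `recommend_keywords`, the raw model scores (logits), the 0.5 threshold, and the `max_k` cap are all hypothetical.

```python
import math
from typing import Dict, List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping a raw score to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def recommend_keywords(
    logits: Dict[str, float], threshold: float = 0.5, max_k: int = 10
) -> List[str]:
    """Pick every keyword whose independent probability clears the threshold.

    Each keyword is scored on its own (sigmoid, not softmax), so several
    keywords can be selected for the same dataset.
    """
    probs = {kw: sigmoid(z) for kw, z in logits.items()}
    selected = [kw for kw, p in probs.items() if p >= threshold]
    selected.sort(key=lambda kw: probs[kw], reverse=True)  # most confident first
    return selected[:max_k]


if __name__ == "__main__":
    # Made-up scores for a precipitation dataset, for illustration only.
    scores = {
        "PRECIPITATION": 2.0,
        "DROPLET SIZE": 0.5,
        "AEROSOL OPTICAL DEPTH": -1.0,
        "SEA SURFACE TEMPERATURE": -3.0,
    }
    print(recommend_keywords(scores))  # ['PRECIPITATION', 'DROPLET SIZE']
```

With thousands of candidate keywords, the threshold would typically be tuned on held-out data. Rare keywords tend to receive low scores, which is the imbalance problem discussed under "Learning to Love Rare Words."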
Think of it like trying to identify all the animals in a photograph. If there’s just a dog, it’s easy. But if there’s a dog, a bird, a raccoon hiding behind a bush, and a unicorn that only shows up in 0.1% of your training photos, the task becomes far more difficult. That’s what GKR is up against: tagging complex datasets with precision, even when examples of some keywords are scarce.
And the problem is only growing. The new version of GKR now considers more than 3,200 keywords, up from about 430 in its earlier iteration. That's a roughly sevenfold increase in vocabulary complexity, and a major leap in what the model needs to learn and predict.
To handle this scale, the GKR team didn’t just add more data; they built a more capable model from the ground up. At the heart of the upgrade is INDUS, an advanced language model trained on a staggering 66 billion words drawn from scientific literature across disciplines—Earth science, biological sciences, astronomy, and more.
NASA ODSI's GCMD Keyword Recommender AI model automatically tags scientific datasets with the help of INDUS, a large language model trained on NASA scientific publications across the disciplines of astrophysics, biological and physical sciences, Earth science, heliophysics, and planetary science. Credits: NASA
"We're at the frontier of cutting-edge artificial intelligence and machine learning for science," said Sajil Awale, a member of the NASA ODSI AI team at MSFC. "This problem domain is interesting, and challenging, because it's an extreme classification problem where the model needs to differentiate even very similar keywords/tags based on small variations of context. It's exciting to see how we have leveraged INDUS to build this GKR model because it is designed and trained for scientific domains. There are opportunities to improve INDUS for future uses."
This means that the new GKR isn't just guessing based on word similarities; it understands the context in which keywords appear. It's the difference between a model knowing that "precipitation" might relate to weather and recognizing when it refers to a climate variable in satellite data.
And while the older model was trained on only 2,000 metadata records, the new version had access to a much richer dataset of more than 43,000 records from NASA’s Common Metadata Repository. That increased exposure helps the model make more accurate predictions.
The Common Metadata Repository is the backend behind the following data search and discovery services: Earthdata Search and the International Data Network.
Learning to Love Rare Words
One of the biggest hurdles in a task like this is class imbalance. Some keywords appear frequently; others might show up just a handful of times. Standard training objectives such as cross-entropy loss, which was initially used to train the model, tend to favor the easy, common labels and neglect the rare ones.
To solve this, NASA’s team turned to focal loss, a strategy that reduces the model’s attention to obvious examples and shifts focus toward the harder, underrepresented cases.
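Focal loss is commonly written as FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the probability the model assigns to the correct answer. The (1 - p_t)^γ factor shrinks the loss on examples the model already gets right. With γ = 0 and no α weighting, it reduces to ordinary cross-entropy. The sketch below applies it to a single keyword's yes/no prediction. The value γ = 2 is a widely used default, not a setting reported for GKR, and the function names are hypothetical.

```python
import math
from typing import Optional

EPS = 1e-12  # keeps log() away from 0


def binary_cross_entropy(p: float, y: int) -> float:
    """Cross-entropy for one keyword: p = predicted probability it applies, y = 1 if it does."""
    p = min(max(p, EPS), 1 - EPS)
    return -math.log(p) if y == 1 else -math.log(1 - p)


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: Optional[float] = None) -> float:
    """Binary focal loss: cross-entropy scaled by (1 - p_t) ** gamma.

    alpha, if given, weights positive labels by alpha and negatives by 1 - alpha.
    """
    p = min(max(p, EPS), 1 - EPS)
    p_t = p if y == 1 else 1 - p  # probability assigned to the true outcome
    alpha_t = 1.0 if alpha is None else (alpha if y == 1 else 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = (0.95, 1)  # common keyword the model already predicts confidently
    hard = (0.10, 1)  # rare keyword the model is missing
    for name, loss in [("cross-entropy", binary_cross_entropy), ("focal", focal_loss)]:
        ratio = loss(*hard) / loss(*easy)
        print(f"{name}: hard example weighs {ratio:.0f}x the easy one")
```

For multi-label tagging, this per-keyword loss would be summed or averaged over all keywords for each dataset. Because confident, easy predictions contribute almost nothing, most of the training signal comes from the keywords the model still gets wrong, which often include the rare ones.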
The result? A model that performs better across the board, especially on the keywords that matter most to specialists searching for niche datasets.
From Metadata to Mission
Ultimately, science depends not only on collecting data, but on making that data usable and discoverable. The updated GKR tool is a quiet but critical part of that mission. By bringing powerful AI to the task of metadata tagging, it helps ensure that the flood of Earth observation data pouring in from satellites and instruments around the globe doesn’t get lost in translation.
In a world awash with data, tools like GKR help researchers find the signal in the noise and turn information into insight.
Beyond powering GKR, the INDUS large language model is also enabling innovation across other NASA Science Mission Directorate (SMD) projects. For example, INDUS supports the Science Discovery Engine by helping automate metadata curation and improving the relevancy ranking of search results. These diverse applications reflect INDUS's growing role as a foundational AI capability for SMD.
The INDUS large language model is funded by the Office of the Chief Science Data Officer within NASA’s Science Mission Directorate at NASA Headquarters in Washington. The Office of the Chief Science Data Officer advances scientific discovery through innovative applications and partnerships in data science, advanced analytics, and artificial intelligence.
Last Updated Jul 09, 2025
Related Terms: Science & Research, Artificial Intelligence (AI)
-
By NASA
Helio Highlights: May 2025
A satellite image showing the extent of the Northern Lights during part of the Mother's Day 2024 solar storms. Credits: NOAA
One year ago, solar storms lit up the night sky. Why?
The Sun is 93 million miles away from Earth, on average. Even though it's far away, we can still see and feel its effects here. Among the most beautiful effects are the auroras, colorful lights that dance across the sky near the North and South Poles. These are also called the Northern and Southern Lights. They happen when tiny particles from the Sun hit gas molecules in our atmosphere, causing those molecules to give off energy as light.
Sometimes the Sun becomes very active and sends out a lot more energy than normal. When this happens, we can see auroras in places much farther from the poles than normal. In May 2024, around Mother’s Day, the Sun sent powerful solar storms in the direction of Earth. These storms were also called the Gannon Storms, named after Jennifer Gannon, a scientist who studied space weather. The Northern Lights could be seen as far south as Puerto Rico, Hawaii, Mexico, Jamaica, and the Bahamas. The Southern Lights were also visible as far north as South Africa and New Zealand.
Aurora Borealis seen from British Columbia, Canada, on May 10, 2024. Credits: NASA/Mara Johnson-Groh
Scientists who study the Sun and its effects on our solar system work in a field called heliophysics. Their studies of the Sun have shown that it goes through cycles of being more active and less active. Each one of these cycles lasts about 11 years, but can be anywhere from 8 to 14 years long. This is called the Solar Cycle.
The middle of each cycle is called Solar Maximum. During this time, the Sun has more dark spots (called sunspots) and creates more space weather events. The big storms in May 2024 happened during the Solar Maximum for Solar Cycle 25.
On May 8 and 9, 2024, an active area on the Sun called AR3664 shot out powerful solar flares and several huge bursts of energy called coronal mass ejections (CMEs). These CMEs headed straight for Earth. The first CME pushed aside the normal solar wind, making a clear path for the others to reach us faster. When all this energy hit our atmosphere, it created auroras much farther from the poles than usual. It was like the Sun gave the auroras a huge power boost!
Eruptions of solar material into space as seen on May 7 (right) and May 8 (left), 2024. These types of eruptions often come just before a larger coronal mass ejection (CME), including the ones that caused the Mother's Day solar storms. Credits: NASA/SDO
Auroras are beautiful to watch, but the space weather that creates them can also cause problems. Space weather can mess up radio signals, power grids, GPS systems, and satellites. During the May 2024 storms, GPS systems used by farmers were disrupted. Many farmers use GPS to guide their self-driving tractors. Since this happened during peak planting season, it may have cost billions of dollars in lost profit.
Because space weather can cause so many problems, scientists at NASA and around the world watch the Sun closely to predict when these events will happen. You can help too! Join local science projects at schools, teach others about the Sun, and help make observations in your area. All of this helps us to learn more about the Sun and how it affects our planet.
Here are some resources to connect you to the Sun and auroras:
Lesson Plans & Educator Guides
Magnetic Mysteries: Sun-Earth Interactions
A 5E lesson for high school students to investigate the question of what causes aurora by using Helioviewer to examine solar activity.
Aurora Research and Heliophysics
Learn about auroras, how they form, and the different phases they go through, as well as heliophysics missions that study them.
How Earth’s Magnetic Field Causes Auroras
A 5E middle school lesson where students explore why our planet has a magnetic field (and other planets don’t) and what it is like.
Interactive Resources
Magnetic Earth
Introductory activity where users learn about the magnetic field that surrounds Earth and its role in creating the Northern Lights.
NOAA Aurora 30-Minute Forecast
An interactive aurora map for both hemispheres which allows users to predict the likelihood of auroras at different latitudes.
Webinars and Slide Decks
Space Weather Basics
A slide deck (41 slides) that offers an elementary introduction to the basic features of space weather and its interactions with Earth’s magnetosphere and various technologies.