Astronomers Find One of the Youngest and Brightest Galaxies in the Early Universe
-
Similar Topics
-
By NASA
Smarter Searching: NASA AI Makes Science Data Easier to Find
Image snapshot taken from NASA Worldview of NASA’s Global Precipitation Measurement (GPM) mission on March 15, 2025, showing heavy rain across the southeastern U.S. with an overlay of the GCMD Keyword Recommender for Earth Science, Atmosphere, Precipitation, Droplet Size. Credit: NASA Worldview
Imagine shopping for a new pair of running shoes online. If each seller described them differently (one calling them “sneakers,” another “trainers,” and someone else “footwear for exercise”), you’d quickly feel lost in a sea of mismatched terminology. Fortunately, most online stores use standardized categories and filters, so you can click through a simple path, Women’s > Shoes > Running Shoes, and quickly find what you need.
Now, scale that problem to scientific research. Instead of sneakers, think “aerosol optical depth” or “sea surface temperature.” Instead of a handful of retailers, it is thousands of researchers, instruments, and data providers. Without a common language for describing data, finding relevant Earth science datasets would be like trying to locate a needle in a haystack, blindfolded.
That’s why NASA created the Global Change Master Directory (GCMD), a standardized vocabulary that helps scientists tag their datasets in a consistent and searchable way. But as science evolves, so does the challenge of keeping metadata organized and discoverable.
To meet that challenge, NASA’s Office of Data Science and Informatics (ODSI) at the agency’s Marshall Space Flight Center (MSFC) in Huntsville, Alabama, developed the GCMD Keyword Recommender (GKR): a smart tool designed to help data providers and curators assign the right keywords, automatically.
Smarter Tagging, Accelerated Discovery
The upgraded GKR model isn’t just a technical improvement; it’s a leap forward in how we organize and access scientific knowledge. By automatically recommending precise, standardized keywords, the model reduces the burden on human curators while ensuring metadata quality remains high. This makes it easier for researchers, students, and the public to find exactly the datasets they need.
It also sets the stage for broader applications. The techniques used in GKR, like applying focal loss to rare-label classification problems and adapting pre-trained transformers to specialized domains, can benefit fields well beyond Earth science.
Metadata Matchmaker
The newly upgraded GKR model tackles a massive challenge in information science known as extreme multi-label classification. That’s a mouthful, but the concept is straightforward: Instead of predicting just one label, the model must choose many, sometimes dozens, from a set of thousands. Each dataset may need to be tagged with multiple, nuanced descriptors pulled from a controlled vocabulary.
Think of it like trying to identify all the animals in a photograph. If there’s just a dog, it’s easy. But if there’s a dog, a bird, a raccoon hiding behind a bush, and a unicorn that only shows up in 0.1% of your training photos, the task becomes far more difficult. That’s what GKR is up against: tagging complex datasets with precision, even when examples of some keywords are scarce.
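As a rough illustration of what "choosing many labels" means in practice, the sketch below scores each keyword independently and keeps every one that clears a cutoff. The scores, the 0.5 threshold, the five-keyword cap, and the function name `recommend_keywords` are all assumptions made for this example; the article does not describe how GKR actually selects its final keywords.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw model score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def recommend_keywords(
    logits: dict[str, float],
    threshold: float = 0.5,   # assumed cutoff, not taken from the article
    max_keywords: int = 5,    # assumed cap, not taken from the article
) -> list[str]:
    """Keep every keyword whose independent probability clears the threshold.

    Unlike single-label classification, no keyword competes with another:
    zero, one, or many keywords may be returned for the same dataset.
    """
    scored = [(keyword, sigmoid(score)) for keyword, score in logits.items()]
    kept = [(keyword, prob) for keyword, prob in scored if prob >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [keyword for keyword, _ in kept[:max_keywords]]


# Hypothetical raw scores for one dataset description.
example_logits = {
    "PRECIPITATION": 3.1,
    "DROPLET SIZE": 1.2,
    "SEA SURFACE TEMPERATURE": -2.4,
    "AEROSOL OPTICAL DEPTH": -0.3,
}

selected = recommend_keywords(example_logits)
print(selected)
```

With a real vocabulary the dictionary would hold thousands of entries rather than four, which is what makes the problem "extreme."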
And the problem is only growing. The new version of GKR now considers more than 3,200 keywords, up from about 430 in its earlier iteration. That’s a sevenfold increase in vocabulary complexity, and a major leap in what the model needs to learn and predict.
To handle this scale, the GKR team didn’t just add more data; they built a more capable model from the ground up. At the heart of the upgrade is INDUS, an advanced language model trained on a staggering 66 billion words drawn from scientific literature across disciplines—Earth science, biological sciences, astronomy, and more.
NASA ODSI’s GCMD Keyword Recommender AI model automatically tags scientific datasets with the help of INDUS, a large language model trained on NASA scientific publications across the disciplines of astrophysics, biological and physical sciences, Earth science, heliophysics, and planetary science. Credit: NASA
“We’re at the frontier of cutting-edge artificial intelligence and machine learning for science,” said Sajil Awale, a member of the NASA ODSI AI team at MSFC. “This problem domain is interesting, and challenging, because it’s an extreme classification problem where the model needs to differentiate even very similar keywords/tags based on small variations of context. It’s exciting to see how we have leveraged INDUS to build this GKR model because it is designed and trained for scientific domains. There are opportunities to improve INDUS for future uses.”
This means that the new GKR isn’t just guessing based on word similarities; it understands the context in which keywords appear. It’s the difference between a model knowing that “precipitation” might relate to weather versus recognizing when it means a climate variable in satellite data.
And while the older model was trained on only 2,000 metadata records, the new version had access to a much richer dataset of more than 43,000 records from NASA’s Common Metadata Repository. That increased exposure helps the model make more accurate predictions.
The Common Metadata Repository is the backend behind the following data search and discovery services:
Earthdata Search
International Data Network
Learning to Love Rare Words
One of the biggest hurdles in a task like this is class imbalance. Some keywords appear frequently; others might show up just a handful of times. Standard training objectives such as cross-entropy loss, which was used initially to train the model, tend to favor the easy, common labels and neglect the rare ones.
To solve this, NASA’s team turned to focal loss, a strategy that reduces the model’s attention to obvious examples and shifts focus toward the harder, underrepresented cases.
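To make the idea concrete, here is a minimal sketch of the commonly published per-label focal loss next to plain cross-entropy. The focusing parameter `gamma = 2.0` is a conventional default and an assumption here, as are the example probabilities; the article does not state the settings GKR uses, and the optional class-weighting term found in some formulations is left out for brevity.

```python
import math

EPS = 1e-12  # guards against log(0) for extreme probabilities


def binary_cross_entropy(p: float, y: int) -> float:
    """Cross-entropy for one label: p is the predicted probability, y is 0 or 1."""
    p_t = p if y == 1 else 1.0 - p      # probability assigned to the true outcome
    return -math.log(max(p_t, EPS))


def focal_loss(p: float, y: int, gamma: float = 2.0) -> float:
    """Cross-entropy scaled by (1 - p_t) ** gamma.

    Confident, correct predictions (p_t near 1) are down-weighted sharply,
    so hard or rare labels contribute a larger share of the total loss.
    With gamma = 0 this reduces to ordinary cross-entropy.
    """
    p_t = p if y == 1 else 1.0 - p
    return ((1.0 - p_t) ** gamma) * -math.log(max(p_t, EPS))


# Hypothetical predictions for two keywords that both truly apply (y = 1).
cases = {
    "easy, common keyword": 0.95,   # model is already confident
    "hard, rare keyword": 0.10,     # model almost misses it
}

for name, p in cases.items():
    ce = binary_cross_entropy(p, 1)
    fl = focal_loss(p, 1)
    print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f} kept={fl / ce:.4f}")
```

In this toy case the easy example keeps well under 1% of its cross-entropy loss while the hard example keeps about 81%, which is the shift in emphasis the paragraph above describes.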
The result? A model that performs better across the board, especially on the keywords that matter most to specialists searching for niche datasets.
From Metadata to Mission
Ultimately, science depends not only on collecting data, but on making that data usable and discoverable. The updated GKR tool is a quiet but critical part of that mission. By bringing powerful AI to the task of metadata tagging, it helps ensure that the flood of Earth observation data pouring in from satellites and instruments around the globe doesn’t get lost in translation.
In a world awash with data, tools like GKR help researchers find the signal in the noise and turn information into insight.
Beyond powering GKR, the INDUS large language model is also enabling innovation across other projects in NASA’s Science Mission Directorate (SMD). For example, INDUS supports the Science Discovery Engine by helping automate metadata curation and improving the relevancy ranking of search results. These diverse applications reflect INDUS’s growing role as a foundational AI capability for SMD.
The INDUS large language model is funded by the Office of the Chief Science Data Officer within NASA’s Science Mission Directorate at NASA Headquarters in Washington. The Office of the Chief Science Data Officer advances scientific discovery through innovative applications and partnerships in data science, advanced analytics, and artificial intelligence.
Last Updated Jul 09, 2025
View the full article
-
By European Space Agency
Astronomers have discovered a huge filament of hot gas bridging four galaxy clusters. At 10 times the mass of our galaxy, the thread could contain some of the Universe’s ‘missing’ matter, addressing a decades-long mystery.
View the full article
-
By European Space Agency
The European Space Agency has begun the 55th International Paris Air Show by unveiling the first images from the Proba-3 spacecraft.
View the full article
-
By NASA
Sols 4554–4555: Let’s Try That One Again…
NASA’s Mars rover Curiosity acquired this image using its Front Hazard Avoidance Camera (Front Hazcam) on May 28, 2025 (Sol 4553, or Martian day 4,553 of the Mars Science Laboratory mission) at 04:48:55 UTC. Credit: NASA/JPL-Caltech
Written by Abigail Fraeman, Planetary Geologist at NASA’s Jet Propulsion Laboratory
Earth planning date: Wednesday, May 28, 2025
We came in early this morning and learned that part of Tuesday’s plan didn’t execute on Mars due to a temporary issue with the arm. We collected APXS data on the target “Palo Verde Mountains,” but were not able to take the corresponding MAHLI images or drive away. So it was a straightforward decision for the planning team today to pick up where we left off yesterday, giving ourselves a second chance to collect the MAHLI observation and then complete the same 29.5-meter drive to the west (about 97 feet) that we had planned on Tuesday.
We love making lemonade from lemons when things don’t go exactly as expected in rover tactical planning, and today was no exception. Since we’re sticking around for a little bit longer, the science team decided to collect additional mosaics of impressive nearby features, including a 15×2 Mastcam mosaic of the “Mishe Mokwa” hill and an 11×2 Mastcam mosaic of fractures near “Lake Cachuma.” We’re also having another go at taking the epically long, long-distance RMI mosaic of a crater 91 kilometers away from Curiosity (almost 57 miles) that we planned yesterday, and we’re playing around with the focus settings to see if we can get a sharper image.
The team also had time for a second RMI mosaic of our very well-imaged “Texoli” butte, and a ChemCam LIBS observation on a target named “Santa Monica Bay,” which is just above the “Sisquoc River” target we observed yesterday on the bumpy rock in our workspace. As usual, we will also continue to monitor the environment around us with REMS, RAD, Navcam, and Mastcam observations.
Last Updated May 30, 2025
View the full article
-
By NASA
This NASA/ESA Hubble Space Telescope image features the remote galaxy HerS 020941.1+001557, which appears as a red arc that partially encircles a foreground elliptical galaxy. Credit: ESA/Hubble & NASA, H. Nayyeri, L. Marchetti, J. Lowenthal
This NASA/ESA Hubble Space Telescope image offers us the chance to see a distant galaxy now some 19.5 billion light-years from Earth (but appearing as it did around 11 billion years ago, when the galaxy was 5.5 billion light-years away and began its trek to us through expanding space). Known as HerS 020941.1+001557, this remote galaxy appears as a red arc partially encircling a foreground elliptical galaxy located some 2.7 billion light-years away. Called SDSS J020941.27+001558.4, the elliptical galaxy appears as a bright dot at the center of the image with a broad haze of stars outward from its core. A third galaxy, called SDSS J020941.23+001600.7, seems to be intersecting part of the curving, red crescent of light created by the distant galaxy.
The alignment of this trio of galaxies creates a type of gravitational lens called an Einstein ring. Gravitational lenses occur when light from a very distant object bends (or is ‘lensed’) around a massive (or ‘lensing’) object located between us and the distant lensed galaxy. When the lensed object and the lensing object align, they create an Einstein ring. Einstein rings can appear as a full or partial circle of light around the foreground lensing object, depending on how precise the alignment is. The effects of this phenomenon are much too subtle to see on a local level but can become clearly observable when dealing with curvatures of light on enormous, astronomical scales.
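For readers who want the scale of the effect, the standard textbook expression for the angular radius of an Einstein ring is given below. It assumes an idealized point-mass lens and perfect alignment, which is a simplification of a real galaxy lens like this one. Here M is the lens mass, G the gravitational constant, c the speed of light, and D_L, D_S, and D_LS are angular diameter distances to the lens, to the source, and between the two. No values are plugged in because the distances quoted in this article are not necessarily angular diameter distances.

```latex
\theta_E = \sqrt{\frac{4 G M}{c^{2}} \, \frac{D_{LS}}{D_{L} \, D_{S}}}
```

Because the ring radius grows only as the square root of the lens mass, a galaxy-scale mass is needed before the ring becomes large enough to resolve.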
Gravitational lenses not only bend and distort light from distant objects but magnify it as well. Here we see light from a distant galaxy following the curve of spacetime created by the elliptical galaxy’s mass. As the distant galaxy’s light passes through the gravitational lens, it is magnified and bent into a partial ring around the foreground galaxy, creating a distinctive Einstein ring shape.
The partial Einstein ring in this image is not only beautiful, but noteworthy. A citizen scientist identified this Einstein ring as part of the SPACE WARPS project that asked citizen scientists to search for gravitational lenses in images.
Text Credit: ESA/Hubble
View the full article