World's Largest Digital Sky Survey Issues Biggest Astronomical Data Release Ever
-
Similar Topics
-
By NASA
6 min read
Smarter Searching: NASA AI Makes Science Data Easier to Find
Image snapshot taken from NASA Worldview of NASA’s Global Precipitation Measurement (GPM) mission on March 15, 2025, showing heavy rain across the southeastern U.S. with an overlay of the GCMD Keyword Recommender for Earth Science, Atmosphere, Precipitation, Droplet Size. NASA Worldview
Imagine shopping for a new pair of running shoes online. If each seller described them differently (one calling them “sneakers,” another “trainers,” and someone else “footwear for exercise”), you’d quickly feel lost in a sea of mismatched terminology. Fortunately, most online stores use standardized categories and filters, so you can click through a simple path, Women’s > Shoes > Running Shoes, and quickly find what you need.
Now, scale that problem to scientific research. Instead of sneakers, think “aerosol optical depth” or “sea surface temperature.” Instead of a handful of retailers, it is thousands of researchers, instruments, and data providers. Without a common language for describing data, finding relevant Earth science datasets would be like trying to locate a needle in a haystack, blindfolded.
That’s why NASA created the Global Change Master Directory (GCMD), a standardized vocabulary that helps scientists tag their datasets in a consistent and searchable way. But as science evolves, so does the challenge of keeping metadata organized and discoverable.
To meet that challenge, NASA’s Office of Data Science and Informatics (ODSI) at the agency’s Marshall Space Flight Center (MSFC) in Huntsville, Alabama, developed the GCMD Keyword Recommender (GKR): a smart tool designed to help data providers and curators assign the right keywords, automatically.
Smarter Tagging, Accelerated Discovery
The upgraded GKR model isn’t just a technical improvement; it’s a leap forward in how we organize and access scientific knowledge. By automatically recommending precise, standardized keywords, the model reduces the burden on human curators while ensuring metadata quality remains high. This makes it easier for researchers, students, and the public to find exactly the datasets they need.
It also sets the stage for broader applications. The techniques used in GKR, like applying focal loss to rare-label classification problems and adapting pre-trained transformers to specialized domains, can benefit fields well beyond Earth science.
Metadata Matchmaker
The newly upgraded GKR model tackles a massive challenge in information science known as extreme multi-label classification. That’s a mouthful, but the concept is straightforward: Instead of predicting just one label, the model must choose many, sometimes dozens, from a set of thousands. Each dataset may need to be tagged with multiple, nuanced descriptors pulled from a controlled vocabulary.
Think of it like trying to identify all the animals in a photograph. If there’s just a dog, it’s easy. But if there’s a dog, a bird, a raccoon hiding behind a bush, and a unicorn that only shows up in 0.1% of your training photos, the task becomes far more difficult. That’s what GKR is up against: tagging complex datasets with precision, even when examples of some keywords are scarce.
And the problem is only growing. The new version of GKR now considers more than 3,200 keywords, up from about 430 in its earlier iteration. That’s a sevenfold increase in vocabulary complexity, and a major leap in what the model needs to learn and predict.
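The multi-label setup described above can be illustrated with a minimal sketch. It assumes the model emits one independent score (logit) per keyword and that every keyword whose sigmoid probability clears a threshold is recommended. The article does not describe GKR's actual output layer, so the function name `recommend_keywords`, the four-keyword vocabulary, the logit values, and the 0.5 threshold are all hypothetical.

```python
import math

def sigmoid(x):
    """Map a raw score (logit) to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def recommend_keywords(logits, vocabulary, threshold=0.5):
    """Return (keyword, score) pairs whose score passes the threshold.

    Each keyword is scored independently, so any number of keywords
    can be selected for a single dataset (multi-label, not single-label).
    """
    scored = [(kw, sigmoid(z)) for kw, z in zip(vocabulary, logits)]
    picked = [(kw, s) for kw, s in scored if s >= threshold]
    # Highest-confidence recommendations first.
    return sorted(picked, key=lambda pair: pair[1], reverse=True)

# Hypothetical four-keyword vocabulary; the real one has more than 3,200 entries.
vocabulary = [
    "PRECIPITATION",
    "DROPLET SIZE",
    "SEA SURFACE TEMPERATURE",
    "AEROSOL OPTICAL DEPTH",
]
# Made-up logits for one dataset description.
logits = [3.1, 0.4, -2.5, -0.2]

tags = recommend_keywords(logits, vocabulary)
for keyword, score in tags:
    print(f"{keyword}: {score:.2f}")
```

With a vocabulary of thousands of keywords, the same loop simply runs over a much longer score vector, which is why rare keywords with few training examples are the hard part.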
To handle this scale, the GKR team didn’t just add more data; they built a more capable model from the ground up. At the heart of the upgrade is INDUS, an advanced language model trained on a staggering 66 billion words drawn from scientific literature across disciplines—Earth science, biological sciences, astronomy, and more.
NASA ODSI’s GCMD Keyword Recommender AI model automatically tags scientific datasets with the help of INDUS, a large language model trained on NASA scientific publications across the disciplines of astrophysics, biological and physical sciences, Earth science, heliophysics, and planetary science. NASA
“We’re at the frontier of cutting-edge artificial intelligence and machine learning for science,” said Sajil Awale, a member of the NASA ODSI AI team at MSFC. “This problem domain is interesting, and challenging, because it’s an extreme classification problem where the model needs to differentiate even very similar keywords/tags based on small variations of context. It’s exciting to see how we have leveraged INDUS to build this GKR model because it is designed and trained for scientific domains. There are opportunities to improve INDUS for future uses.”
This means that the new GKR isn’t just guessing based on word similarities; it understands the context in which keywords appear. It’s the difference between a model knowing that “precipitation” might relate to weather and one recognizing when it refers to a climate variable in satellite data.
And while the older model was trained on only 2,000 metadata records, the new version had access to a much richer dataset of more than 43,000 records from NASA’s Common Metadata Repository. That increased exposure helps the model make more accurate predictions.
The Common Metadata Repository is the backend behind the following data search and discovery services: Earthdata Search and International Data Network.
Learning to Love Rare Words
One of the biggest hurdles in a task like this is class imbalance. Some keywords appear frequently; others might show up just a handful of times. Traditional training objectives such as cross-entropy loss, which was initially used to train the model, tend to favor the easy, common labels and neglect the rare ones.
To solve this, NASA’s team turned to focal loss, a strategy that reduces the model’s attention to obvious examples and shifts focus toward the harder, underrepresented cases.
The result? A model that performs better across the board, especially on the keywords that matter most to specialists searching for niche datasets.
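To make the contrast between cross-entropy and focal loss concrete, here is a minimal sketch of the commonly published binary form of focal loss, FL(p_t) = -(1 - p_t)^gamma * log(p_t), where p_t is the probability the model assigns to the correct answer for one label. The article does not say which variant or settings the GKR team used, so the value gamma = 2.0 and the example probabilities are assumptions chosen for illustration.

```python
import math

def binary_cross_entropy(p, y):
    """Cross-entropy for one label: p is the predicted probability, y is 0 or 1."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true answer
    return -math.log(p_t)

def focal_loss(p, y, gamma=2.0):
    """Focal loss for one label.

    The factor (1 - p_t) ** gamma shrinks the loss on examples the model
    already gets right with high confidence. gamma = 0 recovers plain
    cross-entropy. gamma = 2.0 is an assumed, illustrative setting.
    """
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# An "easy" positive (confidently correct) and a "hard" positive (mostly missed).
easy_p, hard_p = 0.95, 0.10

ce_ratio = binary_cross_entropy(hard_p, 1) / binary_cross_entropy(easy_p, 1)
fl_ratio = focal_loss(hard_p, 1) / focal_loss(easy_p, 1)

print(f"cross-entropy: hard example weighs {ce_ratio:.0f}x the easy one")
print(f"focal loss:    hard example weighs {fl_ratio:.0f}x the easy one")
```

Under focal loss the hard example dominates the training signal far more than under cross-entropy, which is the mechanism that lets rare, poorly predicted keywords receive more attention.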
From Metadata to Mission
Ultimately, science depends not only on collecting data, but on making that data usable and discoverable. The updated GKR tool is a quiet but critical part of that mission. By bringing powerful AI to the task of metadata tagging, it helps ensure that the flood of Earth observation data pouring in from satellites and instruments around the globe doesn’t get lost in translation.
In a world awash with data, tools like GKR help researchers find the signal in the noise and turn information into insight.
Beyond powering GKR, the INDUS large language model is also enabling innovation across other NASA Science Mission Directorate (SMD) projects. For example, INDUS supports the Science Discovery Engine by helping automate metadata curation and improving the relevancy ranking of search results. The diverse applications reflect INDUS’s growing role as a foundational AI capability for SMD.
The INDUS large language model is funded by the Office of the Chief Science Data Officer within NASA’s Science Mission Directorate at NASA Headquarters in Washington. The Office of the Chief Science Data Officer advances scientific discovery through innovative applications and partnerships in data science, advanced analytics, and artificial intelligence.
Details
Last Updated Jul 09, 2025
Related Terms: Science & Research, Artificial Intelligence (AI)
View the full article
-
By Space Force
More than 700 Guardians around the world are prepared to participate in Resolute Space 2025, a large-scale exercise led by the U.S. Space Force that will demonstrate the service’s preparedness for complex, large-scale military operations.
View the full article
-
By NASA
5 min read
How NASA’s SPHEREx Mission Will Share Its All-Sky Map With the World
NASA’s SPHEREx mission will map the entire sky in 102 different wavelengths, or colors, of infrared light. This image of the Vela Molecular Ridge was captured by SPHEREx and is part of the mission’s first ever public data release. The yellow patch on the right side of the image is a cloud of interstellar gas and dust that glows in some infrared colors due to radiation from nearby stars. NASA/JPL-Caltech
NASA’s newest astrophysics space telescope launched in March on a mission to create an all-sky map of the universe. Now settled into low-Earth orbit, SPHEREx (Spectro-Photometer for the History of the Universe, Epoch of Reionization, and Ices Explorer) has begun delivering its sky survey data to a public archive on a weekly basis, allowing anyone to use the data to probe the secrets of the cosmos.
“Because we’re looking at everything in the whole sky, almost every area of astronomy can be addressed by SPHEREx data,” said Rachel Akeson, the lead for the SPHEREx Science Data Center at IPAC. IPAC is a science and data center for astrophysics and planetary science at Caltech in Pasadena, California.
Other missions, like NASA’s now-retired WISE (Wide-field Infrared Survey Explorer), have also mapped the entire sky. SPHEREx builds on this legacy by observing in 102 infrared wavelengths, compared to WISE’s four wavelength bands.
By putting the many wavelength bands of SPHEREx data together, scientists can identify the signatures of specific molecules with a technique known as spectroscopy. The mission’s science team will use this method to study the distribution of frozen water and organic molecules — the “building blocks of life” — in the Milky Way.
This animation shows how NASA’s SPHEREx observatory will map the entire sky, a process it will complete four times over its two-year mission. The telescope will observe every point in the sky in 102 different infrared wavelengths, more than any other all-sky survey. SPHEREx’s openly available data will enable a wide variety of astronomical studies. Credit: NASA/JPL-Caltech
The SPHEREx science team will also use the mission’s data to study the physics that drove the universe’s expansion following the big bang, and to measure the amount of light emitted by all the galaxies in the universe over time. Releasing SPHEREx data in a public archive encourages far more astronomical studies than the team could do on their own.
“By making the data public, we enable the whole astronomy community to use SPHEREx data to work on all these other areas of science,” Akeson said.
NASA is committed to the sharing of scientific data, promoting transparency and efficiency in scientific research. In line with this commitment, data from SPHEREx appears in the public archive within 60 days after the telescope collects each observation. The short delay allows the SPHEREx team to process the raw data to remove or flag artifacts, account for detector effects, and align the images to the correct astronomical coordinates.
The team publishes the procedures they used to process the data alongside the actual data products. “We want enough information in those files that people can do their own research,” Akeson said.
One of the early test images captured by NASA’s SPHEREx mission in April 2025. This image shows a section of sky in one infrared wavelength, or color, that is invisible to the human eye but is represented here in a visible color. This particular wavelength (3.29 microns) reveals a cloud of dust made of a molecule similar to soot or smoke. NASA/JPL-Caltech
This image from NASA’s SPHEREx shows the same region of space in a different infrared wavelength (0.98 microns), once again represented by a color that is visible to the human eye. The dust cloud has vanished because the molecules that make up the dust (polycyclic aromatic hydrocarbons) do not radiate light in this color. NASA/JPL-Caltech
During its two-year prime mission, SPHEREx will survey the entire sky twice a year, creating four all-sky maps. After the mission reaches the one-year mark, the team plans to release a map of the whole sky at all 102 wavelengths.
In addition to the science enabled by SPHEREx itself, the telescope unlocks an even greater range of astronomical studies when paired with other missions. Data from SPHEREx can be used to identify interesting targets for further study by NASA’s James Webb Space Telescope, refine exoplanet parameters collected from NASA’s TESS (Transiting Exoplanet Survey Satellite), and study the properties of dark matter and dark energy along with ESA’s (European Space Agency’s) Euclid mission and NASA’s upcoming Nancy Grace Roman Space Telescope.
The SPHEREx mission’s all-sky survey will complement data from other NASA space telescopes. SPHEREx is illustrated second from the right. The other telescope illustrations are, from left to right: the Hubble Space Telescope, the retired Spitzer Space Telescope, the retired WISE/NEOWISE mission, the James Webb Space Telescope, and the upcoming Nancy Grace Roman Space Telescope. NASA/JPL-Caltech
The IPAC archive that hosts SPHEREx data, IRSA (NASA/IPAC Infrared Science Archive), also hosts pointed observations and all-sky maps at a variety of wavelengths from previous missions. The large amount of data available through IRSA gives users a comprehensive view of the astronomical objects they want to study.
“SPHEREx is part of the entire legacy of NASA space surveys,” said IRSA Science Lead Vandana Desai. “People are going to use the data in all kinds of ways that we can’t imagine.”
NASA’s Office of the Chief Science Data Officer leads open science efforts for the agency. Public sharing of scientific data, tools, research, and software maximizes the impact of NASA’s science missions. To learn more about NASA’s commitment to transparency and reproducibility of scientific research, visit science.nasa.gov/open-science. To get more stories about the impact of NASA’s science data delivered directly to your inbox, sign up for the NASA Open Science newsletter.
By Lauren Leese
Web Content Strategist for the Office of the Chief Science Data Officer
More About SPHEREx
The SPHEREx mission is managed by NASA’s Jet Propulsion Laboratory for the agency’s Astrophysics Division within the Science Mission Directorate at NASA Headquarters. BAE Systems in Boulder, Colorado, built the telescope and the spacecraft bus. The science analysis of the SPHEREx data will be conducted by a team of scientists located at 10 institutions in the U.S., two in South Korea, and one in Taiwan. Caltech in Pasadena managed and integrated the instrument. The mission’s principal investigator is based at Caltech with a joint JPL appointment. Data will be processed and archived at IPAC at Caltech. The SPHEREx dataset will be publicly available at the NASA-IPAC Infrared Science Archive. Caltech manages JPL for NASA.
To learn more about SPHEREx, visit:
https://nasa.gov/SPHEREx
Media Contacts
Calla Cofield
Jet Propulsion Laboratory, Pasadena, Calif.
626-808-2469
calla.e.cofield@jpl.nasa.gov
Amanda Adams
Office of the Chief Science Data Officer
256-683-6661
amanda.m.adams@nasa.gov
Details
Last Updated Jul 02, 2025
Related Terms: Open Science, Astrophysics, Galaxies, Jet Propulsion Laboratory, SPHEREx (Spectro-Photometer for the History of the Universe and Ices Explorer), The Search for Life, The Universe
View the full article
-
By USH
These images, captured by the Curiosity rover in 2014, reveal yet another unexplained aerial phenomenon in the Martian atmosphere: a cigar-shaped object with a consistent width and rounded ends.
What makes this anomaly particularly compelling is the sharp clarity of the image. According to Jean Ward, the stars in the background appear crisp and unblurred, indicating that the object is not the result of motion blur or a long exposure. Notably, the object appears in five separate frames over an 8-minute span, suggesting it is moving relatively slowly through space, which is uncharacteristic of a meteorite entering the atmosphere. It also lacks the fiery tail typically associated with atmospheric entry.
Rather than a meteor, the object more closely resembles a solid, elongated craft of unknown origin. When oriented horizontally, it even appears to feature a front-facing structure, possibly a porthole or raised dome, hinting at a cockpit or command module.
Whether this object is orbiting beyond the visible horizon or connected to the surface far in the distance, its sheer size is unmistakable. Its presence raises compelling questions: could this be further evidence of intelligently controlled craft, whether of extraterrestrial or covert human origin, navigating through Martian airspace?
View the full article