Untangling the Milky Way's evolution through big-data astronomy

A map revealing the distribution of metals in our galaxy, the Milky Way. Metal content varies across generations of stars. (Image credit: ESA/Gaia/DPAC)

Untangling the evolution of our home galaxy, the Milky Way, is a challenge similar to mapping the human genome, according to the European Space Agency (ESA). ESA's galaxy mapper, Gaia, takes trillions of measurements of 2 billion of the brightest stars in the sky. Here, we look at what it takes to unpick those measurements to reveal the galaxy's secrets.

On June 13, 2022, the Gaia Data Processing and Analysis Consortium (DPAC), a collaboration of 450 European astronomers and engineers supporting the galaxy-mapping endeavor, released what DPAC chair Anthony Brown described as "the richest set of astronomical data ever published."

To create the 10-terabyte catalog of compressed data, DPAC computers had to ingest 940 billion observations of 2 billion of the brightest light sources in the sky, Brown, an astronomer at Leiden University in the Netherlands, said at an ESA news conference on June 13.

The data, captured by Gaia between June 2014 and June 2017, contained information about 1.5 billion stars' precise positions and motions in the sky; details about the ages, temperatures and brightness levels of about half a billion of those stars; and detailed chemical compositions of several million of them. 

It took five years for the data to pass through the sophisticated computational pipeline of validation, calibration and analysis procedures, which involve six supercomputing centers in six European countries. It would take a thousand years for a single (and rather powerful) personal computer to process the data set, Gonzalo Gracia, DPAC project coordinator for data processing, told Space.com.

As of 2022, the main Gaia database contains 1 petabyte of data, Gracia added, which is equivalent to the data capacity of 200,000 DVDs. To date, the telescope has made over 100 measurements of every single one of the 2 billion light sources it sees.

"Every day, Gaia sends us between 20 and 100 gigabytes of data," Gracia said. "That might not seem like that much if you compare it to the bandwidth you have at home, but we are talking about a satellite that is 1.5 million kilometers [930,000 miles] away from Earth."

The Gaia telescope observes 2 billion of the brightest stars in our home galaxy, the Milky Way. (Image credit: ESA/ATG medialab; background: ESO/S. Brunier, CC BY-NC)
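
The figures quoted above can be sanity-checked with a few lines of arithmetic. The sketch below assumes decimal units (1 petabyte = 10^15 bytes) and, for the downlink estimate, that each day's volume is spread evenly over 24 hours. That last assumption is a simplification: a spacecraft only transmits while a ground station is in contact, so the real link rate during a pass would be higher. The function names are illustrative only.

```python
# Back-of-the-envelope checks on the numbers quoted in the article.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 PB = 10**15 bytes).
PETABYTE = 10**15
GIGABYTE = 10**9
SECONDS_PER_DAY = 86_400


def gigabytes_per_dvd(total_bytes: int, dvd_count: int) -> float:
    """Capacity each DVD must have for dvd_count discs to hold total_bytes."""
    return total_bytes / dvd_count / GIGABYTE


def mean_downlink_mbps(gigabytes_per_day: float) -> float:
    """Average megabits per second if a day's data were sent evenly over 24 hours."""
    bits_per_day = gigabytes_per_day * GIGABYTE * 8
    return bits_per_day / SECONDS_PER_DAY / 1e6


def observations_per_source(observations: float, sources: float) -> float:
    """Mean number of observations per light source."""
    return observations / sources


dvd_gb = gigabytes_per_dvd(PETABYTE, 200_000)          # 1 PB over 200,000 DVDs
obs_each = observations_per_source(940e9, 2e9)         # 940 billion over 2 billion
low_mbps = mean_downlink_mbps(20)                      # 20 GB per day
high_mbps = mean_downlink_mbps(100)                    # 100 GB per day

print(f"{dvd_gb:.1f} GB per DVD")
print(f"{obs_each:.0f} observations per source")
print(f"{low_mbps:.2f} to {high_mbps:.2f} Mbit/s averaged over a day")
```

The results are consistent with the article: about 5 GB per disc is close to a single-layer DVD, and 470 observations per source comfortably exceeds "over 100 measurements" of each one.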

The journey of the data 

From Gaia's vantage point at Lagrange Point 2, a point on the far side of Earth from the sun where the combined gravity of the two bodies lets a spacecraft orbit the sun in lockstep with our planet, the spacecraft observes the cosmos while shielded from the sun's glare.

Three ESA deep-space antennas (one each near Madrid; Malargüe, Argentina; and New Norcia, Australia) receive the data collected by the space probe's two telescopes and other instruments. From those ground stations, the measurements travel on conventional internet lines to the European Space Operations Centre in Darmstadt, Germany, for basic checks, before the data are sent to the agency's Science Operations Center in Madrid. 

"This is when we do the first round of processing," Gracia said. "We do some initial calibrations and run the data through a piece of software to assess the health of the satellite. This happens in the first hours after the data is received."

Then, things start to get complicated. A data-processing center at CNES, the French space agency, in Toulouse scans the data set for fast-moving objects in the solar system: asteroids and comets that might be on a collision course with Earth. 

"They have a pipeline, which detects those objects and checks whether they are already known," Gracia said. "If they are not known, they raise an alarm with the solar system objects community in the world, who can do the follow-up observation and find what the object is about and what is its trajectory."

Gaia is quite efficient at monitoring asteroids and might even be able to see some that are not visible from Earth. The mission's June 13 data release contained detailed trajectories of 60,000 solar system space rocks. On top of that, Gaia measured the light spectra of these space rocks, which reveal their chemical compositions. Previously, astronomers knew the detailed chemical compositions of only 4,500 asteroids.
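
The "is it already known?" check Gracia describes can be pictured as a positional cross-match. The sketch below is not the CNES pipeline, whose details the article does not give. It assumes that the expected sky positions of known objects at the time of observation are already available (in practice that would require propagating each orbit), and the `SkyPosition` type, the `flag_unknown` function and the 0.001-degree tolerance are all hypothetical.

```python
import math
from typing import List, NamedTuple


class SkyPosition(NamedTuple):
    name: str
    ra_deg: float   # right ascension in degrees
    dec_deg: float  # declination in degrees


def angular_separation_deg(a: SkyPosition, b: SkyPosition) -> float:
    """Great-circle separation between two sky positions (haversine formula)."""
    ra1, dec1 = math.radians(a.ra_deg), math.radians(a.dec_deg)
    ra2, dec2 = math.radians(b.ra_deg), math.radians(b.dec_deg)
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def flag_unknown(detections: List[SkyPosition],
                 known_predictions: List[SkyPosition],
                 tolerance_deg: float = 0.001) -> List[SkyPosition]:
    """Return detections that lie farther than tolerance_deg from every known object."""
    return [
        d for d in detections
        if all(angular_separation_deg(d, k) > tolerance_deg for k in known_predictions)
    ]


# Made-up coordinates, for illustration only.
known = [SkyPosition("known-asteroid", 10.0, 20.0)]
detections = [
    SkyPosition("det-A", 10.0002, 20.0001),  # close to the known object
    SkyPosition("det-B", 150.0, -5.0),       # matches nothing in the list
]
alerts = flag_unknown(detections, known)
print([d.name for d in alerts])  # ['det-B']
```

A brute-force comparison like this is fine for a handful of objects; a catalog of many thousands would need a spatial index rather than checking every pair.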

Separately, a team in Cambridge, England, compares new brightness measurements delivered by Gaia with data acquired earlier. Significant changes in the brightness of stars are always a reason for excitement, as they might indicate supernovas, the explosions that mark the deaths of massive stars, whose cores then collapse into black holes or neutron stars.

Sometimes, dim distant stars and galaxies can temporarily brighten through microlensing, an odd phenomenon that happens when a massive object passes between the dim star and the observer, its light-bending gravity acting as a magnifying glass. Gaia, which completes a scan of the entire sky every two months, sees all that.
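
Comparing a new brightness measurement with earlier ones can be sketched as a simple outlier test. This is only an illustration of the idea, not the Cambridge team's method: the `brightness_alert` function, the five-sigma threshold and the minimum-scatter floor are assumptions made here. Brightness is expressed in astronomical magnitudes, where a smaller number means a brighter source.

```python
from statistics import mean, stdev
from typing import List, Optional


def brightness_alert(history_mag: List[float],
                     new_mag: float,
                     n_sigma: float = 5.0,
                     min_scatter: float = 0.01) -> Optional[str]:
    """Compare a new magnitude with past measurements of the same source.

    Returns "brightened" or "dimmed" when the new value departs from the
    historical mean by at least n_sigma times the historical scatter,
    otherwise None. Needs at least two past measurements.
    """
    baseline = mean(history_mag)
    # Floor the scatter so a very quiet history does not trigger on noise.
    scatter = max(stdev(history_mag), min_scatter)
    # Magnitudes run backwards: a drop in magnitude is an increase in brightness.
    deviation = (baseline - new_mag) / scatter
    if deviation >= n_sigma:
        return "brightened"
    if deviation <= -n_sigma:
        return "dimmed"
    return None


# Made-up light curve hovering around magnitude 18.
history = [18.02, 17.98, 18.00, 18.01, 17.99]

print(brightness_alert(history, 16.5))   # brightened
print(brightness_alert(history, 18.01))  # None
print(brightness_alert(history, 19.0))   # dimmed
```

A flag like this says only that something changed. Telling a supernova from a microlensing event or an ordinary variable star would take follow-up observations and the shape of the light curve over time.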

Over and over

In the meantime, the rest of the consortium conducts what Gracia calls "cyclic processing": endless rounds of redigesting, validating and analyzing the data to extract the most accurate information that astronomers can use to create precise maps of the Milky Way galaxy and model its life into the past and future. Several thousand servers running tens of thousands of processor cores are involved in the operation.

"We have to process the data several times," Gracia said. "We process it, we give it to the scientists for checks, and then we have to tune our calibrations, our algorithms; we have to improve them every time."

The data sets are also dependent on each other. For example, without information about precise positions of the observed objects, the data on brightness changes or movements of asteroids would be worthless. 

"We essentially have the information about the amount of photons hitting the Gaia telescopes, and from their position in the window, we derive the positions in the sky," Gracia said. "This is done in Barcelona, where we produce this astrometric information for all the sources in the sky. This is the input for basically all the other processing that we do. It takes a lot of time to do all that and to do it with a sufficient amount of data to ensure that the data is really of the best quality."

This amount of processing is the reason behind the delay between the acquisition of the data and its release. Gaia launched in December 2013, but the astronomical community didn't get their hands on the first batch of data until September 2016. The second data release followed in April 2018. The June 13 data dump was preceded by a partial early release in December 2020. Each new catalog increases the precision of the data as well as the amount of available information about each of the 2 billion light sources the telescope sees. Although the mission is already in its ninth year, there is no stopping for the 450 researchers and engineers at DPAC. 

While the world's Milky Way researchers are unpacking the gifts of the June 13 data release, looking for evidence of the galaxy's dynamic life, Gracia and his colleagues are already busy working on the next data dump, which promises, among other things, to unleash Gaia's potential to spot planets around faraway stars. Thousands of new finds are expected to enrich the existing exoplanet catalog as the DPAC researchers train their algorithms to spot the characteristic mild dimming of a star caused by a planet crossing in front of its disk.
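
How mild is that dimming? To a first approximation, the fraction of starlight a transiting planet blocks is the ratio of the two disks' areas, (planet radius / star radius) squared. The sketch below applies that standard approximation to familiar solar system values; it ignores effects such as limb darkening and says nothing about how DPAC's algorithms actually work. The function name and the rounded radii are choices made for this example.

```python
def transit_depth(planet_radius_km: float, star_radius_km: float) -> float:
    """Fraction of a star's light blocked when a planet crosses its disk.

    Uses the area ratio of the two disks and ignores limb darkening.
    """
    return (planet_radius_km / star_radius_km) ** 2


# Rounded mean radii in kilometers.
SUN_RADIUS_KM = 695_700
JUPITER_RADIUS_KM = 69_911
EARTH_RADIUS_KM = 6_371

jupiter_depth = transit_depth(JUPITER_RADIUS_KM, SUN_RADIUS_KM)
earth_depth = transit_depth(EARTH_RADIUS_KM, SUN_RADIUS_KM)

print(f"Jupiter-size planet, sun-like star: {jupiter_depth:.2%} dimming")
print(f"Earth-size planet, sun-like star: {earth_depth:.4%} dimming")
```

A Jupiter-size planet dims a sun-like star by roughly 1%, and an Earth-size planet by less than a hundredth of a percent, which is why repeated, carefully calibrated brightness measurements matter.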

"We started processing data for the fourth cycle two years ago and are already planning the fifth cycle," Gracia said. "It's really nonstop."

Tereza Pultarova
Senior Writer
