Astronomers have a "Big Data" problem. While telescopes around the world record reams of data every day, researchers struggle to manage this surplus of information. But there is a change brewing within the astronomy community, one where researchers assume many different roles: astronomer, hacker and communicator.
DotAstronomy, a community that bridges the gap between scientific research and computer coding, hosted the first "Hack Day" in the United States devoted exclusively to astronomy last month at the Bit.ly headquarters in New York.
The Dec. 15 event was co-sponsored by Bit.ly and Harvard's Seamless Astronomy Group. Participants had a single day to tackle a problem within astronomy data. The day was split into three parts: presentations of tools that some participants have been working on, hack time, and presentations of the day's accomplishments.
Participants came from all over the tri-state area to learn from other astronomy hackers and work on joint projects. Most of them were either professors or graduate students from NYU, Harvard, Yale or CUNY, but there were others from non-astronomy backgrounds as well.
Many of the tools presented were frameworks to make astronomy data more manageable, often with a heavy community and open-source aspect.
For example, there was Astropy, a community-driven astronomy package; Planethunters.org, where members of the public can hunt for exoplanets online; the yt-project, a community-driven platform that transforms data into breathtaking graphic models to help researchers ask better questions of their data; and an API (an interface between a user and a site's database) that makes it easy to look up any celestial object's spectral data from the archives of the Sloan Digital Sky Survey.
Hacking and camaraderie
After the main presentations, everyone grabbed a quick lunch and circled the whiteboard to pitch their hacks. They then split into groups and started exchanging ideas, debugging, and scrawling flow charts or models. Practically all the participants were familiar with coding in Python, but the best hackers still quickly stood out, and many clamored for their aid.
Demitri Muna is one of those hackers. He runs an online forum and workshop called SciCoder, teaching scientists how to efficiently work in Python. Muna is working toward a SciCoder book, which will include a free PDF for the astronomy community.
During the Hack Day, Muna worked with Kelle Cruz, of the American Museum of Natural History's department of astrophysics, and others to create an SQLite database of brown dwarf data that could be distributed to members by email.
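To give a sense of what such a small, shareable catalog can look like, here is a minimal sketch using Python's built-in sqlite3 module. It is not the database the group built: the table name, the columns, the helper functions and the sample rows are all hypothetical placeholders, not real measurements.

```python
import sqlite3


def build_catalog(path=":memory:"):
    """Open (or create) a single-file SQLite catalog with one table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS brown_dwarfs (
               id            INTEGER PRIMARY KEY,
               name          TEXT UNIQUE NOT NULL,
               ra_deg        REAL NOT NULL,   -- right ascension, degrees
               dec_deg       REAL NOT NULL,   -- declination, degrees
               spectral_type TEXT
           )"""
    )
    return conn


def add_object(conn, name, ra_deg, dec_deg, spectral_type=None):
    """Insert one object; the 'with' block commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO brown_dwarfs (name, ra_deg, dec_deg, spectral_type) "
            "VALUES (?, ?, ?, ?)",
            (name, ra_deg, dec_deg, spectral_type),
        )


def objects_of_type(conn, prefix):
    """Return the names of objects whose spectral type starts with prefix."""
    rows = conn.execute(
        "SELECT name FROM brown_dwarfs WHERE spectral_type LIKE ? ORDER BY name",
        (prefix + "%",),
    )
    return [row[0] for row in rows]


# Placeholder rows for illustration only; these are not real objects.
conn = build_catalog()
add_object(conn, "Example-1", 10.0, -5.0, "L5")
add_object(conn, "Example-2", 20.0, 15.0, "T2")
add_object(conn, "Example-3", 30.0, 25.0, "L1")
l_dwarfs = objects_of_type(conn, "L")
print(l_dwarfs)
```

Passing a file name instead of ":memory:" would write the whole catalog to one file, which is what makes this format convenient to send to colleagues.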
"Astronomers are dealing with an embarrassment of riches in the volume of data at our fingertips, but most still work with the same tools and file formats from 25-30 years ago," Muna said. "These tools are increasingly unable to scale to handle the data we now have. I strongly feel that better, not just more, investment into software development needs to be made in our community."
In our age of social media, it's not just about getting the data, but making it fast and convenient to use. However, the many online tools that house the necessary astronomy data are inconsistent with one another in compatibility, programming interfaces, naming conventions, units and descriptive data. Astronomers typically have to write the same code over and over again, customizing it on a case-by-case basis.
"One problem is that every sub-community [in astronomy] handles data differently, and the tools they might use are different," said Lia Corrales, a grad student at Columbia University. "I'm working to make cross-communication between different databases easier. I work with X-ray data. I'd like to put all of [the] data together, find all the quasars and say something about the dust around each one. I've wasted a lot of time in the past writing the cross-communicating code manually. I learned a lot from that experience but also to never do it again."
Adric Riedel, a researcher at the American Museum of Natural History, said that his current dataset was taken from the SuperCOSMOS Sky Survey, which was photographed in the 1950s; researchers are still getting new things out of it. "We need to work smarter and take advantage of tools that others have built."
David Hogg, an astronomy and physics professor at NYU, worked on a paper predicting the distribution of stars that have transiting exoplanets based on data from NASA's Kepler planet-hunting space telescope.
"Kepler has been very generous with their data and astronomers have just started asking questions about it," he said. "We're guessing [based on observations] that the numbers of one-, two-, and three-planet systems puts a strong constraint on the true numbers. What we really want to do has to be simplified if we are going to finish it in one day."
Unfortunately, the paper wasn't completed by the end of the Hack Day, but his group put together a literature review and a graph to model the assumptions about the data.
Hack Day results
Chris Beaumont, a graduate student from Harvard, worked on a project to speed up plots and models in Python. He used OpenGL, a graphics interface widely used for 3D games, to take advantage of the processing power of graphics hardware. The results shown at the end of Hack Day were impressive. He's now planning to create full-featured code for others to use.
And Megan Schwamb, who is part of the Planethunters.org team, created a new API for the site that retrieves data about possible planets orbiting binary stars.
One of the most unconventional hacks, carried out by Micha Gorelick, a data scientist at Bit.ly, attempted to take down MAST (the Mikulski Archive for Space Telescopes) and expose its flaws.
Gorelick found that when parameters were entered into MAST to find celestial objects, the program did not check what type of data was being requested. That could allow hackers to insert their own database commands and manipulate the catalog. Afterward, he contacted the folks at MAST, and they are currently addressing the issue.
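The general class of flaw described here, in which user input is pasted directly into a database command, can be sketched in a few lines of Python. This is an illustration only and says nothing about how MAST itself is built: the catalog table, its contents and both search functions are hypothetical, and SQLite stands in for whatever database a real archive might use.

```python
import sqlite3

# A tiny stand-in catalog with placeholder rows (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (name TEXT, magnitude REAL)")
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?)",
    [("target-a", 12.5), ("target-b", 14.0)],
)


def search_unsafe(conn, name):
    """Vulnerable: the input is spliced into the SQL text unchecked."""
    query = "SELECT name FROM catalog WHERE name = '%s'" % name
    return [row[0] for row in conn.execute(query)]


def search_safe(conn, name):
    """Safer: the input is passed as a bound parameter, never as SQL."""
    query = "SELECT name FROM catalog WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


# Input crafted so the WHERE clause becomes: name = 'x' OR '1'='1'
malicious = "x' OR '1'='1"

leaked = sorted(search_unsafe(conn, malicious))   # matches every row
blocked = search_safe(conn, malicious)            # matches nothing
print(leaked, blocked)
```

With the bound parameter, the crafted string is treated as an ordinary object name that happens not to exist, so the query returns nothing instead of the whole table.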
Overall, the Hack Day was a success, according to those involved, not only because of the projects completed, but because of the discussions and information sharing that the event sparked.