
Huge star-forming cloud discovered in Earth's 'cosmic backyard'
An invisible molecular cloud that could shed light on how stars and planets form has been detected surprisingly close to Earth.
Named Eos after the Greek goddess of the dawn, the cloud of gas would appear huge in the night sky if visible to the naked eye. It spans the width of roughly 40 full moons and has a mass about 3,400 times that of the sun, researchers reported in a study published Monday in the journal Nature Astronomy.
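As a rough back-of-the-envelope check rather than a figure from the study: taking the roughly 300-light-year distance quoted later in the article and a full-moon angular diameter of about half a degree (both assumptions of this sketch), an apparent width of 40 full moons corresponds to a physical extent on the order of 100 light-years, where θ is the apparent angular width, d the assumed distance and D the implied physical width.

```latex
% Assumptions: d \approx 300 light-years, full moon \approx 0.5^{\circ} across
\theta \approx 40 \times 0.5^{\circ} = 20^{\circ}, \qquad
D \approx 2\,d\,\tan\!\left(\tfrac{\theta}{2}\right)
  \approx 2 \times 300\,\mathrm{ly} \times \tan 10^{\circ}
  \approx 106\,\mathrm{ly}.
```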
'In astronomy, seeing the previously unseen usually means peering deeper with ever more sensitive telescopes — detecting those smaller planets … those more distant galaxies,' said study coauthor Thomas Haworth, an astrophysicist at Queen Mary University of London.
'This thing was pretty much in our cosmic backyard, and we've just missed it,' he added.
Molecular clouds are composed of gas and dust from which hydrogen and carbon monoxide molecules can form. Dense clumps within these clouds can collapse to form young stars.
Scientists usually spot a molecular cloud using radio and infrared observations that can pick up the chemical signature for carbon monoxide, Haworth explained.
'We normally look for carbon monoxide, just one carbon atom and one oxygen atom, and that emits light pretty easily at wavelengths that we can detect,' he said. '(Carbon monoxide is) bright, and we have lots of facilities that can spot that.'
However, Eos eluded discovery despite being the closest molecular cloud to Earth because it does not contain much carbon monoxide, and therefore doesn't emit the characteristic signature detected by conventional approaches, the researchers said. The key to unlocking this stunning find was searching for ultraviolet light emitted by hydrogen in the cloud.
'The only reason we managed to catch it in this instance is because we've been able to look with a different color of light,' Haworth added.
Haworth and his colleagues detected Eos in data collected by a far-ultraviolet spectrograph called FIMS-SPEAR that operated as an instrument on a Korean satellite called STSAT-1.
The data had just been released publicly in 2023 when lead study author Blakesley Burkhart, an associate professor in the department of physics and astronomy in the Rutgers School of Arts and Sciences, came across it.
The spectrograph breaks down far-ultraviolet light emitted by a material into its component wavelengths, similar to what a prism does with visible light, creating a spectrum that scientists can analyze.
'This is the first-ever molecular cloud discovered by looking for far ultraviolet emission of molecular hydrogen directly,' Burkhart said in a news release. 'The data showed glowing hydrogen molecules detected via fluorescence in the far ultraviolet. This cloud is literally glowing in the dark.'
The molecular cloud's proximity to Earth provides a unique opportunity to study how solar systems form, Burkhart said.
'Our discovery of Eos is exciting because we can now directly measure how molecular clouds are forming and dissociating, and how a galaxy begins to transform interstellar gas and dust into stars and planets,' Burkhart said.
Astronomers thought they had a good handle on the locations and properties of the molecular clouds within about 1,600 light-years of the sun, making this 'pretty cool discovery' quite a surprise, said Melissa McClure, an assistant professor at the University of Leiden in the Netherlands.
'This new molecular cloud, Eos, is only 300 light-years away, which is closer than any of the molecular clouds that we've known about previously,' McClure, who wasn't involved in the research, said.
'It's puzzling why there's something this big right in our solar neighborhood that we didn't see before,' McClure added. 'It would be a bit like living in a suburb with above-ground houses and open lots in it, and suddenly realizing that one of the open lots actually hosts a hidden underground bunker in it.'
Related Articles
Yahoo
The science behind the smell of rain
You know the smell. It's there every time the first fat raindrops hit the ground—a distinctive, earthy scent that suffuses the air, an aroma that speaks of the changing seasons and promises relief from stifling summer heat. There's a name for the smell of rain, too: 'petrichor,' a poetic portmanteau of the Greek words 'petros' (stone) and 'ichor' (the blood of the gods in Greek mythology). Petrichor: the smell of rain. But what causes it?

The name 'petrichor' was coined by Australian scientists Isabel Bear and Dick Thomas in 1964, in a paper that constituted perhaps the first serious scientific attempt to explain the phenomenon. The duo used the word to refer to an oil that they distilled from samples of soil and vegetation that were left for up to a year exposed to air and daylight but shielded from rain. They found that the oil contained a complex mixture of volatile organic compounds.

One question left unanswered by Bear and Thomas was the origin of these compounds, and subsequent research has focused on one particular compound, a volatile bicyclic alcohol called geosmin. The compound was isolated a year after Bear and Thomas's paper, and its name literally means 'earth smell.' Along with another volatile organic compound called 2-methylisoborneol, or 2-MIB, geosmin is primarily responsible for the characteristic smell of earth—and both contribute greatly to the smell of rain.

Ryan Busby, an ecologist at the US Army Corps of Engineers, tells Popular Science that these compounds exist in soil the world over, and that they're spritzed into the air whenever soil is disturbed. '[The compounds] accumulate in the pore spaces in the soil,' Busby explains. 'There might be some binding to soil particles. [And] research has shown that that impact with the soil surface causes the volatiles to be released into the atmosphere.'

So where do geosmin and 2-MIB come from? Busby says that while the source of both compounds remains the subject of plenty of active research, the current scientific consensus is that they are released by soil-dwelling bacteria. Differing ratios of the two compounds may explain why the smell differs subtly from place to place. 'Geosmin is pretty consistent across the environment, while 2-MIB is more variable. [Where 2-MIB is present], it is released in much higher concentrations, so you get areas where there's huge concentrations, and then areas where there's none,' Busby says. The other components that make up petrichor—a myriad of less powerful plant-related volatiles, and also perhaps the distinctive acrid smell of ozone that accompanies lightning—vary from location to location.

Humans are remarkably sensitive to the smell of geosmin in particular. In water, it can be detected at concentrations as low as 4 ng/L, which equates to about one teaspoon in 200 Olympic swimming pools. Busby says there are several theories for why this might be. 'One [theory] is finding water sources,' he explains. 'Geosmin seems to be more prevalent in moist, fertile soils.' The presence of moist soil means the presence of water, and it's easy to see how being able to catch a whiff of geosmin on the wind and follow it to a source of water would provide a valuable evolutionary advantage.

It's not just humans who appear to be able to rely on the scent of these volatile compounds to find water, Busby says. 'Camels can detect geosmin and find oases in the desert from 50 miles away. Mosquitoes use it to find stagnant ponds for laying eggs, and raccoons use it to find turtle nests and buried eggs.'
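As a rough order-of-magnitude check on the teaspoon-in-200-pools comparison above (assuming an Olympic pool holds about 2.5 million litres and a teaspoon of the oil weighs roughly 5 grams, both figures mine rather than the article's), a detection threshold of 4 ng/L works out to a couple of grams of geosmin dissolved in that volume of water, which is indeed teaspoon-scale.

```latex
% Assumptions: 1 Olympic pool \approx 2.5 \times 10^{6} L, 1 teaspoon \approx 5 mL \approx 5 g
V \approx 200 \times 2.5 \times 10^{6}\,\mathrm{L} = 5 \times 10^{8}\,\mathrm{L}, \qquad
m \approx 4\,\mathrm{ng/L} \times 5 \times 10^{8}\,\mathrm{L} = 2\,\mathrm{g} \approx \tfrac{1}{2}\ \text{teaspoon}.
```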
But while the smells of geosmin and 2-MIB are appealing to us, their taste is the complete opposite. 'It's kind of funny,' muses Busby. 'We love the smell, but we hate the taste.' In water, these compounds are responsible for the musty, moldy taste that indicates that water isn't safe to drink. Busby says, 'Any time you drink water and you think, 'Oh, this, this tastes like lake water,' it's because those compounds are dissolved in what you're drinking.' Again, there's most likely an evolutionary reason for this: it's one thing for the soil around a water source to smell of bacteria, but if the water itself carries the distinctive musty odor of geosmin and 2-MIB, it also most likely carries the potential for gastrointestinal unpleasantness. Busby says that this explains why geosmin and 2-MIB are 'the primary odor contaminants of drinking water globally.'

There's one unanswered question here, though: why are geosmin and 2-MIB there in the first place? As Busby points out, while it's clear that 'there are a number of uses for geosmin for us, we're not sure exactly why [bacteria] produce it in such quantities. It's a [large] energy cost to produce a chemical like that.' So why do soil-borne bacteria pump out geosmin and 2-MIB? What's in it for them?

A paper published in Nature Microbiology in 2020 suggested a possible answer. The study examined interactions between Streptomyces—one variety of geosmin- and 2-MIB-producing bacteria—and small creatures called springtails. (Springtails are one of three varieties of six-legged arthropods that are not considered insects, and they have a taste for bacteria.) Crucially, the researchers found that in the bacteria studied, geosmin and 2-MIB were produced only by colonies that were also producing reproductive spores. In fact, they can only be produced by those specific colonies: 'The genes for geosmin and 2-MIB synthases are under the direct control of sporulation-specific transcription factors, constraining emission of the odorants to sporulating colonies,' the paper explains.

Springtails are attracted by geosmin and 2-MIB, so unsurprisingly, upon arrival at the odor-emitting colonies, they helped themselves happily to a tasty microbial snack. In doing so, they also consumed the bacterial spores. The spores were then able to pass through the springtails' digestive tracts and emerge ready for action from the other end.

Busby says this might also explain why the smell of rain is strongest when it comes from rain hitting dry soil. 'As soil dries out, the bacteria are going to go dormant, and there seems to be a flush of release [at that point]. So from that respect, [the compounds] are a way to attract something that maybe will carry [the bacteria] to a more conducive environment for growth.'

It might feel like the poetic appeal of petrichor is diminished somewhat by discovering that the oh-so-evocative smell of rain most likely exists to encourage a bunch of tiny arthropods to poop out bacterial spores. But ultimately, it's another example of nature finding a way—a co-evolutionary relationship that recalls bees and pollen, and one that extends its benefits to the rest of us. So the next time the rain hits dry soil, think about the tiny bacteria that both lead us to water and stop us drinking from sources that might harm us.

This story is part of Popular Science's Ask Us Anything series, where we answer your most outlandish, mind-burning questions, from the ordinary to the off-the-wall. Have something you've always wanted to know? Ask us.

AI chatbots need more books to learn from. These libraries are opening their stacks
CAMBRIDGE, Mass. -- Everything ever said on the internet was just the start of teaching artificial intelligence about humanity. Tech companies are now tapping into an older repository of knowledge: the library stacks.

Nearly one million books published as early as the 15th century — and in 254 languages — are part of a Harvard University collection being released to AI researchers Thursday. Also coming soon are troves of old newspapers and government documents held by Boston's public library.

Cracking open the vaults to centuries-old tomes could be a data bonanza for tech companies battling lawsuits from living novelists, visual artists and others whose creative works have been scooped up without their consent to train AI chatbots.

'It is a prudent decision to start with public domain data because that's less controversial right now than content that's still under copyright,' said Burton Davis, a deputy general counsel at Microsoft. Davis said libraries also hold 'significant amounts of interesting cultural, historical and language data' that's missing from the past few decades of online commentary that AI chatbots have mostly learned from.

Supported by 'unrestricted gifts' from Microsoft and ChatGPT maker OpenAI, the Harvard-based Institutional Data Initiative is working with libraries around the world on how to make their historic collections AI-ready in a way that also benefits libraries and the communities they serve.

'We're trying to move some of the power from this current AI moment back to these institutions,' said Aristana Scourtas, who manages research at Harvard Law School's Library Innovation Lab. 'Librarians have always been the stewards of data and the stewards of information.'

Harvard's newly released dataset, Institutional Books 1.0, contains more than 394 million scanned pages of paper. One of the earlier works is from the 1400s — a Korean painter's handwritten thoughts about cultivating flowers and trees. The largest concentration of works is from the 19th century, on subjects such as literature, philosophy, law and agriculture, all of it meticulously preserved and organized by generations of librarians.

It promises to be a boon for AI developers trying to improve the accuracy and reliability of their systems. 'A lot of the data that's been used in AI training has not come from original sources,' said the data initiative's executive director, Greg Leppert, who is also chief technologist at Harvard's Berkman Klein Center for Internet & Society. This book collection goes 'all the way back to the physical copy that was scanned by the institutions that actually collected those items,' he said.

Before ChatGPT sparked a commercial AI frenzy, most AI researchers didn't think much about the provenance of the passages of text they pulled from Wikipedia, from social media forums like Reddit and sometimes from deep repositories of pirated books. They just needed lots of what computer scientists call tokens — units of data, each of which can represent a piece of a word.

Harvard's new AI training collection has an estimated 242 billion tokens, an amount that's hard for humans to fathom but it's still just a drop of what's being fed into the most advanced AI systems. Facebook parent company Meta, for instance, has said the latest version of its AI large language model was trained on more than 30 trillion tokens pulled from text, images and videos.
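To make the idea of a token concrete, here is a minimal sketch using the open-source tiktoken tokenizer; the choice of encoding and the sample sentence are illustrative and not details of the Harvard collection or of any company's training pipeline.

```python
# Minimal illustration of what "tokens" are, using the open-source tiktoken library.
# The cl100k_base encoding and the sample sentence are arbitrary illustrative choices.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Nearly one million books published as early as the 15th century."
token_ids = enc.encode(text)                   # integers, each standing for a piece of a word
pieces = [enc.decode([t]) for t in token_ids]  # the corresponding word pieces

print(len(token_ids), "tokens for", len(text.split()), "words")
print(pieces)
# A corpus-level token count is just this, summed over every page of scanned text.
```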
Meta is also battling a lawsuit from comedian Sarah Silverman and other published authors who accuse the company of stealing their books from 'shadow libraries' of pirated works. Now, with some reservations, the real libraries are standing up.

OpenAI, which is also fighting a string of copyright lawsuits, donated $50 million this year to a group of research institutions including Oxford University's 400-year-old Bodleian Library, which is digitizing rare texts and using AI to help transcribe them. When the company first reached out to the Boston Public Library, one of the biggest in the U.S., the library made clear that any information it digitized would be for everyone, said Jessica Chapel, its chief of digital and online services.

'OpenAI had this interest in massive amounts of training data. We have an interest in massive amounts of digital objects. So this is kind of just a case that things are aligning,' Chapel said.

Digitization is expensive. It's been painstaking work, for instance, for Boston's library to scan and curate dozens of New England's French-language newspapers that were widely read in the late 19th and early 20th century by Canadian immigrant communities from Quebec. Now that such text is of use as training data, it helps bankroll projects that librarians want to do anyway.

'We've been very clear that, "Hey, we're a public library,"' Chapel said. 'Our collections are held for public use, and anything we digitized as part of this project will be made public.'

Harvard's collection was already digitized starting in 2006 for another tech giant, Google, in its controversial project to create a searchable online library of more than 20 million books. Google spent years beating back legal challenges from authors to its online book library, which included many newer and copyrighted works. It was finally settled in 2016 when the U.S. Supreme Court let stand lower court rulings that rejected copyright infringement claims.

Now, for the first time, Google has worked with Harvard to retrieve public domain volumes from Google Books and clear the way for their release to AI developers. Copyright protections in the U.S. typically last for 95 years, and longer for sound recordings.

How useful all of this will be for the next generation of AI tools remains to be seen as the data gets shared Thursday on the Hugging Face platform, which hosts datasets and open-source AI models that anyone can download. The book collection is more linguistically diverse than typical AI data sources. Fewer than half the volumes are in English, though European languages still dominate, particularly German, French, Italian, Spanish and Latin.

A book collection steeped in 19th century thought could also be 'immensely critical' for the tech industry's efforts to build AI agents that can plan and reason as well as humans, Leppert said. 'At a university, you have a lot of pedagogy around what it means to reason,' Leppert said. 'You have a lot of scientific information about how to run processes and how to run analyses.'

At the same time, there's also plenty of outdated data, from debunked scientific and medical theories to racist narratives. 'When you're dealing with such a large data set, there are some tricky issues around harmful content and language,' said Kristi Mukk, a coordinator at Harvard's Library Innovation Lab who said the initiative is trying to provide guidance about mitigating the risks of using the data, to 'help them make their own informed decisions and use AI responsibly.'
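For readers who want to explore the release once it appears on Hugging Face, fetching a public dataset with the Hugging Face datasets library generally looks like the sketch below; the repository identifier shown is a placeholder, since the article does not give the exact name.

```python
# Sketch of pulling a public dataset from the Hugging Face Hub with the `datasets` library.
# "institutional/institutional-books-1.0" is a placeholder repository ID, not a name
# confirmed by the article; split names also vary by dataset.
from datasets import load_dataset

ds = load_dataset(
    "institutional/institutional-books-1.0",
    split="train",
    streaming=True,   # iterate over records without downloading the whole corpus first
)

first_record = next(iter(ds))  # one record (e.g., one scanned volume) as a plain Python dict
print(sorted(first_record.keys()))
```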

