Latest news with #Biderman

EleutherAI releases massive AI training dataset of licensed and open domain text

Yahoo

06-06-2025

  • Business
  • Yahoo

EleutherAI, an AI research organization, has released what it claims is one of the largest collections of licensed and open-domain text for training AI models. The dataset, called The Common Pile v0.1, took around two years to complete in collaboration with AI startups Poolside, Hugging Face, and others, along with several academic institutions. Weighing in at 8 terabytes, The Common Pile v0.1 was used to train two new AI models from EleutherAI, Comma v0.1-1T and Comma v0.1-2T, that EleutherAI claims perform on par with models developed using unlicensed, copyrighted data.

AI companies, including OpenAI, are embroiled in lawsuits over their AI training practices, which rely on scraping the web, including copyrighted material like books and research journals, to build model training datasets. While some AI companies have licensing arrangements in place with certain content providers, most maintain that the U.S. legal doctrine of fair use shields them from liability in cases where they trained on copyrighted work without permission.

EleutherAI argues that these lawsuits have "drastically decreased" transparency from AI companies, which the organization says has harmed the broader AI research field by making it more difficult to understand how models work and what their flaws might be.

"[Copyright] lawsuits have not meaningfully changed data sourcing practices in [model] training, but they have drastically decreased the transparency companies engage in," Stella Biderman, EleutherAI's executive director, wrote in a blog post on Hugging Face early Friday. "Researchers at some companies we have spoken to have also specifically cited lawsuits as the reason why they've been unable to release the research they're doing in highly data-centric areas."

The Common Pile v0.1, which can be downloaded from Hugging Face's AI dev platform and GitHub, was created in consultation with legal experts, and it draws on sources including 300,000 public domain books digitized by the Library of Congress and the Internet Archive. EleutherAI also used Whisper, OpenAI's open-source speech-to-text model, to transcribe audio content.

EleutherAI claims Comma v0.1-1T and Comma v0.1-2T are evidence that the Common Pile v0.1 was curated carefully enough to enable developers to build models competitive with proprietary alternatives. According to EleutherAI, the models, both of which are 7 billion parameters in size and were trained on only a fraction of the Common Pile v0.1, rival models like Meta's first Llama AI model on benchmarks for coding, image understanding, and math. Parameters, sometimes referred to as weights, are the internal components of an AI model that guide its behavior and answers.

"In general, we think that the common idea that unlicensed text drives performance is unjustified," Biderman wrote in her post. "As the amount of accessible openly licensed and public domain data grows, we can expect the quality of models trained on openly licensed content to improve."

The Common Pile v0.1 appears to be in part an effort to right EleutherAI's historical wrongs. Years ago, the company released The Pile, an open collection of training text that includes copyrighted material. AI companies have come under fire, and legal pressure, for using The Pile to train models.

EleutherAI is committing to releasing open datasets more frequently going forward in collaboration with its research and infrastructure partners.

This article originally appeared on TechCrunch.

Sygnia Discovers New Active China-Nexus Threat Actor Weaver Ant

Associated Press

24-03-2025

  • Associated Press

TEL-AVIV, Israel & NEW YORK--(BUSINESS WIRE)--Mar 24, 2025-- Sygnia, the foremost global cyber readiness and response team, revealed today a new China-nexus threat actor, which the company has named Weaver Ant. To infiltrate the telecom company and gain access to sensitive data, Weaver Ant compromised Zyxel CPE home routers as an entry point into the victim's network. The APT also utilized a new web shell, dubbed 'INMemory', to enable in-memory execution of malicious modules while evading detection.

Figure: Web shell tunneling flow

As part of Sygnia's investigation into a separate threat actor, an account that was disabled by initial remediation efforts was re-enabled by a service account. Upon investigation, Sygnia determined that the account had been previously used by Weaver Ant. Notably, the activity originated from a server that had not been previously identified as compromised. This prompted a large-scale forensic investigation and, as a result, Sygnia uncovered a variant of the China Chopper web shell deployed on an internal server that had been compromised for several years.

'Nation-state threat actors like Weaver Ant are incredibly dangerous and persistent with the primary goal of infiltrating critical infrastructure and collecting as much information as they can before being discovered,' said Oren Biderman, Incident Response and Digital Forensic Team Leader at Sygnia. 'Multiple layers of web shells concealed malicious payloads, allowing the threat actor to move laterally within the network and remain evasive until the final payload. These payloads and their ability to leverage never-seen-before web shells to evade detection speaks to Weaver Ant's sophistication and stealthiness.'

How Weaver Ant Tunneled into Telco

The web shell hunt revealed two types of web shells in different variants. The first was classified by Sygnia as an encrypted China Chopper. China Chopper enabled Weaver Ant to gain remote access and control of web servers. Notably, variants of the China Chopper web shell support AES encryption of a payload, making it highly effective at evading detection at the Web Application Firewall level.

The second web shell, INMemory, was discovered by Sygnia and had no publicly available references to any other known web shells. INMemory leveraged just-in-time (JIT) compilation and execution of code at runtime to dynamically execute malicious payloads without having to write them onto the disk.

Biderman added, 'Weaver Ant maintained activity within the compromised network for over four years despite repeated attempts to eliminate them from compromised systems. The threat actor adapted their TTPs to the evolving network environment, enabling continuous access to compromised systems and the collection of sensitive information.'

Following the investigation and an extensive eradication effort, Sygnia continues to monitor Weaver Ant. The threat actor has already been detected attempting to regain access to the telecom company's network.

For the complete details, please see the associated report and technical annex.

Sygnia is the world's foremost cyber response and readiness expert. It applies creative approaches and bold solutions to each phase of an organization's security journey, meeting them where they are to ensure cyber resilience. Sygnia is the trusted advisor and service provider of leading organizations worldwide, including Fortune 100 companies. Sygnia is a Temasek company, part of the ISTARI Collective.

SOURCE: Sygnia

Copyright Business Wire 2025.

Three ways to refresh workplace safety initiatives

Yahoo

22-03-2025

  • Business
  • Yahoo

This story was originally published on Waste Dive.

Finding new ways to keep waste and recycling operations safe is a longtime challenge for even veteran operators, but a little creative thinking can transform safety programs, said speakers at the virtual Waste Advantage Safety Summit on Thursday.

Speakers highlighted how small changes to employee engagement, paired with medium and large changes to data analytics and company safety culture, can breathe new life into safety programs for operations of any size.

The discussions come as the industry grapples with Bureau of Labor Statistics data showing that waste and recycling collection was ranked as the fourth deadliest job in 2023.

'We can't accept that our safety record, our accidents, collisions and injuries, are just the cost of doing business,' said David Biderman, president of Biderman Consulting.

Here are some of the takeaways from the virtual event:

Simply scheduling regular safety meetings and training isn't enough to create safer workplaces, speakers said. Employees will tune out content when it feels perfunctory or irrelevant to their daily operations, and that can lead to cut corners, mistakes and injuries, Biderman said. 'Sometimes the guys in the room, they've heard this all before and they can't wait to get back out on the route,' he said.

Assess training materials and make updates when the handouts and videos feel stale, speakers said. 'We found that you just have to keep things relevant. If you're going to be using safety videos and slides from 10 years ago, it's time for a refresher,' said Paul Zambrotta, director of safety for Boro-Wide Recycling, a New York-based recycler. And don't be afraid to meet workers where they're at: 'It doesn't hurt to talk about the Mets and the Yankees every so often,' he said as an example.

Simply bringing in an outside presenter can also mix up safety messages in a new way, speakers said. Boro-Wide is refreshing its safety training as part of its participation in New York City's new commercial waste zone system, which has certain worker safety training requirements. Boro-Wide created a joint venture with another hauler, Mr. T Carting, as part of the process. The safety training has been more lively and interesting as a result, Zambrotta said.

'It's a great experience to get someone else involved, because you learn something new every time you do training, and we can add a fresh perspective to it,' he said.

Safety training is also a two-way street, he added. 'Get the employees involved. Let them tell you, let them show you what they've learned. Maybe they can teach you something.'

In an ideal workplace, workers do not fear punishment, feel empowered to point out an unsafe condition and will confidently stop work instead of ignoring the issue, speakers said. To create that type of environment, leaders need to actually demonstrate their commitment to that value.

That might mean company leaders diligently attend toolbox talks (even the ones that start at 4:30 in the morning) 'to send the signal that frontline safety matters,' Biderman said.

Examples could also include offering workers small incentives for pointing out safety hazards or following safety rules, such as providing gift cards or other rewards, added David Bennett, public works solid waste director for the City of Scottsdale, Arizona. 'You always hear about what people did wrong, but we also want buy-in by showing what people do right.'

Leadership must be willing to look for proactive ways to prevent safety issues for frontline workers, even when it could eat into operations time and budgets in the short term, said Shawn Mandel, vice president of safety and risk management for Waste Connections. Safe workplaces are better for employees and also save money in the long run, both in terms of insurance costs and the cost to bring in and train new workers. 'By living out those operating values, the success of the organization, including the profitability, will flow,' he said.

In Scottsdale, the city was grappling with a 'less than stellar' safety record about six years ago, Bennett added. That included incidents such as a critical worker injury, a crash involving a collection vehicle and a major yard fire. A few years later, the city was able to bring down both insurance costs and safety incidents in part by being willing to reach out to 'key partners' for help, including state OSHA inspectors, he said.

Scottsdale's public works department worked with OSHA to conduct a voluntary compliance inspection, then implemented some operational changes based on the inspector's feedback. That included adding new safety screens and adjusting workflow to prevent muscle strains and other issues, Bennett said. The department also added dash cams on certain trucks.

If all else fails, leaders must also be willing to let go of employees who repeatedly violate safety rules, Biderman said. That can be tough to do, especially when the employee is a productive worker and their role is challenging to fill, but 'we have to drive home that it's "safely or not at all,"' he said.

Waste operators increasingly have access to more and better data from sources like route optimization software, pre- and post-trip inspection reports, dash cams and fire prevention equipment. Speakers stressed the importance of proactively assessing that data for safety trends that can help prevent injuries to both frontline workers and the public.

'You're never going to have a better data set than your own, because you have the ability to analyze it, slice it, dice it and make immediate changes that impact what you're doing inside of your own company to try and be safer,' said Nathan Brainard, division president at Insurance Office of America.

It can be overwhelming to know where to start, but data can help 'figure out what your most frequent and severe incidents are and figure out what contributed to them,' Biderman said.

Create accountability by weaving safety data into company key performance indicators and using the numbers to guide how to prioritize safety improvements for the year, speakers advised. Safety dashboards are one tool to help.

Remember to pair that data with feedback from frontline workers, who are the eyes and ears of the operations, Zambrotta said. 'The drivers and helpers might give you some great information on how you can be more efficient and safer. It's really just a balance between management and frontline workers, because we're all on the same team. We all have the same goal in mind.'

Fox addresses Kings trade rumors: 'I expect the unexpected'

Yahoo

29-01-2025

  • Sport
  • Yahoo

Fox addresses Kings trade rumors: 'I expect the unexpected' originally appeared on NBC Sports Bay Area

De'Aaron Fox shared his first public comments since it was reported Tuesday that the Kings were open to discussing potentially trading him before the NBA's Feb. 6 deadline.

The Kings' star point guard spoke to The Sacramento Bee's Chris Biderman after shootaround Wednesday morning in Philadelphia about the latest developments.

'In this league, I expect the unexpected,' Fox told Biderman. 'I think crazier things have happened.'

Valid. Fox has had a front-row seat to a lot of the 'craziness' that has unfolded in Sacramento over his eight-year tenure with the team.

On Tuesday, ESPN's Shams Charania reported, citing sources, that Sacramento is 'expected to open up talks to potentially deal' Fox before next Thursday's trade deadline, adding the following morning that the San Antonio Spurs are his preferred destination.

Fox later confirmed that to Biderman.

'For sure, I think everybody has a preferred destination,' Fox told Biderman. 'I think everybody has a preferred destination if they're not in the place that, or if they're not going to be in the place where they are in the moment. I think it's natural.'

Fox's future with the only NBA franchise he's known grew uncertain after he opted not to sign an extension last offseason. He emphasized his desire to compete at the highest level, something he hasn't consistently experienced during his time in Sacramento.

It was reported in late December that Fox's agent, Rich Paul, met with Kings general manager Monte McNair and assistant GM Wes Wilcox to discuss the All-Star's future, with an indication that San Antonio already was positioning itself to acquire Fox if he became available on the trade market.

Despite all this, Fox's wife, Recee, maintained on social media that the 27-year-old 'has never asked for a trade … especially while being in the midst of a good run.'

While speaking to Biderman, Fox himself confirmed that and made it clear that he will be on the court for games and won't sit out as long as he remains a member of the Kings.

Recee Fox (@Cee_Caldwell), January 29, 2025: 'Still has never asked for a trade btw…especially while being in the midst of a good run. That should be clear'

'That's their decision to make. I can't tell them not to listen to offers or I can't tell them to listen to offers,' Fox told Biderman. 'Every day I step on the court, I do my job. That's always my thing. I've never been a person to worry about anything else or go and do anything else. Every time I step on the court, I try to play the best I can, I try to win games.'

Fox also didn't rule out the possibility of him staying with the Kings past the trade deadline and beyond if the Kings make the necessary roster improvements.

If. Sacramento has failed to address an area of concern over the past two offseasons in acquiring frontcourt help. The Kings have been tied to players such as Brooklyn Nets forward Cam Johnson and Utah Jazz forward John Collins, but as of Wednesday morning, nothing has come to fruition.

Fox was a participant in Kings shootaround Wednesday morning as the team's six-game road trip continues against the 76ers.

Chris Biderman (@ChrisBiderman), January 29, 2025: 'De'Aaron Fox participating at Kings shootaround this morning in Philadelphia.'

He broke out of a shooting slump in Sacramento's 110-96 win over the Brooklyn Nets, dropping 30 points on 11-of-19 (57.9 percent) shooting from the field and 4 of 7 (57.1 percent) from 3-point land, with seven assists in 37 minutes. Eighteen of his 30 points came in the third quarter, helping guide the victory.

For now, it's business as usual for Fox and the Kings as their goal to climb atop the Western Conference standings hasn't changed.

'I think anything's possible in this league. Like I said, crazier things have happened,' Fox told Biderman.
