
Latest news with #HuggingFaceHub

ByteDance's Dolphin OCR Sets New Benchmark in Document AI

Arabian Post

23-06-2025


ByteDance has unveiled 'Dolphin', an OCR model released under an MIT licence and designed to streamline document processing by combining layout analysis and parsing in a unified workflow. The tool is intended to improve accuracy and adaptability across complex document types. Dolphin operates by first analysing the document layout (identifying paragraphs, tables, figures and formulas) and then parsing each section in parallel, a method experts describe as 'analyze-then-parse'. The model architecture resembles that of Donut, a document-oriented vision-language model, but adds a two-step pipeline that improves both structural understanding and text recognition efficiency.

Since its announcement on LinkedIn, Dolphin and its source code have been published on GitHub and the Hugging Face Hub. Industry users, including practitioners from the Transformers community, have actively benchmarked it, noting its strong performance on structured documents containing scientific equations and dense layouts. Initial commentary suggests Dolphin matches or outperforms contemporaries such as Donut and DocFormer in speed and layout robustness.

This release underscores ByteDance's expanding role in document AI under its BytePlus technology brand. BytePlus has been promoting OCR and translation capabilities via ModelArk, targeting finance, small business, logistics and automatable workflows. With OCR projected to become a US$43 billion market by 2032, and growth driven by demand in the banking, healthcare and supply chain sectors, Dolphin arrives at a critical juncture for industry needs.

Key to Dolphin's innovation is layout-first processing. By segmenting a document before interpreting its textual content, it reduces errors, particularly on documents with heterogeneous formats. As noted by Merve Noyan and others, this approach facilitates precise parsing of tables, mathematical notation, captions and images.
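The analyze-then-parse flow described above can be illustrated with a minimal Python sketch. It is not Dolphin's code or API: the `Region` type, the `analyze_layout` stub, the `PARSERS` table and the sample page are all hypothetical stand-ins, and simple string functions take the place of the model calls. The sketch only shows the shape of the idea: a layout stage produces typed regions in reading order, and the regions are then parsed independently and in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical layout element produced by the first stage."""
    kind: str     # "paragraph", "table", "figure" or "formula"
    order: int    # reading-order index assigned by the layout stage
    content: str  # stand-in for the cropped image of the region


def analyze_layout(page):
    """Stage 1 (stub): split a page into typed regions in reading order."""
    return [Region(kind, i, raw) for i, (kind, raw) in enumerate(page)]


# Stage 2 (stubs): one parser per region type. A real system would run a
# model on each cropped region instead of these string functions.
PARSERS = {
    "paragraph": lambda r: r.content.strip(),
    "table": lambda r: "| " + " | ".join(r.content.split(",")) + " |",
    "formula": lambda r: f"${r.content}$",
    "figure": lambda r: f"[figure: {r.content}]",
}


def parse_region(region):
    """Parse one region and keep its reading-order index."""
    return region.order, PARSERS[region.kind](region)


def analyze_then_parse(page):
    """Run the layout stage, then parse all regions in parallel."""
    regions = analyze_layout(page)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(parse_region, regions))
    # Reassemble the parsed pieces in reading order.
    return [text for _, text in sorted(results)]


# Made-up sample page: (region type, raw content) pairs.
page = [("paragraph", "  Intro text. "), ("table", "a,b,c"), ("formula", "E = mc^2")]
print("\n".join(analyze_then_parse(page)))
```

Because each region is parsed independently once the layout is known, the second stage can be parallelised without the regions interfering with one another, which is the property the article attributes to the layout-first design.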
Early adopters are testing its effectiveness on complex scientific papers and structured forms, areas where traditional OCR solutions frequently falter.

ByteDance enters a crowded landscape of emerging OCR tools. Nanonets' small model supports markdown and LaTeX; MonkeyOCR from Huazhong University follows a structure-recognition-relation paradigm; and giants like Google, Microsoft and IBM continue to offer strong enterprise OCR services. Yet Dolphin distinguishes itself through open-source licensing and its advanced pipeline, potentially accelerating adoption and collaborative development.

Despite its promise, Dolphin's real-world strengths remain to be quantified. Benchmarks comparing its accuracy, latency and resource usage against established commercial solutions are limited. Additionally, its performance under varying document quality, such as low-resolution scans, handwriting or languages other than English, has not been fully validated. Experts expected comparative benchmarks; however, ByteDance has not yet released detailed evaluations.

ByteDance's broader AI portfolio supports the strategic placement of Dolphin within an integrated multimodal stack. The firm's other recent innovations include Seed 1.5-VL, a state-of-the-art vision-language model acclaimed for visual reasoning, GUI interaction and OCR applications, and the Doubao chatbot, enhanced with visual-language capabilities for real-time analysis in video calls. Together, these systems showcase ByteDance's ambition to lead in both document-centric and broad visual-language AI.

By open-sourcing Dolphin, ByteDance enables community collaboration and integration into platforms like Hugging Face, where machine learning engineers are already integrating the model into tools such as Transformers, vLLM and Docext.
This contrasts with more proprietary offerings, opening pathways for wider testing, research and adaptation in niche domains such as regulatory compliance, legal document processing or academic publishing.

Adoption of Dolphin benefits organisations aiming to automate complex documentation tasks, ranging from invoice reconciliation and regulatory filings to academic publishing and insurance claims. The layout-aware model structure enhances recognition and data extraction accuracy, while the permissive licence removes traditional barriers to deployment. Its integration into BytePlus also enables developers to tap into scalable API and cloud-based services suited to the finance, logistics and SME segments.

However, the uptake of Dolphin in enterprise systems will depend on rigorous validation. Leading market players such as ABBYY, Adobe Acrobat and Microsoft Azure continue to set high standards in OCR performance, ecosystem support and regulatory compliance. To compete effectively, ByteDance must supply detailed performance tests, language support and enterprise-grade features. Addressing security, data privacy and accuracy on edge-case layouts also remains vital.

The emergence of Dolphin reflects an accelerating trend: OCR is evolving beyond simple character reading into intelligent document understanding powered by AI and visual-language paradigms. As the global OCR market approaches an estimated US$43 billion, technologies like Dolphin are expanding the frontier of what automated document systems can achieve.

Open Source Humanoid Robots: Hugging Face Buys Pollen Robotics

Forbes

14-04-2025


Reachy 2 is an open source humanoid robot from Pollen Robotics, which was just bought by Hugging Face.

The open source AI community Hugging Face has bought open source humanoid robots company Pollen Robotics, Hugging Face announced this morning. That's great news for pretty much every country that is not the USA or China, the two nations that lead the world in humanoid robotics startups and innovation.

'Super happy to announce that we are acquiring Pollen Robotics to bring open-source robots to the world,' Hugging Face said on X. 'Since Remi Cadene joined us from Tesla, we've become the most widely used software platform for open robotics thanks to LeRobotHF and the Hugging Face Hub. Now, we're taking it a step further by teaming up with Pollen, who is one of the only companies in the world that actually ships open-source humanoid robots!'

Pollen Robotics currently offers Reachy 2, a roughly humanoid robot that anyone can buy today for $70,000. It's a rudimentary model right now that does not walk but moves on a wheeled mobile base or can be fixed in position. It does have advanced robotic arms with seven degrees of freedom for complex manipulation of objects, but only a 3 kilogram/6.6 pound lifting capacity per arm. Reachy 2 can be teleoperated with VR equipment, is fully open source, can be programmed in Python, and comes in multiple models with varying capability, but it's best seen as a proof of concept right now.

It's certainly nowhere near the level of commercial models like 4NE-1 from Neura Robotics, Apollo from Apptronik, Figure 02 from Figure, Optimus from Tesla, or Digit from Agility Robotics. Nor would Reachy 2 vault Pollen Robotics onto this list of the top 16 humanoid robot manufacturers on the planet.

But this deal is still significant. Humanoid robots are advancing in leaps and bounds. Credible industry observers see them as adding serious value to the workforce in a fairly short period of time: years, not decades.
If that does in fact occur, nations and companies that own and/or adopt these kinds of technologies will be significantly more competitive than those that do not. Over time and with mass manufacturing, the price of labor could approach zero, which would fundamentally disrupt current economic and social models.

Having open source humanoid robots could be a very valuable resource to those who do not develop the technology themselves. We've seen it before in software with Linux; it's theoretically possible in a hardware world as well.

'We believe robotics could be the next interface for AI, and it should be open, affordable, and hackable,' says Hugging Face. 'Our vision: a future where everyone from the community can build and control their own robot companions instead of relying on closed, expensive black boxes.'

That's especially important if the alternative is acquiring robots from a potentially unfriendly nation that leaves a backdoor in its robots, one that can turn them into surveillance devices for that nation, as appears to have happened recently with Unitree Robotics's robot dog Go-1.

According to MSN, Hugging Face co-founder and chief scientist Thomas Wolf would like to make Reachy 2 (or Reachy 5, or 10) fully open source so that anyone could download schematics for the robot and potentially even 3D print their own.

Terms of the acquisition were not disclosed, but it appears that Pollen's entire team of about 20 will be joining Hugging Face, including Pollen's two co-founders, Matthieu Lapeyre and Pierre Rouanet.
