Latest news with #unstructuredData
Yahoo
3 days ago
- Business
- Yahoo
Flow Capital Announces a US$5.0 Million Investment in Congruity 360
TORONTO, May 29, 2025 (GLOBE NEWSWIRE) -- Flow Capital Corp. (TSXV:FW) ('Flow Capital' or the 'Company') is pleased to announce the successful closing of a US$5.0 million senior note investment in Congruity 360, a leading provider of unstructured data management and risk mitigation solutions.

Congruity 360's Classify360 platform equips organizations with critical capabilities to understand, manage, and secure petabyte-scale unstructured data across cloud, SaaS, and on-premises environments. Its capabilities include data discovery and classification, identification of governance and compliance vulnerabilities, and automated workflows for remediation and infrastructure optimization. Already trusted by Fortune 500 companies operating across the globe, Congruity 360 will use the capital to fuel continued product innovation and growth.

'The unstructured data management and classification market is thriving! We were impressed by Congruity 360's market and product momentum, particularly its automated governance workflow and the introduction of AI into the classification process,' said Alex Baluta, CEO of Flow Capital. 'Given its high growth rate, Flow's covenant-light, founder-friendly capital was a perfect fit for Congruity 360's needs.'

'2025's wins have accelerated our product and GTM plans! We are excited to partner with Flow Capital,' said Brian Davidson, CEO of Congruity 360.

Technology companies seeking flexible growth capital are invited to apply for funding directly at

About Congruity 360
Congruity 360 delivers the only data management solution built on a foundation of classification, by experts in data storage and data privacy. The Classify360 platform is easy to implement, requires no outside consultants, and quickly analyzes and remediates your data at petabyte scale in days, not weeks or months.

About Flow Capital
Flow Capital Corp. is a publicly listed provider of flexible growth capital and alternative debt solutions dedicated to supporting high-growth companies. Since its inception in 2018, the company has provided financing to businesses in the US, the UK, and Canada, helping them achieve accelerated growth without the dilutive impact of equity financing or the complexities of traditional bank loans. Flow Capital focuses on revenue-generating, VC-backed, and founder-owned companies seeking $2 to $10 million in capital to drive their continued expansion. Learn more at

For further information, please contact:
Flow Capital Corp.
Alex Baluta, Chief Executive Officer
alex@
47 Colborne Street, Suite 303, Toronto, Ontario M5E 1P8

Forward-Looking Information and Statements
Certain statements herein may be 'forward-looking' statements that involve known and unknown risks, uncertainties and other factors that may cause the actual results, performance or achievements of Flow or the industry to be materially different from any future results, performance or achievements expressed or implied by such forward-looking statements. Forward-looking statements involve significant risks and uncertainties, should not be read as guarantees of future performance or results, and will not necessarily be accurate indications of whether such results will be achieved. A number of factors could cause actual results to vary significantly from the results discussed in the forward-looking statements. These forward-looking statements reflect current assumptions and expectations regarding future events and operating performance and are made as of the date hereof, and Flow assumes no obligation, except as required by law, to update any forward-looking statements to reflect new events or circumstances.
Yahoo
3 days ago
- Business
- Yahoo
Zilliz Introduces Zero-Downtime Migration Services for Seamless Unstructured Data & Vector Embeddings Transfers
New solutions eliminate friction, enabling effortless portability of unstructured data and embeddings across systems — with no downtime, no vendor lock-in, and no added cost.

REDWOOD CITY, Calif., May 29, 2025 /CNW/ -- Zilliz, creator of the world's most widely adopted open-source vector database, Milvus, introduced a powerful new set of Migration Services designed to make moving unstructured data and vector embeddings between platforms fast, reliable, and cost-free. These solutions eliminate the technical and operational barriers that typically slow down AI data infrastructure modernization.

"Organizations working with unstructured data for AI applications face migration challenges that traditional ETL pipelines simply can't solve," said James Luan, VP of Engineering at Zilliz. "Our new tools provide the missing infrastructure layer — making it easy to migrate from Elasticsearch to Milvus, consolidate across multiple vector stores, or move to Zilliz Cloud with zero disruption."

Breaking Down Migration Barriers for Unstructured Data
Unstructured data — including images, text, audio, and video — now accounts for over 90% of enterprise data.
As organizations turn this data into vector embeddings, they run into major roadblocks:
- Format Variety: Unstructured data exists in diverse formats (JSON, CSV, Parquet, images, etc.), requiring specialized processing
- System Fragmentation: Business information is scattered across S3, HDFS, Kafka, data warehouses, and data lakes
- Vendor Lock-in Risks: Moving vector embeddings between databases often creates technical dependencies and potential vendor lock-in
- Complex Transformations: Converting unstructured data requires AI model integration for embedding generation and schema mapping

Two Flexible Options for Every Environment
Zilliz offers Migration Services that directly respond to these challenges through two complementary deployment options:
- Zilliz Migration Service provides a free, fully managed solution with zero configuration requirements and zero downtime. This service handles all aspects of migration while maintaining continuous synchronization between source and target systems.
- Vector Transport Service (VTS), available as open-source software, offers the same capabilities for organizations that require self-hosted deployments in secure or air-gapped environments.
Purpose-Built for AI and Vector Workloads
Both solutions deliver essential features specifically designed for unstructured data and vector embeddings:
- Zero-Downtime Migrations: Continuous synchronization keeps applications running seamlessly during transitions
- Broad Source Compatibility: Support for Elasticsearch, Pinecone, Qdrant, PostgreSQL, Milvus, and more
- Flexible Migration Modes: Options for one-time batch imports or real-time streaming synchronization
- Purpose-Built for Unstructured Data and Vector Embeddings: Specialized handling with schema mapping and transformations
- Enterprise-Grade Reliability: Designed for massive datasets with robust monitoring and alerting

Empowering Data Freedom Across Industries
Organizations across sectors are already using Zilliz Migration Services to transform their AI infrastructure:
- A global retailer migrated 200 million product embeddings from Elasticsearch to Zilliz Cloud, improving search accuracy by 40% while cutting infrastructure costs in half
- A healthcare organization moved patient data vectors between systems while maintaining strict HIPAA compliance
- A financial services provider eliminated vendor dependency by moving to an open-source foundation while maintaining continuous operation

"Migrating between platforms without rebuilding pipelines from scratch is a game-changer for our AI strategy," said one customer. "What would have taken months of engineering was completed in days, allowing us to focus on innovation rather than infrastructure management."

Availability
Zilliz's new migration solutions are now generally available:
- Zilliz Migration Service: Available as a free, fully managed service within Zilliz Cloud
- Vector Transport Service: Available as open-source software under the Apache 2.0 license at

For more information about Zilliz Migration Services, visit or contact support.
About Zilliz
Zilliz is an American SaaS company that builds next-generation vector database technologies, helping organizations unlock the value of unstructured data and rapidly develop AI and machine learning applications. By simplifying complex data infrastructure, Zilliz brings the power of AI within reach for enterprises, teams, and individual developers alike. Zilliz offers a fully managed, multi-cloud vector database service powered by open-source Milvus, supporting major cloud platforms such as AWS, GCP, and Azure, and is available across more than 20 countries and regions. Headquartered in Redwood Shores, California, Zilliz is backed by leading investors including Aramco's Prosperity7 Ventures, Temasek's Pavilion Capital, Hillhouse Capital, 5Y Capital, Yunqi Partners, Trustbridge Partners, and others.

SOURCE Zilliz


Harvard Business Review
4 days ago
- Business
- Harvard Business Review
To Create Value with AI, Improve the Quality of Your Unstructured Data
A company's content lies largely in 'unstructured data'—those emails, contracts, forms, SharePoint files, recordings of meetings, and so forth created via work processes. That proprietary content makes gen AI more distinctive, more knowledgeable about your products and services, less likely to hallucinate, and more likely to bring economic value. As a chief data officer we interviewed pointed out, 'You're unlikely to get much return on your investment by simply installing Copilot.'

Many companies have concluded that the most value from gen AI lies in combining the astounding language, reasoning, and general knowledge of large language models (LLMs) with their own proprietary content. That combination is necessary, for example, in enterprise-level gen AI applications in customer service, marketing, legal, and software development, and in product/service offerings for customer use. The most common approach by far to adding a company's own content is 'retrieval augmented generation,' or RAG, which combines traditional information-retrieval tools like databases with the generative capabilities of LLMs. It is used because submitting vast quantities of content in a prompt is often technically infeasible or expensive. While technically complex, the RAG approach is quite feasible and yields accurate responses to user prompts, provided the unstructured data used in RAG is of high quality.

Therein lies the problem. Unstructured data is frequently of poor quality: obsolete, duplicative, inaccurate, and poorly structured, among other problems. Most companies have not done well with the quality of structured data, even though that data is used every day to complete business transactions and understand performance. Unstructured data is tougher. The last serious attempts to address unstructured data date to the 1990s and 2000s, when knowledge management was popular. Most efforts proved unsuccessful.
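The RAG pattern described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the bag-of-words "embedding," the toy corpus, and the prompt template are hypothetical stand-ins for a real embedding model and document store.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, corpus: dict, k: int = 2) -> list:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc_id: cosine(q, embed(corpus[doc_id])),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, corpus: dict) -> str:
    # Augment the prompt with retrieved proprietary content before calling an LLM.
    context = "\n".join(corpus[doc_id] for doc_id in retrieve(query, corpus))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = {
    "refund-policy": "Refunds are issued within 30 days of purchase.",
    "shipping": "Standard shipping takes five business days.",
    "warranty": "Hardware carries a two year limited warranty.",
}
print(build_prompt("How long do refunds take?", docs))
```

In production the `embed` function is replaced by a neural embedding model and the corpus by a vector database; the key point is that only the retrieved slice of company content reaches the LLM, which is why its quality matters so much.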
Surveys confirm that most leaders are aware that poor data quality hampers their generative AI efforts, and that they did not have a strong focus on unstructured data until the advent of gen AI. Of course, the best way to deal with data quality problems is to prevent them. Over the long term, companies serious about AI must develop programs to do just that. Those who create documents, for example, need to learn to evaluate them for quality and tag key elements. But this will take much concerted effort and is no help in the short term. To get value from gen AI, companies need to build RAG applications using high-quality unstructured data. Our objective in this article is to help them do so by summarizing the most important data problems and the best approaches, both human and technical, for dealing with them.

What Is Data Quality for Unstructured Data?
High-quality data, whether structured or unstructured, results only from focused effort, led by active, engaged leadership, some well-placed professionals, clear management responsibilities for all who touch data, and a relentless commitment to continuous improvement. Absent these things, chances are high your data is not up to snuff. As coach and advisor Alex Borek of the Data Masterclass told us, 'When AI doesn't work, it often reveals flaws in the human system.' Indeed, the best estimate is that 80% of the time spent on an AI project will be devoted to data. For example, a Philippines-based Morgan Stanley team spent several years curating research reports in advance of the firm's AI @ Morgan Stanley assistant project. The curation started before gen AI became widespread, which allowed Morgan Stanley to get its application into production more quickly.

To work effectively, RAG requires documents directly relevant to the problem at hand, a minimum of duplicated content, and information in those documents that is complete, accurate, and up to date.
Further, as Seth Earley of Earley Information Science noted, 'You must supply context, as much as possible, if an LLM is to properly interpret these documents.' Unstructured data does not come pre-loaded with the needed context, and gen AI is largely incapable of determining which information best addresses a particular business question or issue. It is also not good at 'entity resolution': is 'John Smith' in document A, about customers, the same person as 'J. A. Smith' in document B, about vendors, and/or as 'Mr. J Smith' in document C, about a donation to our foundation?

Most structured data is defined in a data model or dictionary. This provides some context and helps reduce the John Smith/J. A. Smith problem described above. For structured data it is easier to find the data desired, learn who is responsible for it, and understand what the data means. As John Duncan, head of data governance for the large car retailer CarMax, told us, unstructured data requires the same clarity about data ownership, producers, consumers, and stewards. It also benefits from standards for data quality thresholds, data lineage, access controls, and retention durations. This metadata is typically included in a data dictionary.

With unstructured data, however, there is seldom a dictionary. Often there is no centralized management of such content; documents are stored haphazardly, under different naming conventions, on different computers or cloud providers across the company. There is often no common definition of a content type; an ad agency data leader confessed that there is no common definition of a 'pitch' across the agency. Finally, unstructured documents were often developed for a different purpose than feeding gen AI. A contract with a supplier, for example, was not designed to provide insight about the level of risk in a supplier relationship.
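The John Smith/J. A. Smith problem can be illustrated with a minimal sketch. The normalization rules, similarity threshold, and surname heuristic below are illustrative assumptions; production entity-resolution systems use far richer matching and additional attributes.

```python
import difflib
import re

def normalize(name: str) -> str:
    # Strip honorifics and punctuation, lowercase, collapse whitespace.
    name = re.sub(r"\b(mr|mrs|ms|dr)\b\.?", "", name, flags=re.IGNORECASE)
    name = re.sub(r"[^\w\s]", " ", name).lower()
    return " ".join(name.split())

def same_person(a: str, b: str, threshold: float = 0.5) -> bool:
    # Heuristic match: surnames must agree and the names must be broadly similar.
    na, nb = normalize(a), normalize(b)
    if not na or not nb or na.split()[-1] != nb.split()[-1]:
        return False
    return difflib.SequenceMatcher(None, na, nb).ratio() >= threshold

print(same_person("John Smith", "Mr. J Smith"))   # same surname, similar form
print(same_person("John Smith", "Jane Doe"))      # different surname
```

Even this toy version shows why the task is hard: string similarity alone cannot tell whether two similar names denote the same real-world person, which is why context from surrounding documents matters.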
We believe it was the late management thinker Charles Handy who observed, 'Information gathered for one purpose is seldom useful for another.'

An Unstructured Data Quality Process
Fortunately, there are several approaches and tools that can help to improve unstructured data. We recommend that all AI projects follow a disciplined process, building quality in wherever they can. Such a process must embrace the following steps:
1. Address unstructured data quality issues problem by problem, not all at once.
2. Identify and assess the data to be used.
3. Assemble the team to address the problem.
4. Prepare the data, employing both humans and AI when possible.
5. Develop your application and validate that it works.
6. Support the application and try to inculcate quality in content creation processes.

1. Address unstructured data quality issues problem by problem, not all at once.
There is too much unstructured data to improve it all at once. Project leaders should ensure that all involved agree on the problem or opportunity to be addressed. Priorities should be based first on the value to the business of solving the problem, and second on the feasibility and cost of developing a solution—including data quality improvement. Areas of the business whose data is already of reasonably good quality should receive higher priority. That's the approach Nelson Frederick Bamundagea, IT director at the truck refrigeration servicing company W&B Services, has taken. His knowledge retrieval application for service technicians uses the schematics of some 20 refrigerator models provided by two manufacturers. These have been used over and over, and the vocabulary employed is relatively small, providing a high level of trust. More generally, Alex Borek advises companies to 'first look to highly curated data products whenever possible.'

2. Identify and assess the data to be used.
Since the data is critical to the success of an LLM-based knowledge project, it's important to assess it at an early stage. There is a human tendency to include any possibly relevant document in a RAG application, but companies should adopt a healthy skepticism and a 'less is more' philosophy: absent a good reason to trust a document or content source, don't include it. It's not likely that experts can evaluate every document, but they can dig deeply into a small sample. Are the sample documents loaded with errors, internal inconsistencies, or confusing language—or are they relatively clean? Use your judgment: keep clean data and proceed with caution; toss bad data. If the data are in horrible shape or you can't find enough good data, reconsider the project.

3. Assemble the team to address the problem.
Given the need for some human curation of unstructured data, it's unlikely that a small team of experts can accomplish the necessary work. In addition, those who work with the data day to day typically have a better idea of what constitutes high quality and how to achieve it. In many cases, then, it may be helpful to make data quality improvement a broadly participative project. For example, at Scotiabank, the contact center organization needed to curate documents for a customer chatbot. Center staff took responsibility for the quality of its customer support knowledge base and ensured that each document fed into the RAG-based chatbot was clear, unique, and up to date.

4a. Prepare the data with humans.
If you've concluded—and you should—that there must be a human contribution to improving unstructured data quality, this is the time to engage it. That contribution could include having a stakeholder group agree on key terms—e.g., 'contract,' 'proposal,' 'technical note,' and 'customer'—and how they are defined. Document this work in a business glossary.
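A business glossary can start out very lightweight. The sketch below, with hypothetical terms and a naive substring tagger, shows the idea of recording agreed definitions and tagging documents against them; a real glossary would live in a catalog tool and the matching would be more robust.

```python
# Hypothetical glossary entries; real ones come from stakeholder agreement.
GLOSSARY = {
    "contract": "A signed agreement with a customer or supplier.",
    "proposal": "A document offering goods or services at stated terms.",
    "technical note": "Internal engineering documentation for a product.",
}

def tag_document(text: str, glossary: dict) -> list:
    # Tag a document with every glossary term it mentions (naive substring match).
    lowered = text.lower()
    return sorted(term for term in glossary if term in lowered)

print(tag_document("This proposal amends the contract signed in March.", GLOSSARY))
# → ['contract', 'proposal']
```

Tags produced this way become the metadata that later retrieval and curation steps can filter and rank on.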
This can be hard: consistent with 'Davenport's Law'—first stated more than 30 years ago—the more an organization knows or cares about a particular information element, the less likely it is to have a common term and meaning for it. This issue can be overcome through 'data arguing' (not data architecture) until the group arrives at a consensus. And, of course, if there is a human curation role, this is the time to begin it. That entails deciding which documents or content sources are best for a particular issue, 'tagging' them with metadata, and scoring content on such attributes as recency, clarity, and relevance to the topic. Morgan Stanley has a team of 20 or so analysts based in the Philippines that scores each document along 20 different criteria.

4b. Prepare the data with AI.
Gen AI itself is quite good at some of the tasks needed to prepare unstructured data for other gen AI applications. It can, for example, summarize content, classify documents by category of content, and tag key data elements. CarMax, for instance, uses generative AI to translate different car manufacturers' specific language for describing automotive components and capabilities into a standard set of descriptions that lets a consumer compare cars across manufacturers. Gen AI can also create good first drafts of 'knowledge graphs,' or displays of which pieces of information are related to others in a network. Knowledge graphs improve the ability of RAG to find the best content quickly. Gen AI is also good at de-duplication: finding exact or very similar copies of documents and eliminating all but one. And since RAG approaches pick documents based on specified criteria (recency, authorship, etc.), these criteria can be re-weighted ('re-ranked') to favor certain ones in content search. We have found, however, that AI is not particularly good at identifying the best document in a set of similar ones, even when given a grading rubric.
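The de-duplication task mentioned above can be sketched with word shingles and Jaccard similarity, a classic near-duplicate detection technique. The 0.5 threshold and the sample sentences are illustrative assumptions; large corpora would use hashing schemes such as MinHash rather than pairwise comparison.

```python
def shingles(text: str, n: int = 3) -> set:
    # Overlapping n-word windows; robust to small insertions and edits.
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}

def jaccard(a: set, b: set) -> float:
    # Overlap of two shingle sets relative to their union.
    return len(a & b) / len(a | b) if a | b else 0.0

def dedupe(docs: list, threshold: float = 0.5) -> list:
    # Keep a document only if it is not a near-copy of one already kept.
    kept = []
    for doc in docs:
        if all(jaccard(shingles(doc), shingles(k)) < threshold for k in kept):
            kept.append(doc)
    return kept

samples = [
    "The warranty covers parts and labor for two years.",
    "The warranty covers parts and labor for two full years.",  # near-duplicate
    "Shipping is free on orders over fifty dollars.",
]
print(dedupe(samples))
```

Here the second sentence shares most of its shingles with the first and is dropped, while the unrelated third sentence survives.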
For such judgments, and for reviewing tasks generally, humans are still necessary. As a starting point, we recommend using humans to figure out what needs to be done, and machines to increase scale and decrease unit cost in execution.

5. Develop your application and validate that it works.
The process of developing a RAG model from curated data involves several rather technical steps, best performed by qualified technical staff. Even after having done everything possible to prepare the data, it is essential that organizations rigorously test their RAG applications before putting them into production. This is particularly important for applications that are highly regulated or involve human well-being. One way to validate the model is to identify '50 golden questions': a team identifies questions that the RAG application must get right, determines whether it does so, and acts accordingly. The validation should be repeated over time, given that foundational LLMs change often. When a European insurer tried to validate its system for knowledge on how to address claims, it found that customers' contracts, call center personnel, the company's knowledge base, and the claims department often disagreed. This led the company to clarify that the Claims Department 'owned' the answer, i.e., served as the 'gold standard.' Changes to the chatbot, customer contracts, and call center training followed.

6. Support the application and try to inculcate ongoing quality.
As a practical matter, no RAG application will enjoy universal acclaim the minute it is deployed. The application can still hallucinate, there will be bugs to work out, and there will be some level of customer dissatisfaction. We find that some users discount a well-performing RAG application if it makes any errors whatsoever. Finally, changes will be needed as the application is used in new ways. So plan for ongoing quality management and improvement.
The plan should include:
- Some amount of 'qualified human in the loop,' especially in more critical situations
- A means to trap errors, conduct root cause analysis, and prevent them going forward
- Efforts to understand who the customers of the RAG application are, how they use it, and how they define 'good'
- Feedback to managers responsible for the business processes that create unstructured data, to improve future inputs. Content creators can be trained, for example, to create higher-quality documents, tag them as they create them, and add them to a central repository.

It appears that RAG, featuring proprietary content combined with LLMs, is going to be with us for the foreseeable future. It is one of the best ways to gain value from gen AI, if one can feed models high-quality unstructured data. We know there is a lot here, but it is certainly within reach of those who buckle down and do the work.
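The 'golden questions' validation described in step 5 can be run as a small regression harness that is re-executed whenever the model or the content changes. The questions, required answer fragments, and stub answer function below are hypothetical stand-ins for a real RAG pipeline.

```python
# Hypothetical golden questions paired with fragments a correct answer must contain.
GOLDEN_QUESTIONS = [
    ("What is the refund window?", "30 days"),
    ("Who owns the answers on claims?", "Claims Department"),
]

def validate(answer_fn, golden, required_pass_rate: float = 1.0) -> bool:
    # Count how many golden questions the system answers acceptably.
    # Re-run after every model or content change; foundational LLMs drift.
    passed = sum(1 for question, must_contain in golden
                 if must_contain in answer_fn(question))
    return passed / len(golden) >= required_pass_rate

def stub_rag(question: str) -> str:
    # Stand-in for a real RAG pipeline, so the harness can run end to end.
    canned = {
        "What is the refund window?": "Refunds close after 30 days.",
        "Who owns the answers on claims?": "The Claims Department is the gold standard.",
    }
    return canned.get(question, "I don't know.")

print(validate(stub_rag, GOLDEN_QUESTIONS))  # True when every check passes
```

Substring checks are the crudest acceptable grader; teams often graduate to human review or LLM-as-judge scoring, but the discipline of a fixed, versioned question set is the point.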