
Latest news with #BigQuery

Monte Carlo Unveils Unstructured Data Monitoring, a No-Code Solution to Deliver AI-Ready Data at Scale

Business Wire

3 days ago

  • Business
  • Business Wire


SAN FRANCISCO--(BUSINESS WIRE)--Monte Carlo, the leading data + AI observability platform, today announced the launch of unstructured data monitoring, a new capability that enables organizations to ensure trust in their unstructured data assets across documents, chat logs, images, and more, all without needing to write a single line of SQL. According to IDC, 90% of a company's data is unstructured, yet for many organizations its reliability remains a blind spot. With its latest release, Monte Carlo becomes the first data + AI observability platform to close this gap, providing AI-powered support for monitoring both structured and unstructured data types.

Observability for the Next Generation of Data + AI Products

The advent of generative AI has turned unstructured data into a critical input powering analytics, data products, decision making, and AI applications. Monte Carlo users can now apply customizable, AI-powered checks to unstructured fields, monitoring the quality metrics relevant to their unique use case. Monte Carlo goes beyond standard quality metrics, allowing customers to use custom prompts and classifications to make monitoring truly meaningful. Example use cases include:

  • Flagging texts or images that miss critical details
  • Alerting on drifts in the quality of customer service transcripts, as measured by customer sentiment
  • Validating model-generated outputs for tone, structure, or factual grounding
  • Surfacing content that doesn't belong, based on topic classification

The ability to monitor these and other unstructured data types is now fully integrated into Monte Carlo's monitoring engine and can be deployed with just a few clicks. Supported warehouse and lakehouse technologies include Snowflake, Databricks, and BigQuery, with native integration into each platform's respective LLM or AI function libraries, so that sensitive data never leaves customer environments. Teams can create and deploy monitors with minimal setup, ensuring faster time-to-insight and broader coverage.

'Enterprises aren't just building AI—they're racing to build AI they can trust,' said Lior Gavish, co-founder and CTO of Monte Carlo. 'High-quality unstructured data—like customer feedback, support tickets, or internal documentation—isn't just important; it's foundational to building powerful, reliable AI. It can be the difference between a model that performs and one that fails. That's why we designed our monitoring capabilities to proactively detect issues before they impact the business.'

Monte Carlo's expansion into monitoring unstructured data is part of the company's broader vision to provide visibility across the data + AI lifecycle, and of its strategic evolution from a standalone data observability pioneer to the industry's first end-to-end data + AI observability solution.

Building Trust in AI Starts With AI-Ready Data

Enabling AI-ready data means ensuring compatibility with the powerful data and AI solutions organizations rely on daily. To that end, Monte Carlo is also announcing integrations with both Snowflake and Databricks to support observability for their respective AI-native analytics platforms: Snowflake Cortex Agents and Databricks AI/BI.

Supporting Snowflake Cortex Agents and Databricks AI/BI

Monte Carlo continues its strategic partnership with Snowflake, the AI Data Cloud company, to support Snowflake Cortex Agents, Snowflake's AI-powered agents that orchestrate across structured and unstructured data to provide more reliable AI-driven decisions. In addition, Monte Carlo is extending its partnership with Databricks to include observability for Databricks AI/BI, a compound AI system built into Databricks' platform that generates rich insights from across the data + AI lifecycle, including ETL pipelines, lineage, and other queries.

'AI applications are only as powerful as the data powering them,' said Shane Murray, Head of AI at Monte Carlo. 'By supporting Snowflake Cortex Agents and Databricks AI/BI, Monte Carlo helps data teams ensure their foundational data is reliable and trustworthy enough to support real-time business insights driven by AI.'

Attending Snowflake Summit June 2-5? Visit Monte Carlo's booth #1508 or check out the company's other events at the conference. Attending Databricks Data + AI Summit June 9-12? Visit Monte Carlo's booth #F602 or check out the company's other events at the conference.

About Monte Carlo

Monte Carlo created the data + AI observability category to help enterprises drive mission-critical business initiatives with trusted data + AI. NASDAQ, Honeywell, Roche, and hundreds of other data teams rely on Monte Carlo to detect and resolve data + AI issues at scale. Named a 'New Relic for data' by Forbes, Monte Carlo is rated the #1 data + AI observability solution by G2 Crowd, Gartner Peer Reviews, GigaOm, ISG, and others. To learn more or request a personalized demo, visit:
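The release does not show how these checks are expressed, but the in-warehouse pattern it describes can be pictured with BigQuery's own generative AI SQL functions. The query below is a hedged sketch, not Monte Carlo's implementation: it assumes a remote Gemini model has already been created in the dataset, and the model, table, and column names are hypothetical.

```sql
-- Illustrative only: score customer-service transcripts for sentiment without the raw text
-- leaving BigQuery, using ML.GENERATE_TEXT with a previously created remote model.
SELECT
  ticket_id,
  ml_generate_text_llm_result AS sentiment_label
FROM ML.GENERATE_TEXT(
  MODEL `my_project.my_dataset.gemini_model`,
  (
    SELECT
      ticket_id,
      CONCAT(
        'Classify the customer sentiment of this support transcript as POSITIVE, NEUTRAL, or NEGATIVE: ',
        transcript
      ) AS prompt
    FROM `my_project.my_dataset.support_transcripts`
  ),
  STRUCT(0.0 AS temperature, 16 AS max_output_tokens, TRUE AS flatten_json_output)
);
```

A monitoring layer like the one described in the release would then alert when the distribution of such labels drifts over time.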

Uncover Big Data Analysis Secrets with Google BigQuery for Free

Geeky Gadgets

21-05-2025

  • Business
  • Geeky Gadgets


Have you ever hesitated to explore powerful data tools because of the fear of hidden costs or complex setups? If so, you're not alone. Many aspiring data enthusiasts and professionals shy away from platforms like Google BigQuery, assuming they require hefty budgets or advanced expertise. But here's the good news: with the BigQuery Sandbox, you can dive into the world of big data at absolutely no cost. Imagine uploading, managing, and analyzing datasets without spending a dime, all while learning the ropes of one of the most robust data platforms available. This how-to will show you exactly how to make that happen, step by step.

In this guide, Mo Chen breaks down the process of uploading data to Google BigQuery using its free Sandbox environment. You'll discover how to set up your first project, create datasets and tables, and troubleshoot common issues along the way. Whether you're a beginner curious about data management or an experienced analyst looking to test BigQuery's capabilities without committing to a paid plan, this walkthrough is designed to empower you. By the end, you'll not only understand BigQuery's structure but also feel confident in preparing your data for deeper analysis. Ready to unlock the potential of big data without breaking the bank? Let's explore how simplicity and power intersect in BigQuery's free tools.

BigQuery Sandbox Overview

What Is the BigQuery Sandbox?

The BigQuery Sandbox is a free environment within Google Cloud that allows you to experiment with BigQuery's capabilities. It is an ideal starting point for learning how to manage data, execute SQL queries, and preview results without worrying about charges. To begin, you need a Google Cloud account and access to the Google Cloud Console. This environment is particularly useful for users who want to explore BigQuery's features before committing to a paid plan.

Step 1: Setting Up Your Project

Before uploading data, you must create a project. In BigQuery, a project serves as the top-level container for datasets and tables. Follow these steps to set up your project:

  • Log in to the Google Cloud Console using your Google account.
  • Click on 'New Project' and assign a unique project ID. Ensure the ID is descriptive and relevant to your data.
  • Navigate to the BigQuery interface within the console to manage your project's resources and configurations.

This project acts as the foundation for organizing your data and resources, ensuring a structured approach to data management.

Video: How to upload data in Google BigQuery for FREE (Mo Chen, YouTube).

Step 2: Understanding BigQuery's Structure

BigQuery organizes data in a hierarchical structure, which is crucial for efficient data management and querying. The structure includes the following components:

  • Projects: the top-level containers that hold all datasets, tables, and related resources.
  • Datasets: logical groupings of related tables, similar to folders, that help organize your data.
  • Tables: the actual storage units for your data, organized into rows and columns for easy access and analysis.

Understanding this structure ensures that your data is logically organized, making it easier to manage and query efficiently.
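Before clicking through the console steps in Step 3, it can help to see how this hierarchy looks written out. The statements below are a minimal sketch in BigQuery SQL DDL, not a required part of the Sandbox workflow; the dataset, table, and column names are hypothetical placeholders.

```sql
-- Create a dataset (DDL calls it a schema) inside the current project.
CREATE SCHEMA IF NOT EXISTS my_dataset
OPTIONS (location = 'US');

-- Create a table inside that dataset with an explicit schema.
CREATE TABLE IF NOT EXISTS my_dataset.sales (
  order_id   INT64,
  customer   STRING,
  amount     NUMERIC,
  order_date DATE
);
```

Running these in the query editor produces the same project > dataset > table layout that the console buttons create.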
Step 3: Creating Datasets and Tables

Once your project is set up, the next step is to create datasets and tables to store your data. Here's how to proceed:

  • In the BigQuery interface, click on 'Create Dataset' and provide a name, location, and optional description for your dataset.
  • Within the dataset, click on 'Create Table' to define a new table. You can choose to upload a file or create an empty table.
  • Upload your data file, such as a CSV or JSON file, and use the schema auto-detection feature to define the table's structure automatically.

If the schema auto-detection feature does not work as expected, you can manually define the schema by specifying column names, data types, and other attributes. This flexibility ensures that your data is accurately structured for analysis.

Step 4: Querying and Previewing Your Data

BigQuery provides SQL-based tools for querying and analyzing your data. These tools are both powerful and user-friendly, allowing you to extract insights efficiently. To get started:

  • Write a SQL query in the BigQuery editor to retrieve or filter specific data. For example, `SELECT * FROM dataset_name.table_name` retrieves all rows and columns from a table.
  • Use the 'Preview' option to view a sample of the table's contents without running a full query. This feature is particularly useful for verifying data before executing complex queries.

These tools allow you to explore and validate your data quickly, ensuring it is ready for further analysis.

Step 5: Troubleshooting Common Upload Issues

While uploading data to BigQuery, you may encounter some common issues. Fortunately, BigQuery provides solutions to address these problems effectively:

  • Header Misalignment: Ensure that the headers in your data file match the table schema. If there is a mismatch, update the headers in your file or redefine the schema during table creation.
  • Schema Auto-Detection Errors: If the auto-detection feature fails, manually define the schema by specifying column details, such as names and data types, during the upload process.

By addressing these issues promptly, you can ensure that your data is correctly structured and ready for analysis.

Step 6: Verifying Your Data

After uploading your data, it is essential to verify its accuracy and completeness. This step helps identify and resolve any discrepancies before proceeding with advanced analysis. Follow these steps to verify your data:

  • Preview the table contents to confirm that the data matches your expectations. Look for any missing or incorrectly formatted entries.
  • Run basic SQL queries to check for completeness, consistency, and accuracy. For example, use aggregate functions like `COUNT()` to ensure all rows are accounted for (a combined sketch of these queries appears at the end of this guide).

Verifying your data at this stage minimizes errors and ensures a smooth transition to more complex analytical tasks.

Exploring BigQuery's Potential

The BigQuery Sandbox offers a cost-free way to explore Google BigQuery's robust data management and analysis tools. By understanding its hierarchical structure and following best practices for creating datasets and tables, you can efficiently organize and prepare your data. Once your data is uploaded and verified, you can use BigQuery's advanced features for tasks such as data cleaning, transformation, and in-depth analysis.
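As a companion to Steps 4 and 6, the queries below first preview a small sample of a table and then run basic completeness checks. The project, dataset, table, and column names are hypothetical placeholders; adapt them to your own upload.

```sql
-- Step 4: preview a handful of rows instead of scanning the whole table.
SELECT *
FROM `my_project.my_dataset.sales`
LIMIT 10;

-- Step 6: basic verification, a total row count plus a null check on a key column.
SELECT
  COUNT(*) AS total_rows,
  COUNTIF(order_id IS NULL) AS missing_order_ids
FROM `my_project.my_dataset.sales`;
```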
With its scalable and user-friendly design, BigQuery simplifies the process of managing large datasets, making it an invaluable tool for data professionals and enthusiasts alike.

Media Credit: Mo Chen

Re-architecting Data Pipelines in Regulated Industries

India.com

16-05-2025

  • Business
  • India.com


From healthcare reimbursement to energy trading, the information flowing through regulatory pipelines has never been more complex—or more consequential. Compliance mandates such as FERC's five-minute settlement rules and the 340B drug-pricing program now demand granular lineage, near-real-time validation, and immutability. At the same time, cloud economics are reshaping how firms ingest, store, and serve terabytes of operational data. Against this backdrop, many organizations still rely on legacy PL/SQL routines or siloed PowerBuilder screens that struggle to keep pace with evolving audit trails.

Industries that follow strict rules deal with huge amounts of data, and every bit of it needs to be saved and checked without any room for error. Whether it's fast-paced energy trades or pharmacy claims under programs like 340B, each record matters both legally and financially. Old systems that rely on PL/SQL scripts, scattered rules, or nightly file transfers often struggle to keep up with growing data loads, which can put both compliance and customer trust at risk. Modernization, therefore, demands not only cloud elasticity but forensic lineage, real-time monitoring, and ironclad audit trails. One engineer who has quietly mastered that balancing act is data specialist Naveen Kumar Siripuram. His career traces the path from Oracle partitions to BigQuery pipelines without ever losing auditability.

A Career Built on Precision: Naveen Kumar Siripuram

Naveen Kumar Siripuram entered this labyrinth in 2015, fresh from a master's program at Texas A&M. 'I was fascinated by the idea that a single mis-keyed FX deal could ripple through an entire settlement system,' he recalls. Over nine years he has examined that ripple from every angle—first at State Street Bank, then at utility giant NextEra Energy, and now at a leading U.S. healthcare provider. His toolkit spans Oracle partitioning, Autosys orchestration, Kafka streams, and most recently Google Cloud's BigQuery and Dataflow.

Siripuram's early work on State Street's Wall Street System migration foreshadowed his pragmatic approach. To move high-frequency currency trades onto a Linux-based IORD cluster, he used PL/SQL table functions and bitmap indexes to shave report runtimes by 40 percent. 'My mandate was simple: nothing breaks during close of area,' he says. The discipline of monitoring Autosys event logs at 3 a.m. shaped his bias toward audit-friendly design.

The Journalist's Lens: Why Method Matters

As a reporter covering data-intensive sectors, I have seen many engineers equate progress with wholesale replacement. Siripuram stands out for weaving incremental change into entrenched processes. At Florida Blue he consolidated twelve rule-set screens into a single UI, but only after mapping each keyword to its actuarial intent. 'You can't refactor a claims engine unless you speak its dialect,' he tells me. His insistence on preparatory dev-analysis documents—unfashionable in some agile circles—reduced defect leakage during monthly BART releases to near zero.

That same deliberation guided a two-petabyte Teradata-to-GCP migration he led for his current employer's rebates program. Rather than forklift the data warehouse, he converted BTEQ scripts into parameterized Dataform templates, using materialized views for the costliest joins. 'Partitioning and clustering are free compared to reprocessing stale claims,' Siripuram notes. Internal dashboards show compute spend down by a third, while query latency for pharmacists fell from minutes to seconds.
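The partitioning and clustering point is concrete enough to sketch in BigQuery DDL. The schema below is hypothetical and is not the provider's actual rebates model; it only illustrates the pattern Siripuram describes, in which partition pruning and clustering keep routine queries from rescanning the full claims history.

```sql
-- Hypothetical claims table: partitioned by claim date and clustered by pharmacy and NDC code,
-- so date-bounded queries scan only the relevant partitions and blocks.
CREATE TABLE `my_project.rebates.claims`
(
  claim_id     STRING,
  ndc_code     STRING,
  pharmacy_id  STRING,
  claim_amount NUMERIC,
  claim_date   DATE
)
PARTITION BY claim_date
CLUSTER BY pharmacy_id, ndc_code;

-- A pharmacist-facing lookup then prunes to a narrow date range instead of scanning everything.
SELECT pharmacy_id, SUM(claim_amount) AS total_claims
FROM `my_project.rebates.claims`
WHERE claim_date BETWEEN '2025-04-01' AND '2025-04-30'
GROUP BY pharmacy_id;
```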
Siripuram also underscores the human dimension. He keeps a Slack channel open with compliance analysts so they can flag anomalous NDC codes in near real time. 'If you wait for the nightly batch, the drug is already dispensed,' he points out. That feedback loop informed his decision to stage raw files in Cloud Storage before canonicalizing them in BigQuery—an architecture that supports ad-hoc SAS extracts without duplicating storage.

Closing the Loop on Compliance-ready Data

Stepping back, Siripuram's trajectory illustrates a pattern: design for traceability first, performance second, and cloud elasticity third. The order matters because regulated enterprises cannot afford data surprises. His PL/SQL schedulers at NextEra ensured hourly roll-ups met FERC reporting windows; his Airflow DAGs at the healthcare provider guarantee 340B accumulators stay within split-billing tolerances. 'A good pipeline,' he says, 'is one the auditor understands without me in the room.'

Looking ahead, Siripuram is experimenting with TensorFlow models that forecast rebate liabilities based on seasonality and formulary shifts. Yet he remains wary of hype. 'Machine learning is useful only if the training data survives an FDA audit,' he cautions, a reminder that innovation in these sectors is as much about governance as about code.

These days, cloud migrations often grab all the attention, but Naveen Kumar Siripuram's path has been more low-key and more practical. His work shows that real progress starts by carefully looking at the data itself, whether it's coming from a turbine's SCADA system or a pharmacy claim. For teams working in tightly controlled environments, his step-by-step approach is a solid guide: respect what's already in place, move forward carefully, and keep a clear record of every change along the way.

Treasure Data Achieves Google Cloud Ready

National Post

01-05-2025

  • Business
  • National Post


MOUNTAIN VIEW, Calif. — Treasure Data, the Intelligent Customer Data Platform (CDP) built for enterprise scale and powered by AI, today announced that it has successfully achieved the Google Cloud Ready – BigQuery designation.

Google Cloud Ready – BigQuery is a partner integration validation program intended to help increase customer confidence in partner integrations with BigQuery, Google Cloud's autonomous data-to-AI platform. As part of this initiative, Google Cloud engineering teams validate partner integrations with BigQuery in a three-phase process: run a series of data integration tests and compare the results against benchmarks, work closely with partners to fill any gaps, and refine documentation for mutual customers.

Treasure Data's Intelligent CDP, powered by AI, helps global enterprises increase revenue, reduce costs, and deliver hyper-personalized connected experiences at scale. Treasure Data was named a B2C CDP Leader by independent analyst firms such as Forrester and IDC, and powers customer engagement for over 80 Global 2000 companies, including AB InBev, Nestlé, Stellantis, and Yum! Brands.

With this integration, Treasure Data can extract data from Google Cloud's BigQuery, transform it using SQL or Python, and load it into Treasure Data for advanced analytics. This process centralizes BigQuery data within Treasure Data for unified access and analysis. Data from Treasure Data can also be synced back into BigQuery or other tools to activate data across various platforms.

Additionally, Treasure Data's Live Connect includes a zero-copy integration with BigQuery, allowing customers to access and utilize BigQuery data without the need for ETL processes, reducing compute costs while retaining security and governance. Customers have the flexibility to choose whether or not to persist a copy of the data within Treasure Data's CDP.

By earning this designation, Treasure Data has proven that its products meet a core set of functional and interoperability requirements when integrating with BigQuery. The designation gives Treasure Data customers confidence that the Treasure Data products they use today work well with BigQuery, or saves them time in evaluating them if they are not already using them.

The Google Cloud Ready – BigQuery designation also provides Treasure Data with more opportunities to collaborate with Google Cloud's partner engineering and BigQuery teams to develop joint roadmaps.

'Our collaboration with Google Cloud continues to unlock new levels of agility and insight for our customers,' said Rafa Flores, Chief Product Officer at Treasure Data. 'We're eliminating data silos and accelerating activation across the board. It's an exciting step toward a more connected, real-time experience, one where marketers and data teams can finally speak the same language. This is part of Treasure Data's ongoing commitment to delivering intelligent CDP capabilities that align closely with how companies already manage and govern data in their cloud ecosystems, especially as AI is on the rise, as is the need to build on a reliable data and AI layer you can trust.'

PuppyGraph Achieves Google Cloud Ready Designations for BigQuery and AlloyDB

Yahoo

09-04-2025

  • Business
  • Yahoo


PuppyGraph earns Google Cloud Ready designations for BigQuery and AlloyDB, expanding native graph analytics to two of Google Cloud's most important data platforms.

SAN FRANCISCO, April 09, 2025--(BUSINESS WIRE)--PuppyGraph, the first real-time graph query engine built to work directly on modern data lakes and warehouses, announced today that it has achieved the Google Cloud Ready – BigQuery and Google Cloud Ready – AlloyDB designations. These validations signal that PuppyGraph's platform has met Google Cloud's interoperability standards and is optimized to support high-performance graph analytics for enterprise-scale data.

The Google Cloud Ready – BigQuery and Google Cloud Ready – AlloyDB programs help customers identify validated partner tools that integrate smoothly with Google Cloud services. As part of these initiatives, Google Cloud's engineering teams put partner products through a rigorous qualification process, including functional testing, performance benchmarking, and joint documentation development. By earning these designations, PuppyGraph has demonstrated that it can connect seamlessly with BigQuery and AlloyDB, enabling users to run graph queries directly on their data without needing complex ETL pipelines or specialized graph databases.

PuppyGraph is the first graph query engine to offer native graph analytics directly on both BigQuery and AlloyDB, enabling teams to explore complex relationships without the overhead of data duplication or ETL. Customers can explore relationships across billions of data points—like tracking fraud networks, visualizing supply chains, or mapping security threats—without having to move or duplicate their data.

PuppyGraph uniquely complements relational data systems like BigQuery and AlloyDB by adding graph-native querying capabilities using Gremlin and openCypher. Its real-time query engine supports petabyte-scale datasets and delivers sub-second performance—ideal for use cases in cybersecurity, observability, and financial services, where understanding relationships is critical.

"BigQuery and AlloyDB are among the most trusted engines for enterprise-scale analytics, and we're proud to bring native graph capabilities to these platforms," said Weimo Liu, CEO of PuppyGraph. "Our goal is to make graph analytics as accessible and fast as SQL—without adding another database. These designations validate that customers can use PuppyGraph confidently with Google Cloud's most critical data services, and we're excited to keep building alongside the GCP team."

Customers using BigQuery and AlloyDB can now enrich their data analytics stack with graph insights—without introducing additional complexity. PuppyGraph connects directly to Google Cloud data sources, allowing teams to ship new features faster and detect issues sooner—all while keeping their data in one place. To learn more about PuppyGraph's integrations with Google Cloud, visit the BigQuery and AlloyDB tutorial blogs.

About PuppyGraph:

PuppyGraph is the first and only real-time, zero-ETL graph query engine on the market, empowering data teams to query existing relational data stores as a unified graph model in under 10 minutes, bypassing traditional graph databases' cost, latency, and maintenance hurdles. Capable of scaling to petabytes of data and executing complex 10-hop queries in seconds, PuppyGraph supports use cases from enhancing LLMs with knowledge graphs to fraud detection, cybersecurity and more.
Trusted by industry leaders, including Coinbase, Dawn Capital, Prevalent AI, Clarivate, and more. Learn more at and follow the company on LinkedIn, YouTube and X.

Media Contact: Zhenni Wu, press@
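For context on the multi-hop relationship queries described above: even a modest traversal is verbose in plain SQL, which is the gap a graph query engine fills. The sketch below is illustrative only, written in standard BigQuery SQL with hypothetical table and column names; it is not PuppyGraph's Gremlin or openCypher syntax.

```sql
-- Hypothetical fraud-network example: find accounts linked to a flagged account through a
-- shared device. Each additional hop in the relationship needs another self-join in SQL,
-- which is the overhead a graph query engine is designed to remove.
SELECT DISTINCT l2.account_id AS linked_account
FROM `my_project.fraud.device_links` AS l1
JOIN `my_project.fraud.device_links` AS l2
  ON l1.device_id = l2.device_id
WHERE l1.account_id = 'flagged-account-123'
  AND l2.account_id != l1.account_id;
```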
