Latest news with #ArcVirtualCellAtlas


Forbes
25-04-2025
- Health
This Dataset Can Ignite An AI Revolution In Cancer Research
Imagine accelerating the discovery of new therapeutics by developing AI models that mine drug-cell interactions at unprecedented resolution. A new release from Tahoe Therapeutics (formerly Vevo) may have redefined the race to map the human cellular landscape in cancer.
In an unusual move, Tahoe Therapeutics has released 'Tahoe 100M', a massive open-source dataset encompassing 100 million single-cell data points and 60,000 experiments, mapping 1,100 drug treatments across 50 cancer types. Tahoe 100M brings a 50-fold increase in publicly available perturbational single-cell data, positioning it within the world's largest single-cell repository.
Tahoe 100M includes what researchers call 'single-cell transcriptomic profiles', i.e., a comprehensive readout of gene expression for each individual cell. These profiles provide a snapshot of each cell and how it responds to drug perturbations, portraying a more accurate mosaic of tumor cell interactions. Researchers can use this mosaic to understand the behavior of individual cells and to define how cancer heterogeneity affects the development of effective treatments.
Dr. Johnny Yu, co-founder and technology platform developer at Tahoe, describes the company's unique 'Mosaic Platform', used to generate the dataset, as 'a technology that creates a "mosaic tumor" that allows testing drugs across multiple cancer types simultaneously and at high throughput'. The 'Mosaic Platform', combined with single-cell resolution, yields 'approximately 20,000 measurements across all protein-coding genes per assay', he continues, 'offering a unique level of cellular granularity'. This approach gives the dataset immediate practical value, making it a valuable resource for AI modeling.
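To make the idea of a single-cell transcriptomic profile concrete, here is a minimal Python sketch. It is illustrative only and does not reflect Tahoe 100M's actual file format or schema: the record layout, the gene names (GENE1, GENE2), the drug label (drug_x), and the helper functions are all hypothetical. Each record holds one cell's gene counts plus the drug it was exposed to, and the helper compares average expression in treated versus control cells, one common way to summarize a response to a perturbation.

```python
from math import log2
from statistics import mean

# Hypothetical toy records: one dict per cell, with made-up gene counts.
# A real profile would cover roughly 20,000 protein-coding genes per cell.
cells = [
    {"cell_line": "line_a", "drug": "control", "counts": {"GENE1": 7, "GENE2": 3}},
    {"cell_line": "line_b", "drug": "control", "counts": {"GENE1": 7, "GENE2": 3}},
    {"cell_line": "line_a", "drug": "drug_x", "counts": {"GENE1": 1, "GENE2": 15}},
    {"cell_line": "line_b", "drug": "drug_x", "counts": {"GENE1": 5, "GENE2": 15}},
]


def mean_count(records, drug, gene):
    """Average count of `gene` over all cells exposed to `drug`."""
    return mean(r["counts"].get(gene, 0) for r in records if r["drug"] == drug)


def log2_fold_change(records, drug, gene, control="control", pseudocount=1.0):
    """Log2 ratio of treated to control mean expression.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no counts in the control cells.
    """
    treated = mean_count(records, drug, gene)
    baseline = mean_count(records, control, gene)
    return log2((treated + pseudocount) / (baseline + pseudocount))


for gene in ("GENE1", "GENE2"):
    print(gene, log2_fold_change(cells, "drug_x", gene))
```

With these toy numbers, GENE1 comes out at -1.0 (expression halved under the drug, after the pseudocount) and GENE2 at 2.0 (a fourfold increase). A dataset at the scale described above would need array-based tooling rather than Python dictionaries, but the underlying comparison is the same.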
Tahoe Therapeutics and the Arc Institute recently partnered to launch the Arc Virtual Cell Atlas: the most comprehensive and diverse public database of single-cell transcriptomic data across a wide range of perturbations. The data are free to download and use for further analysis and AI modeling. In the last month alone, the dataset has been downloaded almost 11,000 times on Hugging Face, a data-sharing platform. Dr. Hani Goodarzi, Tahoe's scientific co-founder, Core Investigator at the Arc Institute and UCSF Professor, puts the dataset into context: 'Tahoe's "Mosaic Platform" helped minimize "batch effects", which can make single cell data difficult to compare, offering a more consistent and reliable resource for modeling'.
While recent advances in AI, such as the AlphaFold 3 model, have unlocked the ability to predict protein structures and drug interactions, understanding the complexity of patient biology remains a critical challenge. At this intersection, the potential impact of single-cell perturbation datasets on drug discovery can be profound. 'Tahoe 100M enables the building of comprehensive models that can predict drug interactions across diverse patient populations,' states Dr. Nima Alidoust, co-founder and CEO at Tahoe. To develop effective cancer treatments, we need to understand biological interactions beyond simple protein binding. Datasets such as Tahoe 100M account for patient complexity from the earliest stages of drug discovery and thus have the potential to unlock novel 'AI-first' approaches to drug discovery.
Dr. Bo Wang, chief AI scientist for the University Health Network in Canada and among the leading experts in AI for biology and healthcare, believes that the release of this dataset is 'a big deal for the field'. His lab developed the single-cell GPT model (scGPT), one of the first attempts to apply AI large language modeling to single-cell data.
This model was trained on 33 million human cells from tissues such as heart, brain, and blood, and allows accurate cell type classification in single-cell studies. He believes that 'the Tahoe 100M dataset significantly extends our ability to train AI models to learn more nuanced, dosage-dependent cellular responses in perturbation studies across different cancer types, which help portray more generalizable AI models for drug development'. He is confident that such models will provide more accurate means for early patient stratification and for in silico screening of patient response for precise treatment selection.
The generous release of Tahoe 100M is a potential turning point for deciphering cancer vulnerabilities at scale and could build momentum for open-source data sharing in cancer research. By providing unprecedented access to high-quality, large-scale single-cell data, Tahoe is promoting a more open, collaborative approach to scientific discovery. This matters because recent reports warn that thousands of 3D protein structures and other disease-relevant big datasets are held within the vaults of private companies. The release of Tahoe 100M may represent a first step towards creating the 'internet of biology', laying the foundation for the development of truly transformative AI models that can integrate and understand cellular biology and drug development at high speed.

Associated Press
25-02-2025
- Science
Vevo Therapeutics Open Sources Tahoe-100M, the World's Largest Single-Cell Dataset, as the Inaugural Contribution to Arc Institute's New Virtual Cell Atlas
300 million single cell atlas now accessible to the scientific community comprised of Vevo's Tahoe-100M, mapping 60,000 drug-patient interactions, and Arc's AI-curated scBaseCamp 200 million cell dataset
Generated using Vevo's Mosaic platform, Tahoe-100M leveraged Parse Biosciences' GigaLab for single cell sample preparation and Ultima Genomics for sequencing.
PALO ALTO, Calif. and SOUTH SAN FRANCISCO, Calif., Feb. 25, 2025 /PRNewswire/ -- In a landmark move to advance AI-driven biological research, Arc Institute and Vevo Therapeutics announced today that they have partnered on the first release of the Arc Virtual Cell Atlas, the largest and most biologically diverse public resource for single-cell transcriptomic data across species, tissues, and experimental and perturbation conditions, starting with data from over 300 million unique cells. This data is open source and freely accessible via Arc's website as of February 25, 2025. The atlas currently includes single-cell gene expression data from two massive datasets:
- Vevo's Tahoe-100M is the world's largest single-cell dataset, 50x larger than all public drug-perturbed data combined. It includes 100 million cells and maps 60,000 drug-patient interactions, measuring cellular response across 50 cancer cell lines to 1,200 drug perturbations. Tahoe-100M was generated using Vevo's Mosaic Technology, the first platform to make pan-cancer testing of drugs at single cell resolution scalable, and with support from Parse Biosciences' GigaLab leveraging its single-cell RNA sequencing capabilities.
- Arc's scBaseCamp is the first single-cell RNA sequencing data repository from public data to be curated and reprocessed at scale using AI agents. This gene expression data from another 200 million cells from 21 different species was sourced from public repositories and has been standardized to ensure interoperability for optimal use by machine learning models.
'What makes the Arc Virtual Cell Atlas particularly powerful is not just its scale, but that now researchers can analyze together both observational natural cell states and cells that have been deliberately perturbed by drugs or chemicals to see how they respond,' says Dave Burke (@davey_burke), Arc Institute's Chief Technology Officer. 'We're grateful to partner with Vevo on our first release of this resource, leveraging their large-scale Tahoe-100M cell dataset, which is crucial for developing predictive models that can simulate cellular responses to perturbations, potentially reducing years of laboratory work to computational queries that take minutes.'
'Something extraordinary happened in the last few years: emergence of AI models that can predict protein structure and function,' says Nima Alidoust (@nalidoust), Chief Executive Officer and Co-founder of Vevo Therapeutics. 'Our mission at Vevo is to go a huge step further: build AI models of human cells to predict how diseased cells interact with potential drug molecules.'
'These models need massive amounts of observational and drug-perturbed single-cell data, leaps beyond what is publicly available today,' says Johnny Yu, Chief Scientific Officer at Vevo. 'Our Mosaic platform overcomes this fundamental challenge; it can generate single-cell datasets such as Tahoe-100M at a scale that was not possible before.'
'We are open sourcing Tahoe-100M to help start a new movement in biological modeling that goes beyond us,' says Alidoust. 'Releasing it on Arc's Virtual Cell Atlas is the obvious choice as it aims to precisely do that.'
The Arc Virtual Cell Atlas is now accessible on this portal:
Arc's scBaseCamp Technical Report:
About the Arc Institute
The Arc Institute (@arcinstitute) is an independent nonprofit research organization located in Palo Alto, California, that aims to accelerate scientific progress and understand the root causes of complex diseases.
Arc's model gives scientists complete freedom to pursue curiosity-driven research agendas and fosters deep interdisciplinary collaboration.
About Vevo Therapeutics
Vevo Therapeutics is a biotechnology company using its in vivo drug discovery platform and next-generation AI models to uncover better drugs for more patients. The company's Mosaic platform is the first to make multi-patient drug screening data scalable, with single-cell precision, to better represent patient diversity in drug response. Vevo is using Mosaic to build the world's largest atlas of how drugs interact with patient cells and to train disease-relevant models of human cells for discovering novel targets and drugs undetectable by other technologies. Located in South San Francisco, CA, Vevo was founded by a team of inventors and thought leaders who have discovered drugs for 'undruggable' targets and invented novel methods in genomics, computational biology, and chemistry. Learn more at and follow us on LinkedIn and X.