Results 1 - 9 of 9
1.
Cell Syst ; 14(11): 968-978.e3, 2023 Nov 15.
Article in English | MEDLINE | ID: mdl-37909046

ABSTRACT

Attention-based models trained on protein sequences have demonstrated incredible success at classification and generation tasks relevant for artificial-intelligence-driven protein design. However, we lack a sufficient understanding of how very large-scale models and data play a role in effective protein model development. We introduce a suite of protein language models, named ProGen2, that are scaled up to 6.4B parameters and trained on different sequence datasets drawn from over a billion proteins from genomic, metagenomic, and immune repertoire databases. ProGen2 models show state-of-the-art performance in capturing the distribution of observed evolutionary sequences, generating novel viable sequences, and predicting protein fitness without additional fine-tuning. As large model sizes and raw numbers of protein sequences continue to become more widely accessible, our results suggest that a growing emphasis needs to be placed on the data distribution provided to a protein sequence model. Our models and code are open sourced for widespread adoption in protein engineering. A record of this paper's Transparent Peer Review process is included in the supplemental information.


Subject(s)
Artificial Intelligence , Proteins , Proteins/genetics , Amino Acid Sequence , Language , Databases, Factual
3.
Nat Biotechnol ; 41(8): 1099-1106, 2023 Aug.
Article in English | MEDLINE | ID: mdl-36702895

ABSTRACT

Deep-learning language models have shown promise in various biotechnological applications, including protein design and engineering. Here we describe ProGen, a language model that can generate protein sequences with a predictable function across large protein families, akin to generating grammatically and semantically correct natural language sentences on diverse topics. The model was trained on 280 million protein sequences from >19,000 families and is augmented with control tags specifying protein properties. ProGen can be further fine-tuned to curated sequences and tags to improve controllable generation performance of proteins from families with sufficient homologous samples. Artificial proteins fine-tuned to five distinct lysozyme families showed similar catalytic efficiencies as natural lysozymes, with sequence identity to natural proteins as low as 31.4%. ProGen is readily adapted to diverse protein families, as we demonstrate with chorismate mutase and malate dehydrogenase.


Subject(s)
Estrogens, Conjugated (USP) , Proteins , Amino Acid Sequence , Proteins/genetics , Chorismate Mutase/metabolism , Language
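
The ProGen abstract above reports sequence identity to natural proteins as low as 31.4%. As a minimal sketch of how such a figure can be computed, the snippet below takes two sequences that are assumed to be already aligned (equal length, `-` for gaps) and counts matching residues over the columns where neither sequence has a gap. The function name `percent_identity`, the gap convention, and the toy sequences are illustrative assumptions; the abstract does not state which identity definition or alignment tool the authors used, and other definitions (for example, dividing by the full alignment length) give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over gap-free columns of a pairwise alignment.

    Assumption: the inputs are already aligned and have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns where both sequences carry a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment, not taken from the paper.
natural = "MKT-AYIA"
designed = "MRTLAY-A"
identity = percent_identity(natural, designed)
print(f"{identity:.1f}% identity")
```

Ignoring gapped columns keeps the measure symmetric in the two sequences; whether that is the appropriate convention depends on the comparison being made.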
4.
NPJ Digit Med ; 5(1): 71, 2022 Jun 08.
Article in English | MEDLINE | ID: mdl-35676445

ABSTRACT

Prostate cancer is the most frequent cancer in men and a leading cause of cancer death. Determining a patient's optimal therapy is a challenge, where oncologists must select a therapy with the highest likelihood of success and the lowest likelihood of toxicity. International standards for prognostication rely on non-specific and semi-quantitative tools, commonly leading to over- and under-treatment. Tissue-based molecular biomarkers have attempted to address this, but most have limited validation in prospective randomized trials and expensive processing costs, posing substantial barriers to widespread adoption. There remains a significant need for accurate and scalable tools to support therapy personalization. Here we demonstrate prostate cancer therapy personalization by predicting long-term, clinically relevant outcomes using a multimodal deep learning architecture and train models using clinical data and digital histopathology from prostate biopsies. We train and validate models using five phase III randomized trials conducted across hundreds of clinical centers. Histopathological data was available for 5654 of 7764 randomized patients (71%) with a median follow-up of 11.4 years. Compared to the most common risk-stratification tool, the risk groups developed by the National Comprehensive Cancer Network (NCCN), our models have superior discriminatory performance across all endpoints, ranging from 9.2% to 14.6% relative improvement in a held-out validation set. This artificial intelligence-based tool improves prognostication over standard tools and allows oncologists to computationally predict the likeliest outcomes of specific patients to determine optimal treatment. Outfitted with digital scanners and internet access, any clinic could offer such capabilities, enabling global access to therapy personalization.

5.
NPJ Digit Med ; 4(1): 5, 2021 Jan 08.
Article in English | MEDLINE | ID: mdl-33420381

ABSTRACT

A decade of unprecedented progress in artificial intelligence (AI) has demonstrated the potential for many fields, including medicine, to benefit from the insights that AI techniques can extract from data. Here we survey recent progress in the development of modern computer vision techniques, powered by deep learning, for medical applications, focusing on medical imaging, medical video, and clinical deployment. We start by briefly summarizing a decade of progress in convolutional neural networks, including the vision tasks they enable, in the context of healthcare. Next, we discuss several example medical imaging applications that stand to benefit (including cardiology, pathology, dermatology, and ophthalmology) and propose new avenues for continued work. We then expand into general medical video, highlighting ways in which clinical workflows can integrate computer vision to enhance care. Finally, we discuss the challenges and hurdles required for real-world clinical deployment of these technologies.

6.
Nat Commun ; 11(1): 5727, 2020 Nov 16.
Article in English | MEDLINE | ID: mdl-33199723

ABSTRACT

For newly diagnosed breast cancer, estrogen receptor status (ERS) is a key molecular marker used for prognosis and treatment decisions. During clinical management, ERS is determined by pathologists from immunohistochemistry (IHC) staining of biopsied tissue for the targeted receptor, which highlights the presence of cellular surface antigens. This is an expensive, time-consuming process which introduces discordance in results due to variability in IHC preparation and pathologist subjectivity. In contrast, hematoxylin and eosin (H&E) staining, which highlights cellular morphology, is quick, less expensive, and less variable in preparation. Here we show that machine learning can determine molecular marker status, as assessed by hormone receptors, directly from cellular morphology. We develop a multiple instance learning-based deep neural network that determines ERS from H&E-stained whole slide images (WSI). Our algorithm, trained strictly with WSI-level annotations, is accurate on a varied, multi-country dataset of 3,474 patients, achieving an area under the curve (AUC) of 0.92 for sensitivity and specificity. Our approach has the potential to augment clinicians' capabilities in cancer prognosis and theragnosis by harnessing biological signals imperceptible to the human eye.


Subject(s)
Breast Neoplasms/pathology , Deep Learning , Receptors, Steroid/metabolism , Staining and Labeling , Area Under Curve , Female , Humans , Neoplasm Grading
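
The abstract above summarizes classifier performance as an area under the curve (AUC) of 0.92. As a minimal sketch of what that statistic measures, the snippet below computes the ROC AUC in its rank (Mann-Whitney) form: the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as half. The function name `roc_auc` and the labels and scores are hypothetical illustrations, not data or code from the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count correctly ordered positive/negative pairs; a tie counts as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical slide-level labels (1 = receptor positive) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of cases, which is fine for illustration; a sort-based rank computation is the usual choice for large datasets.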
7.
Proc Natl Acad Sci U S A ; 114(29): 7571-7576, 2017 Jul 18.
Article in English | MEDLINE | ID: mdl-28684401

ABSTRACT

Which neighborhoods experience physical improvements? In this paper, we introduce a computer vision method to measure changes in the physical appearances of neighborhoods from time-series street-level imagery. We connect changes in the physical appearance of five US cities with economic and demographic data and find three factors that predict neighborhood improvement. First, neighborhoods that are densely populated by college-educated adults are more likely to experience physical improvements, an observation that is compatible with the economic literature linking human capital and local success. Second, neighborhoods with better initial appearances experience, on average, larger positive improvements, an observation that is consistent with "tipping" theories of urban change. Third, neighborhood improvement correlates positively with physical proximity to the central business district and to other physically attractive neighborhoods, an observation that is consistent with the "invasion" theories of urban sociology. Together, our results provide support for three classical theories of urban change and illustrate the value of using computer vision methods and street-level imagery to understand the physical dynamics of cities.

8.
Opt Express ; 22(17): 20164-76, 2014 Aug 25.
Article in English | MEDLINE | ID: mdl-25321226

ABSTRACT

We present a novel approach for evaluation of position and orientation of geometric shapes from scattered time-resolved data. Traditionally, imaging systems treat scattering as unwanted and are designed to mitigate the effects. Instead, we show here that scattering can be exploited by implementing a system based on a femtosecond laser and a streak camera. The result is accurate estimation of object pose, which is a fundamental tool in analysis of complex scenarios and plays an important role in our understanding of physical phenomena. Here, we experimentally show that for a given geometry, a single incident illumination point yields enough information for pose estimation and tracking after multiple scattering events. Our technique can be used for single-shot imaging behind walls or through turbid media.

9.
J Opt Soc Am A Opt Image Sci Vis ; 31(5): 957-63, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24979627

ABSTRACT

Imaging through complex media is a well-known challenge, as scattering distorts a signal and invalidates imaging equations. For coherent imaging, the input field can be reconstructed using phase conjugation or knowledge of the complex transmission matrix. However, for incoherent light, wave interference methods are limited to small viewing angles. On the other hand, time-resolved methods do not rely on signal or object phase correlations, making them suitable for reconstructing wide-angle, larger-scale objects. Previously, a time-resolved technique was demonstrated for uniformly reflecting objects. Here, we generalize the technique to reconstruct the spatially varying reflectance of shapes hidden by angle-dependent diffuse layers. The technique is a noninvasive method of imaging three-dimensional objects without relying on coherence. For a given diffuser, ultrafast measurements are used in a convex optimization program to reconstruct a wide-angle, three-dimensional reflectance function. The method has potential use for biological imaging and material characterization.


Subject(s)
Algorithms , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Nephelometry and Turbidimetry/methods , Photometry/methods , Light , Scattering, Radiation