Results 1 - 20 of 150
1.
Cell ; 186(8): 1772-1791, 2023 04 13.
Article in English | MEDLINE | ID: mdl-36905928

ABSTRACT

Machine learning (ML) is increasingly used in clinical oncology to diagnose cancers, predict patient outcomes, and inform treatment planning. Here, we review recent applications of ML across the clinical oncology workflow. We review how these techniques are applied to medical imaging and to molecular data obtained from liquid and solid tumor biopsies for cancer diagnosis, prognosis, and treatment design. We discuss key considerations in developing ML for the distinct challenges posed by imaging and molecular data. Finally, we examine ML models approved for cancer-related patient usage by regulatory agencies and discuss approaches to improve the clinical usefulness of ML.


Subject(s)
Machine Learning , Neoplasms , Humans , Neoplasms/diagnosis , Neoplasms/genetics , Neoplasms/therapy , Diagnostic Imaging , Medical Oncology
2.
Physiol Rev ; 103(4): 2423-2450, 2023 10 01.
Article in English | MEDLINE | ID: mdl-37104717

ABSTRACT

Artificial intelligence in health care has experienced remarkable innovation and progress in the last decade. Significant advancements can be attributed to the utilization of artificial intelligence to transform physiology data to advance health care. In this review, we explore how past work has shaped the field and defined future challenges and directions. In particular, we focus on three areas of development. First, we give an overview of artificial intelligence, with special attention to the most relevant artificial intelligence models. We then detail how physiology data have been harnessed by artificial intelligence to advance the main areas of health care: automating existing health care tasks, increasing access to care, and augmenting health care capabilities. Finally, we discuss emerging concerns surrounding the use of individual physiology data and detail an increasingly important consideration for the field, namely the challenges of deploying artificial intelligence models to achieve meaningful clinical impact.


Subject(s)
Artificial Intelligence , Delivery of Health Care , Humans
3.
Nature ; 616(7957): 520-524, 2023 04.
Article in English | MEDLINE | ID: mdl-37020027

ABSTRACT

Artificial intelligence (AI) has been developed for echocardiography [1-3], although it has not yet been tested with blinding and randomization. Here we designed a blinded, randomized non-inferiority clinical trial (ClinicalTrials.gov ID: NCT05140642; no outside funding) of AI versus sonographer initial assessment of left ventricular ejection fraction (LVEF) to evaluate the impact of AI in the interpretation workflow. The primary end point was the change in the LVEF between initial AI or sonographer assessment and final cardiologist assessment, evaluated by the proportion of studies with substantial change (more than 5% change). From 3,769 echocardiographic studies screened, 274 studies were excluded owing to poor image quality. The proportion of studies substantially changed was 16.8% in the AI group and 27.2% in the sonographer group (difference of -10.4%, 95% confidence interval: -13.2% to -7.7%, P < 0.001 for non-inferiority, P < 0.001 for superiority). The mean absolute difference between final cardiologist assessment and independent previous cardiologist assessment was 6.29% in the AI group and 7.23% in the sonographer group (difference of -0.96%, 95% confidence interval: -1.34% to -0.54%, P < 0.001 for superiority). The AI-guided workflow saved time for both sonographers and cardiologists, and cardiologists were not able to distinguish between the initial assessments by AI versus the sonographer (blinding index of 0.088). For patients undergoing echocardiographic quantification of cardiac function, initial assessment of LVEF by AI was non-inferior to assessment by sonographers.


Subject(s)
Artificial Intelligence , Cardiologists , Echocardiography , Heart Function Tests , Humans , Artificial Intelligence/standards , Echocardiography/methods , Echocardiography/standards , Stroke Volume , Ventricular Function, Left , Single-Blind Method , Workflow , Reproducibility of Results , Heart Function Tests/methods , Heart Function Tests/standards
4.
Cell ; 152(3): 642-54, 2013 Jan 31.
Article in English | MEDLINE | ID: mdl-23333102

ABSTRACT

Differences in chromatin organization are key to the multiplicity of cell states that arise from a single genetic background, yet the landscapes of in vivo tissues remain largely uncharted. Here, we mapped chromatin genome-wide in a large and diverse collection of human tissues and stem cells. The maps yield unprecedented annotations of functional genomic elements and their regulation across developmental stages, lineages, and cellular environments. They also reveal global features of the epigenome, related to nuclear architecture, that also vary across cellular phenotypes. Specifically, developmental specification is accompanied by progressive chromatin restriction as the default state transitions from dynamic remodeling to generalized compaction. Exposure to serum in vitro triggers a distinct transition that involves de novo establishment of domains with features of constitutive heterochromatin. We describe how these global chromatin state transitions relate to chromosome and nuclear architecture, and discuss their implications for lineage fidelity, cellular senescence, and reprogramming.


Subject(s)
Chromatin Assembly and Disassembly , Chromatin/metabolism , Epigenesis, Genetic , Gene-Environment Interaction , Genome-Wide Association Study , Cell Nucleus , Cellular Senescence , Embryonic Stem Cells/metabolism , Gene Expression Regulation , Humans , Induced Pluripotent Stem Cells/metabolism , Organ Specificity
5.
Article in English | MEDLINE | ID: mdl-39284102

ABSTRACT

In the high-stakes arena of drug discovery, the journey from bench to bedside is hindered by a daunting 92% failure rate, primarily due to unpredicted toxicities and inadequate therapeutic efficacy in clinical trials. The FDA Modernization Act 2.0 heralds a transformative approach, advocating for the integration of alternative methods to conventional animal testing, including cell-based assays that employ human induced pluripotent stem cell (iPSC)-derived organoids, and organ-on-a-chip technologies, in conjunction with sophisticated artificial intelligence (AI) methodologies. Our review explores the innovative capacity of iPSC-derived "clinical trial in a dish" models designed for cardiovascular disease research. We also highlight how integrating iPSC technology with AI can accelerate the identification of viable therapeutic candidates, streamline drug screening, and pave the way toward more personalized medicine. Through this, we provide a comprehensive overview of the current landscape and future implications of iPSC and AI applications being navigated by the research community and pharmaceutical industry.

6.
Nat Methods ; 21(8): 1422-1429, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39122951

ABSTRACT

Language models are playing an increasingly important role in many areas of artificial intelligence (AI) and computational biology. In this primer, we discuss the ways in which language models, both those based on natural language and those based on biological sequences, can be applied to biological research. This primer is primarily intended for biologists interested in using these cutting-edge AI technologies in their applications. We provide guidance on best practices and key resources for adapting language models for biology.


Subject(s)
Artificial Intelligence , Computational Biology , Computational Biology/methods , Humans , Natural Language Processing , Programming Languages
7.
Nat Methods ; 21(3): 444-454, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38347138

ABSTRACT

Whole-transcriptome spatial profiling of genes at single-cell resolution remains a challenge. To address this limitation, spatial gene expression prediction methods have been developed to infer the spatial expression of unmeasured transcripts, but the quality of these predictions can vary greatly. Here we present Transcript Imputation with Spatial Single-cell Uncertainty Estimation (TISSUE) as a general framework for estimating uncertainty for spatial gene expression predictions and providing uncertainty-aware methods for downstream inference. Leveraging conformal inference, TISSUE provides well-calibrated prediction intervals for predicted expression values across 11 benchmark datasets. Moreover, it consistently reduces the false discovery rate for differential gene expression analysis, improves clustering and visualization of predicted spatial transcriptomics and improves the performance of supervised learning models trained on predicted gene expression profiles. Applying TISSUE to a MERFISH spatial transcriptomics dataset of the adult mouse subventricular zone, we identified subtypes within the neural stem cell lineage and developed subtype-specific regional classifiers.
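The abstract above names conformal inference as the basis for its prediction intervals. As a rough illustration of the general idea only, the following is a minimal split-conformal sketch in plain Python; it is not the TISSUE implementation, and the function name `conformal_interval`, its parameters, and the use of absolute residuals as the nonconformity score are assumptions made for this example.

```python
import math


def conformal_interval(cal_true, cal_pred, new_pred, alpha=0.1):
    """Split-conformal interval around new_pred (hypothetical helper).

    cal_true / cal_pred: measured and predicted values on held-out
    calibration data. alpha: target miscoverage (0.1 -> ~90% coverage).
    """
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Finite-sample-corrected rank of the score used as interval half-width.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: interval is unbounded.
        return (-math.inf, math.inf)
    half_width = scores[k - 1]
    return (new_pred - half_width, new_pred + half_width)


if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    predicted = [1.5, 2.0, 2.0, 4.5, 5.0, 7.0, 7.0, 8.5, 9.0]
    print(conformal_interval(measured, predicted, new_pred=5.0, alpha=0.5))
```

The interval width here is the same for every new prediction; methods that adapt the width per cell or per gene would need a different, scaled nonconformity score.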


Subject(s)
Gene Expression Profiling , Neural Stem Cells , Animals , Mice , Uncertainty , Benchmarking , Cluster Analysis , Transcriptome , Single-Cell Analysis
8.
Nature ; 592(7855): 629-633, 2021 04.
Article in English | MEDLINE | ID: mdl-33828294

ABSTRACT

There is a growing focus on making clinical trials more inclusive, but the design of trial eligibility criteria remains challenging [1-3]. Here we systematically evaluate the effect of different eligibility criteria on cancer trial populations and outcomes with real-world data using the computational framework of Trial Pathfinder. We apply Trial Pathfinder to emulate completed trials of advanced non-small-cell lung cancer using data from a nationwide database of electronic health records comprising 61,094 patients with advanced non-small-cell lung cancer. Our analyses reveal that many common criteria, including exclusions based on several laboratory values, had a minimal effect on the trial hazard ratios. When we used a data-driven approach to broaden restrictive criteria, the pool of eligible patients more than doubled on average and the hazard ratio of the overall survival decreased by an average of 0.05. This suggests that many patients who were not eligible under the original trial criteria could potentially benefit from the treatments. We further support our findings through analyses of other types of cancer and patient-safety data from diverse clinical trials. Our data-driven methodology for evaluating eligibility criteria can facilitate the design of more-inclusive trials while maintaining safeguards for patient safety.


Subject(s)
Artificial Intelligence , Clinical Trials as Topic/methods , Datasets as Topic , Medical Oncology , Patient Safety , Patient Selection , Carcinoma, Non-Small-Cell Lung/drug therapy , Clinical Laboratory Techniques , Electronic Health Records/statistics & numerical data , Humans , Lung Neoplasms/drug therapy , Patient Safety/standards , Patient Selection/ethics , Proportional Hazards Models , Reproducibility of Results
9.
Proc Natl Acad Sci U S A ; 121(10): e2313719121, 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38416677

ABSTRACT

Single-cell data integration can provide a comprehensive molecular view of cells, and many algorithms have been developed to remove unwanted technical or biological variations and integrate heterogeneous single-cell datasets. Despite their wide usage, existing methods suffer from several fundamental limitations. In particular, we lack a rigorous statistical test for whether two high-dimensional single-cell datasets are alignable (and therefore should even be aligned). Moreover, popular methods can substantially distort the data during alignment, making the aligned data and downstream analysis difficult to interpret. To overcome these limitations, we present a spectral manifold alignment and inference (SMAI) framework, which enables principled and interpretable alignability testing and structure-preserving integration of single-cell data with the same type of features. SMAI provides a statistical test to robustly assess the alignability between datasets to avoid misleading inference and is justified by high-dimensional statistical theory. On a diverse range of real and simulated benchmark datasets, it outperforms commonly used alignment methods. Moreover, we show that SMAI improves various downstream analyses such as identification of differentially expressed genes and imputation of single-cell spatial transcriptomics, providing further biological insights. SMAI's interpretability also enables quantification and a deeper understanding of the sources of technical confounders in single-cell data.


Subject(s)
Algorithms , Gene Expression Profiling , Gene Expression , Single-Cell Analysis
10.
Nature ; 580(7802): 252-256, 2020 04.
Article in English | MEDLINE | ID: mdl-32269341

ABSTRACT

Accurate assessment of cardiac function is crucial for the diagnosis of cardiovascular disease [1], screening for cardiotoxicity [2] and decisions regarding the clinical management of patients with a critical illness [3]. However, human assessment of cardiac function focuses on a limited sampling of cardiac cycles and has considerable inter-observer variability despite years of training [4,5]. Here, to overcome this challenge, we present a video-based deep learning algorithm, EchoNet-Dynamic, that surpasses the performance of human experts in the critical tasks of segmenting the left ventricle, estimating ejection fraction and assessing cardiomyopathy. Trained on echocardiogram videos, our model accurately segments the left ventricle with a Dice similarity coefficient of 0.92, predicts ejection fraction with a mean absolute error of 4.1% and reliably classifies heart failure with reduced ejection fraction (area under the curve of 0.97). In an external dataset from another healthcare system, EchoNet-Dynamic predicts the ejection fraction with a mean absolute error of 6.0% and classifies heart failure with reduced ejection fraction with an area under the curve of 0.96. Prospective evaluation with repeated human measurements confirms that the model has variance that is comparable to or less than that of human experts. By leveraging information across multiple cardiac cycles, our model can rapidly identify subtle changes in ejection fraction, is more reproducible than human evaluation and lays the foundation for precise diagnosis of cardiovascular disease in real time. As a resource to promote further innovation, we also make publicly available a large dataset of 10,030 annotated echocardiogram videos.
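Two of the metrics reported in this abstract, the Dice similarity coefficient for segmentation and the mean absolute error for ejection fraction, have simple standard definitions. The sketch below shows those definitions in plain Python on toy inputs; the function names and example values are illustrative assumptions and are not taken from the EchoNet-Dynamic code or data.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity 2|A & B| / (|A| + |B|) for two binary masks.

    Masks are lists of rows of 0/1 values with identical shapes.
    """
    flat_a = [px for row in mask_a for px in row]
    flat_b = [px for row in mask_b for px in row]
    # Pixels labelled foreground in both masks.
    intersection = sum(1 for a, b in zip(flat_a, flat_b) if a and b)
    total = sum(1 for a in flat_a if a) + sum(1 for b in flat_b if b)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_absolute_error(true_vals, pred_vals):
    """Average absolute gap between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(true_vals, pred_vals)) / len(true_vals)


if __name__ == "__main__":
    # Toy masks, not real echocardiogram segmentations.
    expert = [[0, 1, 1], [0, 1, 1]]
    model = [[0, 0, 1], [0, 1, 1]]
    print(dice_coefficient(expert, model))
    # Toy ejection fractions in percentage points.
    print(mean_absolute_error([55, 60, 35], [50, 62, 38]))
```

A Dice value of 1.0 means the two masks coincide exactly, and 0.0 means they share no foreground pixels.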


Subject(s)
Deep Learning , Heart Diseases/diagnosis , Heart Diseases/physiopathology , Heart/physiology , Heart/physiopathology , Models, Cardiovascular , Video Recording , Atrial Fibrillation , Datasets as Topic , Echocardiography , Heart Failure/physiopathology , Hospitals , Humans , Prospective Studies , Reproducibility of Results , Ventricular Function, Left/physiology
11.
Genome Res ; 32(5): 968-985, 2022 05.
Article in English | MEDLINE | ID: mdl-35332099

ABSTRACT

The recent development and application of methods based on the general principle of "crosslinking and proximity ligation" (crosslink-ligation) are revolutionizing RNA structure studies in living cells. However, extracting structure information from such data presents unique challenges. Here, we introduce a set of computational tools for the systematic analysis of data from a wide variety of crosslink-ligation methods, specifically focusing on read mapping, alignment classification, and clustering. We design a new strategy to map short reads with irregular gaps at high sensitivity and specificity. Analysis of previously published data reveals distinct properties and bias caused by the crosslinking reactions. We perform rigorous and exhaustive classification of alignments and discover eight types of arrangements that provide distinct information on RNA structures and interactions. To deconvolve the dense and intertwined gapped alignments, we develop a network/graph-based tool Crosslinked RNA Secondary Structure Analysis using Network Techniques (CRSSANT), which enables clustering of gapped alignments and discovery of new alternative and dynamic conformations. We discover that multiple crosslinking and ligation events can occur on the same RNA, generating multisegment alignments to report complex high-level RNA structures and multi-RNA interactions. We find that alignments with overlapped segments are produced from potential homodimers and develop a new method for their de novo identification. Analysis of overlapping alignments revealed potential new homodimers in cellular noncoding RNAs and RNA virus genomes in the Picornaviridae family. Together, this suite of computational tools enables rapid and efficient analysis of RNA structure and interaction data in living cells.


Subject(s)
RNA, Untranslated , RNA , Algorithms , Cluster Analysis , RNA/chemistry , RNA/genetics , RNA, Untranslated/chemistry , Sequence Analysis, RNA/methods , Software
12.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37280185

ABSTRACT

The three-dimensional structure of RNA molecules plays a critical role in a wide range of cellular processes encompassing functions from riboswitches to epigenetic regulation. These RNA structures are incredibly dynamic and can indeed be described aptly as an ensemble of structures that shifts in distribution depending on different cellular conditions. Thus, the computational prediction of RNA structure poses a unique challenge, even as computational protein folding has seen great advances. In this review, we focus on a variety of machine learning-based methods that have been developed to predict RNA molecules' secondary structure, as well as more complex tertiary structures. We survey commonly used modeling strategies, and how many are inspired by or incorporate thermodynamic principles. We discuss the shortcomings that various design decisions entail and propose future directions that could build off these methods to yield more robust, accurate RNA structure predictions.


Subject(s)
Epigenesis, Genetic , RNA , RNA/metabolism , Machine Learning , Protein Structure, Secondary , Computational Biology/methods
13.
Bioinformatics ; 40(Suppl 1): i521-i528, 2024 06 28.
Article in English | MEDLINE | ID: mdl-38940132

ABSTRACT

MOTIVATION: Spatially resolved single-cell transcriptomics have provided unprecedented insights into gene expression in situ, particularly in the context of cell interactions or organization of tissues. However, current technologies for profiling spatial gene expression at single-cell resolution are generally limited to the measurement of a small number of genes. To address this limitation, several algorithms have been developed to impute or predict the expression of additional genes that were not present in the measured gene panel. Current algorithms do not leverage the rich spatial and gene relational information in spatial transcriptomics. To improve spatial gene expression predictions, we introduce Spatial Propagation and Reinforcement of Imputed Transcript Expression (SPRITE) as a meta-algorithm that processes predictions obtained from existing methods by propagating information across gene correlation networks and spatial neighborhood graphs. RESULTS: SPRITE improves spatial gene expression predictions across multiple spatial transcriptomics datasets. Furthermore, SPRITE-predicted spatial gene expression leads to improved clustering, visualization, and classification of cells. SPRITE can be used in spatial transcriptomics data analysis to improve inferences based on predicted gene expression. AVAILABILITY AND IMPLEMENTATION: The SPRITE software package is available at https://github.com/sunericd/SPRITE. Code for generating experiments and analyses in the manuscript is available at https://github.com/sunericd/sprite-figures-and-analyses.


Subject(s)
Algorithms , Gene Expression Profiling , Gene Regulatory Networks , Software , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Humans , Transcriptome
14.
Bioinformatics ; 40(7)2024 07 01.
Article in English | MEDLINE | ID: mdl-38913862

ABSTRACT

MOTIVATION: The emergence of large chemical repositories and combinatorial chemical spaces, coupled with high-throughput docking and generative AI, has greatly expanded the chemical diversity of small molecules for drug discovery. Selecting compounds for experimental validation requires filtering these molecules based on favourable druglike properties, such as Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET). RESULTS: We developed ADMET-AI, a machine learning platform that provides fast and accurate ADMET predictions both as a website and as a Python package. ADMET-AI has the highest average rank on the TDC ADMET Leaderboard, and it is currently the fastest web-based ADMET predictor, with a 45% reduction in time compared to the next fastest public ADMET web server. ADMET-AI can also be run locally with predictions for one million molecules taking just 3.1 h. AVAILABILITY AND IMPLEMENTATION: The ADMET-AI platform is freely available both as a web server at admet.ai.greenstonebio.com and as an open-source Python package for local batch prediction at github.com/swansonk14/admet_ai (also archived on Zenodo at doi.org/10.5281/zenodo.10372930). All data and models are archived on Zenodo at doi.org/10.5281/zenodo.10372418.


Subject(s)
Drug Discovery , Machine Learning , Software , Drug Discovery/methods , Small Molecule Libraries/chemistry
15.
Nature ; 575(7781): 137-146, 2019 11.
Article in English | MEDLINE | ID: mdl-31695204

ABSTRACT

The goal of sex and gender analysis is to promote rigorous, reproducible and responsible science. Incorporating sex and gender analysis into experimental design has enabled advancements across many disciplines, such as improved treatment of heart disease and insights into the societal impact of algorithmic bias. Here we discuss the potential for sex and gender analysis to foster scientific discovery, improve experimental efficiency and enable social equality. We provide a roadmap for sex and gender analysis across scientific disciplines and call on researchers, funding agencies, peer-reviewed journals and universities to coordinate efforts to implement robust methods of sex and gender analysis.


Subject(s)
Engineering/methods , Engineering/standards , Research Design/standards , Research Design/trends , Science/methods , Science/standards , Sex Characteristics , Sex Factors , Animals , Artificial Intelligence , Female , Humans , Male , Molecular Targeted Therapy , Reproducibility of Results , Sample Size
16.
Ann Intern Med ; 177(2): 210-220, 2024 02.
Article in English | MEDLINE | ID: mdl-38285984

ABSTRACT

Large language models (LLMs) are artificial intelligence models trained on vast text data to generate humanlike outputs. They have been applied to various tasks in health care, ranging from answering medical examination questions to generating clinical reports. With increasing institutional partnerships between companies producing LLMs and health systems, the real-world clinical application of these models is nearing realization. As these models gain traction, health care practitioners must understand what LLMs are, their development, their current and potential applications, and the associated pitfalls in a medical setting. This review, coupled with a tutorial, provides a comprehensive yet accessible overview of these areas with the aim of familiarizing health care professionals with the rapidly changing landscape of LLMs in medicine. Furthermore, the authors highlight active research areas in the field that promise to improve LLMs' usability in health care contexts.


Subject(s)
Artificial Intelligence , Medicine , Humans , Health Personnel , Language
17.
Proc Natl Acad Sci U S A ; 118(15)2021 04 13.
Article in English | MEDLINE | ID: mdl-33827925

ABSTRACT

Simultaneous profiling of multiomic modalities within a single cell is a grand challenge for single-cell biology. While there have been impressive technical innovations demonstrating feasibility, for example by generating paired measurements of single-cell transcriptome (single-cell RNA sequencing [scRNA-seq]) and chromatin accessibility (single-cell assay for transposase-accessible chromatin using sequencing [scATAC-seq]), widespread application of joint profiling is challenging due to its experimental complexity, noise, and cost. Here, we introduce BABEL, a deep learning method that translates between the transcriptome and chromatin profiles of a single cell. Leveraging an interoperable neural network model, BABEL can predict single-cell expression directly from a cell's scATAC-seq and vice versa after training on relevant data. This makes it possible to computationally synthesize paired multiomic measurements when only one modality is experimentally available. Across several paired single-cell ATAC and gene expression datasets in human and mouse, we validate that BABEL accurately translates between these modalities for individual cells. BABEL also generalizes well to cell types within new biological contexts not seen during training. Starting from scATAC-seq of patient-derived basal cell carcinoma (BCC), BABEL generated single-cell expression that enabled fine-grained classification of complex cell states, despite having never seen BCC data. These predictions are comparable to analyses of experimental BCC scRNA-seq data for diverse cell types related to BABEL's training data. We further show that BABEL can incorporate additional single-cell data modalities, such as protein epitope profiling, thus enabling translation across chromatin, RNA, and protein. BABEL offers a powerful approach for data exploration and hypothesis generation.


Subject(s)
Carcinoma/genetics , Genomics/methods , Single-Cell Analysis/methods , Software , Animals , Carcinoma/metabolism , Deep Learning , Humans , Mice , Proteome/genetics , Proteome/metabolism , Transcriptome
18.
J Emerg Med ; 66(2): 184-191, 2024 02.
Article in English | MEDLINE | ID: mdl-38369413

ABSTRACT

BACKGROUND: The adoption of point-of-care ultrasound (POCUS) has greatly improved the ability to rapidly evaluate unstable emergency department (ED) patients at the bedside. One major use of POCUS is to obtain echocardiograms to assess cardiac function. OBJECTIVES: We developed EchoNet-POCUS, a novel deep learning system, to aid emergency physicians (EPs) in interpreting POCUS echocardiograms and to reduce operator-to-operator variability. METHODS: We collected a new dataset of POCUS echocardiogram videos obtained in the ED by EPs and annotated the cardiac function and quality of each video. Using this dataset, we train EchoNet-POCUS to evaluate both cardiac function and video quality in POCUS echocardiograms. RESULTS: EchoNet-POCUS achieves an area under the receiver operating characteristic curve (AUROC) of 0.92 (0.89-0.94) for predicting whether cardiac function is abnormal and an AUROC of 0.81 (0.78-0.85) for predicting video quality. CONCLUSIONS: EchoNet-POCUS can be applied to bedside echocardiogram videos in real time using commodity hardware, as we demonstrate in a prospective pilot study.
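The results above are reported as AUROC values. For readers less familiar with the metric, here is a minimal plain-Python sketch of its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the toy labels and scores are assumptions for illustration; they are not EchoNet-POCUS outputs.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: 1 for positive (e.g. abnormal function), 0 for negative.
    scores: model outputs, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    toy_labels = [1, 1, 0, 0, 1]
    toy_scores = [0.9, 0.4, 0.5, 0.1, 0.5]
    print(auroc(toy_labels, toy_scores))
```

This quadratic-time form is meant for clarity; rank-based implementations give the same value more efficiently on large datasets.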


Subject(s)
Echocardiography , Point-of-Care Systems , Humans , Prospective Studies , Pilot Projects , Ultrasonography , Emergency Service, Hospital
19.
Am J Hum Genet ; 107(1): 72-82, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32504544

ABSTRACT

Genetics researchers and clinical professionals rely on diversity measures such as race, ethnicity, and ancestry (REA) to stratify study participants and patients for a variety of applications in research and precision medicine. However, there are no comprehensive, widely accepted standards or guidelines for collecting and using such data in clinical genetics practice. Two NIH-funded research consortia, the Clinical Genome Resource (ClinGen) and Clinical Sequencing Evidence-generating Research (CSER), have partnered to address this issue and report how REA are currently collected, conceptualized, and used. Surveying clinical genetics professionals and researchers (n = 448), we found heterogeneity in the way REA are perceived, defined, and measured, with variation in the perceived importance of REA in both clinical and research settings. The majority of respondents (>55%) felt that REA are at least somewhat important for clinical variant interpretation, ordering genetic tests, and communicating results to patients. However, there was no consensus on the relevance of REA, including how each of these measures should be used in different scenarios and what information they can convey in the context of human genetics. A lack of common definitions and applications of REA across the precision medicine pipeline may contribute to inconsistencies in data collection, missing or inaccurate classifications, and misleading or inconclusive results. Thus, our findings support the need for standardization and harmonization of REA data collection and use in clinical genetics and precision health research.


Subject(s)
Data Collection/standards , Genetic Testing/standards , Adult , Child , Ethnicity , Female , Genetic Variation/genetics , Genomics/standards , Humans , Male , Precision Medicine/standards , Prohibitins , Surveys and Questionnaires
20.
Proc Natl Acad Sci U S A ; 117(41): 25464-25475, 2020 10 13.
Article in English | MEDLINE | ID: mdl-32973096

ABSTRACT

Proteolysis is a major posttranslational regulator of biology inside and outside of cells. Broad identification of optimal cleavage sites and natural substrates of proteases is critical for drug discovery and to understand protease biology. Here, we present a method that employs two genetically encoded substrate phage display libraries coupled with next generation sequencing (SPD-NGS) that allows up to 10,000-fold deeper sequence coverage of the typical six- to eight-residue protease cleavage sites compared to state-of-the-art synthetic peptide libraries or proteomics. We applied SPD-NGS to two classes of proteases, the intracellular caspases, and the ectodomains of the sheddases, ADAMs 10 and 17. The first library (Lib 10AA) allowed us to identify 10^4 to 10^5 unique cleavage sites over a 1,000-fold dynamic range of NGS counts and produced consensus and optimal cleavage motifs based on position-specific scoring matrices. A second SPD-NGS library (Lib hP), which displayed virtually the entire human proteome tiled in contiguous 49 amino acid sequences with 25 amino acid overlaps, enabled us to identify candidate human proteome sequences. We identified up to 10^4 natural linear cut sites, depending on the protease, and captured most of the examples previously identified by proteomics and predicted 10- to 100-fold more. Structural bioinformatics was used to facilitate the identification of candidate natural protein substrates. SPD-NGS is rapid, reproducible, simple to perform and analyze, inexpensive, and renewable, with unprecedented depth of coverage for substrate sequences, and is an important tool for protease biologists interested in protease specificity for specific assays and inhibitors and to facilitate identification of natural protein substrates.
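The abstract describes a library in which the proteome is tiled in 49 amino acid sequences with 25 amino acid overlaps, which implies that each tile starts 24 residues after the previous one. The sketch below illustrates that arithmetic in plain Python. It is only an illustration: the function name is hypothetical, and the handling of short proteins and of the final tile (shifted to end flush with the C-terminus) is an assumption, since the abstract does not say how the authors treated those cases.

```python
def tile_sequence(sequence, tile_len=49, overlap=25):
    """Split a protein sequence into overlapping fixed-length tiles."""
    step = tile_len - overlap  # 49 - 25 = 24 residues between tile starts
    if step <= 0:
        raise ValueError("overlap must be smaller than tile length")
    if len(sequence) <= tile_len:
        # Assumption: proteins shorter than one tile are kept whole.
        return [sequence]
    starts = list(range(0, len(sequence) - tile_len + 1, step))
    # Assumption: add a last tile ending at the C-terminus so that
    # no residues are dropped when the length is not a multiple of step.
    last_start = len(sequence) - tile_len
    if starts[-1] != last_start:
        starts.append(last_start)
    return [sequence[s:s + tile_len] for s in starts]


if __name__ == "__main__":
    # Toy 100-residue sequence built from the 20 standard amino acids.
    toy_protein = "ACDEFGHIKLMNPQRSTVWY" * 5
    tiles = tile_sequence(toy_protein)
    print(len(tiles), [len(t) for t in tiles])
```

With these defaults, consecutive regular tiles share exactly 25 residues; only the assumed final tile can overlap its predecessor by more.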


Subject(s)
Caspase 3/metabolism , Proteome , Caspase 3/genetics , Gene Expression Regulation, Enzymologic , Humans , Peptide Library , Substrate Specificity