Results 1-20 of 140
1.
Cell; 186(8): 1772-1791, 2023 Apr 13.
Article in English | MEDLINE | ID: mdl-36905928

ABSTRACT

Machine learning (ML) is increasingly used in clinical oncology to diagnose cancers, predict patient outcomes, and inform treatment planning. Here, we review recent applications of ML across the clinical oncology workflow. We review how these techniques are applied to medical imaging and to molecular data obtained from liquid and solid tumor biopsies for cancer diagnosis, prognosis, and treatment design. We discuss key considerations in developing ML for the distinct challenges posed by imaging and molecular data. Finally, we examine ML models approved for cancer-related patient usage by regulatory agencies and discuss approaches to improve the clinical usefulness of ML.


Subject(s)
Machine Learning; Neoplasms; Humans; Neoplasms/diagnosis; Neoplasms/genetics; Neoplasms/therapy; Diagnostic Imaging; Medical Oncology
2.
Physiol Rev; 103(4): 2423-2450, 2023 Oct 01.
Article in English | MEDLINE | ID: mdl-37104717

ABSTRACT

Artificial intelligence in health care has experienced remarkable innovation and progress in the last decade. Significant advancements can be attributed to the utilization of artificial intelligence to transform physiology data to advance health care. In this review, we explore how past work has shaped the field and defined future challenges and directions. In particular, we focus on three areas of development. First, we give an overview of artificial intelligence, with special attention to the most relevant artificial intelligence models. We then detail how physiology data have been harnessed by artificial intelligence to advance the main areas of health care: automating existing health care tasks, increasing access to care, and augmenting health care capabilities. Finally, we discuss emerging concerns surrounding the use of individual physiology data and detail an increasingly important consideration for the field, namely the challenges of deploying artificial intelligence models to achieve meaningful clinical impact.


Subject(s)
Artificial Intelligence; Delivery of Health Care; Humans
3.
Nature; 616(7957): 520-524, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37020027

ABSTRACT

Artificial intelligence (AI) has been developed for echocardiography[1-3], although it has not yet been tested with blinding and randomization. Here we designed a blinded, randomized non-inferiority clinical trial (ClinicalTrials.gov ID: NCT05140642; no outside funding) of AI versus sonographer initial assessment of left ventricular ejection fraction (LVEF) to evaluate the impact of AI in the interpretation workflow. The primary end point was the change in the LVEF between initial AI or sonographer assessment and final cardiologist assessment, evaluated by the proportion of studies with substantial change (more than 5% change). From 3,769 echocardiographic studies screened, 274 studies were excluded owing to poor image quality. The proportion of studies substantially changed was 16.8% in the AI group and 27.2% in the sonographer group (difference of -10.4%, 95% confidence interval: -13.2% to -7.7%, P < 0.001 for non-inferiority, P < 0.001 for superiority). The mean absolute difference between final cardiologist assessment and independent previous cardiologist assessment was 6.29% in the AI group and 7.23% in the sonographer group (difference of -0.96%, 95% confidence interval: -1.34% to -0.54%, P < 0.001 for superiority). The AI-guided workflow saved time for both sonographers and cardiologists, and cardiologists were not able to distinguish between the initial assessments by AI versus the sonographer (blinding index of 0.088). For patients undergoing echocardiographic quantification of cardiac function, initial assessment of LVEF by AI was non-inferior to assessment by sonographers.
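
The primary end point compares two proportions. The Python sketch below shows one common way to compute a difference in proportions, its Wald (normal-approximation) 95% confidence interval, and a non-inferiority/superiority check. The group sizes (1,000 each) and the 5-percentage-point margin are hypothetical placeholders, since the abstract reports neither, and the trial's actual statistical analysis may differ.

```python
import math

def diff_in_proportions_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Difference p_a - p_b with a Wald (normal-approximation) confidence interval."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b
    # Unpooled standard error of the difference between two independent proportions
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se

def assess(ci_upper, margin):
    """Lower is better here (fewer substantial changes), so only the upper bound matters."""
    return {
        "non_inferior": ci_upper < margin,  # upper bound below the (hypothetical) margin
        "superior": ci_upper < 0.0,         # upper bound below zero
    }

# Hypothetical counts: 168/1000 (AI) vs 272/1000 (sonographer) substantially changed
diff, lo, hi = diff_in_proportions_ci(168, 1000, 272, 1000)
print(f"difference {diff:.3f}, 95% CI {lo:.3f} to {hi:.3f}")
print(assess(hi, margin=0.05))  # assumed margin of 5 percentage points
```

Because a smaller proportion of substantially changed studies is better, non-inferiority is judged on the upper confidence bound, and superiority requires that bound to fall below zero.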


Subject(s)
Artificial Intelligence; Cardiologists; Echocardiography; Heart Function Tests; Humans; Artificial Intelligence/standards; Echocardiography/methods; Echocardiography/standards; Stroke Volume; Ventricular Function, Left; Single-Blind Method; Workflow; Reproducibility of Results; Heart Function Tests/methods; Heart Function Tests/standards
4.
Cell; 152(3): 642-654, 2013 Jan 31.
Article in English | MEDLINE | ID: mdl-23333102

ABSTRACT

Differences in chromatin organization are key to the multiplicity of cell states that arise from a single genetic background, yet the landscapes of in vivo tissues remain largely uncharted. Here, we mapped chromatin genome-wide in a large and diverse collection of human tissues and stem cells. The maps yield unprecedented annotations of functional genomic elements and their regulation across developmental stages, lineages, and cellular environments. They also reveal global features of the epigenome, related to nuclear architecture, that also vary across cellular phenotypes. Specifically, developmental specification is accompanied by progressive chromatin restriction as the default state transitions from dynamic remodeling to generalized compaction. Exposure to serum in vitro triggers a distinct transition that involves de novo establishment of domains with features of constitutive heterochromatin. We describe how these global chromatin state transitions relate to chromosome and nuclear architecture, and discuss their implications for lineage fidelity, cellular senescence, and reprogramming.


Subject(s)
Chromatin Assembly and Disassembly; Chromatin/metabolism; Epigenesis, Genetic; Gene-Environment Interaction; Genome-Wide Association Study; Cell Nucleus; Cellular Senescence; Embryonic Stem Cells/metabolism; Gene Expression Regulation; Humans; Induced Pluripotent Stem Cells/metabolism; Organ Specificity
5.
Nat Methods; 21(3): 444-454, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38347138

ABSTRACT

Whole-transcriptome spatial profiling of genes at single-cell resolution remains a challenge. To address this limitation, spatial gene expression prediction methods have been developed to infer the spatial expression of unmeasured transcripts, but the quality of these predictions can vary greatly. Here we present Transcript Imputation with Spatial Single-cell Uncertainty Estimation (TISSUE) as a general framework for estimating uncertainty for spatial gene expression predictions and providing uncertainty-aware methods for downstream inference. Leveraging conformal inference, TISSUE provides well-calibrated prediction intervals for predicted expression values across 11 benchmark datasets. Moreover, it consistently reduces the false discovery rate for differential gene expression analysis, improves clustering and visualization of predicted spatial transcriptomics and improves the performance of supervised learning models trained on predicted gene expression profiles. Applying TISSUE to a MERFISH spatial transcriptomics dataset of the adult mouse subventricular zone, we identified subtypes within the neural stem cell lineage and developed subtype-specific regional classifiers.
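
TISSUE's calibrated prediction intervals rest on conformal inference. The sketch below illustrates generic split conformal prediction with absolute-residual scores. It is not TISSUE's own cell-centric procedure, and the synthetic data and function names are illustrative assumptions.

```python
import math
import numpy as np

def conformal_quantile(cal_true, cal_pred, alpha=0.1):
    """Finite-sample-corrected quantile of absolute residuals on a calibration set."""
    scores = np.abs(np.asarray(cal_true) - np.asarray(cal_pred))
    n = scores.size
    # Rank ceil((n + 1) * (1 - alpha)) gives marginal coverage >= 1 - alpha.
    # Capped at n for simplicity (strictly, the interval should then be unbounded).
    k = min(math.ceil((n + 1) * (1 - alpha)), n)
    return np.sort(scores)[k - 1]

def prediction_interval(pred, q):
    """Symmetric interval of half-width q around each new prediction."""
    pred = np.asarray(pred, dtype=float)
    return pred - q, pred + q

rng = np.random.default_rng(0)
truth = rng.normal(size=2000)
pred = truth + rng.normal(scale=0.5, size=2000)  # hypothetical imputed expression values
q = conformal_quantile(truth[:1000], pred[:1000], alpha=0.1)
lo, hi = prediction_interval(pred[1000:], q)
coverage = np.mean((truth[1000:] >= lo) & (truth[1000:] <= hi))
print(f"q = {q:.3f}, empirical coverage = {coverage:.3f}")
```

The empirical coverage on the held-out half should sit near the nominal 1 - alpha, which is what "well-calibrated" intervals means in this setting.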


Subject(s)
Gene Expression Profiling; Neural Stem Cells; Animals; Mice; Uncertainty; Benchmarking; Cluster Analysis; Transcriptome; Single-Cell Analysis
6.
Nature; 592(7855): 629-633, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33828294

ABSTRACT

There is a growing focus on making clinical trials more inclusive but the design of trial eligibility criteria remains challenging[1-3]. Here we systematically evaluate the effect of different eligibility criteria on cancer trial populations and outcomes with real-world data using the computational framework of Trial Pathfinder. We apply Trial Pathfinder to emulate completed trials of advanced non-small-cell lung cancer using data from a nationwide database of electronic health records comprising 61,094 patients with advanced non-small-cell lung cancer. Our analyses reveal that many common criteria, including exclusions based on several laboratory values, had a minimal effect on the trial hazard ratios. When we used a data-driven approach to broaden restrictive criteria, the pool of eligible patients more than doubled on average and the hazard ratio of the overall survival decreased by an average of 0.05. This suggests that many patients who were not eligible under the original trial criteria could potentially benefit from the treatments. We further support our findings through analyses of other types of cancer and patient-safety data from diverse clinical trials. Our data-driven methodology for evaluating eligibility criteria can facilitate the design of more-inclusive trials while maintaining safeguards for patient safety.
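
The core question here, how each eligibility criterion shrinks the eligible population, can be illustrated by encoding criteria as predicates over patient records and recounting the pool when one criterion at a time is relaxed. This is a toy sketch: the patient fields, thresholds and criterion names are hypothetical, and it does not reproduce Trial Pathfinder's trial emulation or hazard-ratio estimation.

```python
from typing import Callable, Dict, List

Patient = Dict[str, float]
Criterion = Callable[[Patient], bool]

# Hypothetical eligibility criteria (names and thresholds are illustrative only)
CRITERIA: Dict[str, Criterion] = {
    "ecog_le_1": lambda p: p["ecog"] <= 1,
    "hemoglobin_ge_9": lambda p: p["hemoglobin"] >= 9.0,
    "creatinine_le_1_5": lambda p: p["creatinine"] <= 1.5,
}

def eligible(patients: List[Patient], criteria: Dict[str, Criterion]) -> List[Patient]:
    """Patients who satisfy every criterion in the given set."""
    return [p for p in patients if all(c(p) for c in criteria.values())]

def pool_size_without_each(patients: List[Patient], criteria: Dict[str, Criterion]) -> Dict[str, int]:
    """Eligible-pool size when each single criterion is relaxed (removed)."""
    sizes = {}
    for name in criteria:
        relaxed = {k: v for k, v in criteria.items() if k != name}
        sizes[name] = len(eligible(patients, relaxed))
    return sizes

# Hypothetical patient records
patients = [
    {"ecog": 0, "hemoglobin": 12.0, "creatinine": 1.0},
    {"ecog": 2, "hemoglobin": 11.0, "creatinine": 0.9},
    {"ecog": 1, "hemoglobin": 8.5, "creatinine": 1.2},
    {"ecog": 1, "hemoglobin": 10.0, "creatinine": 1.8},
]
print(len(eligible(patients, CRITERIA)))
print(pool_size_without_each(patients, CRITERIA))
```

In a full analysis, each relaxed pool would then be fed to a survival model so that changes in pool size can be weighed against changes in the estimated hazard ratio.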


Subject(s)
Artificial Intelligence; Clinical Trials as Topic/methods; Datasets as Topic; Medical Oncology; Patient Safety; Patient Selection; Carcinoma, Non-Small-Cell Lung/drug therapy; Clinical Laboratory Techniques; Electronic Health Records/statistics & numerical data; Humans; Lung Neoplasms/drug therapy; Patient Safety/standards; Patient Selection/ethics; Proportional Hazards Models; Reproducibility of Results
7.
Proc Natl Acad Sci U S A; 121(10): e2313719121, 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38416677

ABSTRACT

Single-cell data integration can provide a comprehensive molecular view of cells, and many algorithms have been developed to remove unwanted technical or biological variations and integrate heterogeneous single-cell datasets. Despite their wide usage, existing methods suffer from several fundamental limitations. In particular, we lack a rigorous statistical test for whether two high-dimensional single-cell datasets are alignable (and therefore should even be aligned). Moreover, popular methods can substantially distort the data during alignment, making the aligned data and downstream analysis difficult to interpret. To overcome these limitations, we present a spectral manifold alignment and inference (SMAI) framework, which enables principled and interpretable alignability testing and structure-preserving integration of single-cell data with the same type of features. SMAI provides a statistical test to robustly assess the alignability between datasets to avoid misleading inference and is justified by high-dimensional statistical theory. On a diverse range of real and simulated benchmark datasets, it outperforms commonly used alignment methods. Moreover, we show that SMAI improves various downstream analyses such as identification of differentially expressed genes and imputation of single-cell spatial transcriptomics, providing further biological insights. SMAI's interpretability also enables quantification and a deeper understanding of the sources of technical confounders in single-cell data.
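
One way to make "structure-preserving" integration concrete: if one dataset is aligned to another only by centring plus a rotation or reflection, every pairwise distance within that dataset is unchanged. The sketch below uses classical orthogonal Procrustes on synthetic data. For simplicity it assumes a known row-by-row correspondence between the two datasets. It illustrates the distance-preserving idea and is not the SMAI algorithm or its alignability test.

```python
import numpy as np

def procrustes_align(source, target):
    """Rotate/reflect centred `source` onto centred `target` (rows assumed paired).

    Returns the aligned source and the orthogonal matrix R minimising
    ||(source - mean) @ R - (target - mean)||_F.
    """
    src = source - source.mean(axis=0)
    tgt = target - target.mean(axis=0)
    u, _, vt = np.linalg.svd(src.T @ tgt)
    r = u @ vt  # orthogonal, so distances within `source` are unchanged
    return src @ r + target.mean(axis=0), r

def pairwise_distances(x):
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(1)
target = rng.normal(size=(50, 5))
q, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # random orthogonal "batch effect"
source = target @ q + 3.0                      # rotated and shifted copy of target
aligned, r = procrustes_align(source, target)
print(np.abs(aligned - target).max())
print(np.allclose(pairwise_distances(aligned), pairwise_distances(source)))
```

Methods that apply nonlinear corrections can match datasets more flexibly, but they give up this guarantee, which is the kind of distortion the abstract warns about.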


Subject(s)
Algorithms; Gene Expression Profiling; Gene Expression; Single-Cell Analysis
8.
Nature; 580(7802): 252-256, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32269341

ABSTRACT

Accurate assessment of cardiac function is crucial for the diagnosis of cardiovascular disease[1], screening for cardiotoxicity[2] and decisions regarding the clinical management of patients with a critical illness[3]. However, human assessment of cardiac function focuses on a limited sampling of cardiac cycles and has considerable inter-observer variability despite years of training[4,5]. Here, to overcome this challenge, we present a video-based deep learning algorithm, EchoNet-Dynamic, that surpasses the performance of human experts in the critical tasks of segmenting the left ventricle, estimating ejection fraction and assessing cardiomyopathy. Trained on echocardiogram videos, our model accurately segments the left ventricle with a Dice similarity coefficient of 0.92, predicts ejection fraction with a mean absolute error of 4.1% and reliably classifies heart failure with reduced ejection fraction (area under the curve of 0.97). In an external dataset from another healthcare system, EchoNet-Dynamic predicts the ejection fraction with a mean absolute error of 6.0% and classifies heart failure with reduced ejection fraction with an area under the curve of 0.96. Prospective evaluation with repeated human measurements confirms that the model has variance that is comparable to or less than that of human experts. By leveraging information across multiple cardiac cycles, our model can rapidly identify subtle changes in ejection fraction, is more reproducible than human evaluation and lays the foundation for precise diagnosis of cardiovascular disease in real time. As a resource to promote further innovation, we also make publicly available a large dataset of 10,030 annotated echocardiogram videos.
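
The two headline metrics are standard. The Dice similarity coefficient compares a predicted and a reference segmentation mask as 2|A ∩ B| / (|A| + |B|), and the mean absolute error averages the absolute gaps between predicted and reference ejection fractions. A minimal NumPy sketch follows; the masks and values are toy examples, not EchoNet-Dynamic outputs.

```python
import numpy as np

def dice(pred_mask, true_mask):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    denom = pred.sum() + true.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, true).sum() / denom

def mean_absolute_error(pred, true):
    """Average absolute difference, e.g. between predicted and reference EF (%)."""
    return float(np.mean(np.abs(np.asarray(pred, dtype=float) - np.asarray(true, dtype=float))))

# Toy 3x3 left-ventricle masks (1 = inside the ventricle)
pred = np.array([[0, 1, 1], [0, 1, 1], [0, 0, 0]])
true = np.array([[0, 1, 1], [0, 1, 0], [0, 1, 0]])
print(dice(pred, true))
print(mean_absolute_error([55.0, 40.0, 62.0], [58.0, 35.0, 60.0]))
```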


Subject(s)
Deep Learning; Heart Diseases/diagnosis; Heart Diseases/physiopathology; Heart/physiology; Heart/physiopathology; Models, Cardiovascular; Video Recording; Atrial Fibrillation; Datasets as Topic; Echocardiography; Heart Failure/physiopathology; Hospitals; Humans; Prospective Studies; Reproducibility of Results; Ventricular Function, Left/physiology
9.
Genome Res; 32(5): 968-985, 2022 May.
Article in English | MEDLINE | ID: mdl-35332099

ABSTRACT

The recent development and application of methods based on the general principle of "crosslinking and proximity ligation" (crosslink-ligation) are revolutionizing RNA structure studies in living cells. However, extracting structure information from such data presents unique challenges. Here, we introduce a set of computational tools for the systematic analysis of data from a wide variety of crosslink-ligation methods, specifically focusing on read mapping, alignment classification, and clustering. We design a new strategy to map short reads with irregular gaps at high sensitivity and specificity. Analysis of previously published data reveals distinct properties and bias caused by the crosslinking reactions. We perform rigorous and exhaustive classification of alignments and discover eight types of arrangements that provide distinct information on RNA structures and interactions. To deconvolve the dense and intertwined gapped alignments, we develop a network/graph-based tool Crosslinked RNA Secondary Structure Analysis using Network Techniques (CRSSANT), which enables clustering of gapped alignments and discovery of new alternative and dynamic conformations. We discover that multiple crosslinking and ligation events can occur on the same RNA, generating multisegment alignments to report complex high-level RNA structures and multi-RNA interactions. We find that alignments with overlapped segments are produced from potential homodimers and develop a new method for their de novo identification. Analysis of overlapping alignments revealed potential new homodimers in cellular noncoding RNAs and RNA virus genomes in the Picornaviridae family. Together, this suite of computational tools enables rapid and efficient analysis of RNA structure and interaction data in living cells.
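
CRSSANT clusters gapped alignments with network techniques. A much simpler stand-in conveys the idea. Treat each two-segment (gapped) alignment as a pair of genomic intervals, connect two alignments when both of their arms overlap, and report connected components as clusters. The overlap rule, the union-find approach and the coordinates below are assumptions for illustration. CRSSANT's actual graph construction and clustering are more elaborate.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]          # half-open [start, end)
Gapped = Tuple[Interval, Interval]  # left arm, right arm of one gapped alignment

def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]

def cluster_gapped(alignments: List[Gapped]) -> List[List[int]]:
    """Connected components of the 'both arms overlap' graph, via union-find."""
    parent = list(range(len(alignments)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(alignments)):
        for j in range(i + 1, len(alignments)):
            (l1, r1), (l2, r2) = alignments[i], alignments[j]
            if overlaps(l1, l2) and overlaps(r1, r2):
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(alignments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical gapped alignments on one transcript
reads = [
    ((100, 120), (300, 320)),
    ((105, 125), (305, 330)),  # same duplex as read 0
    ((110, 130), (500, 520)),  # same left arm, different partner region
    ((500, 515), (700, 720)),
]
print(cluster_gapped(reads))
```

Requiring both arms to overlap separates alternative pairings that share one arm (reads 0 and 2), which is the kind of alternative conformation the clustering step aims to resolve.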


Subject(s)
RNA, Untranslated; RNA; Algorithms; Cluster Analysis; RNA/chemistry; RNA/genetics; RNA, Untranslated/chemistry; Sequence Analysis, RNA/methods; Software
10.
Brief Bioinform; 24(4), 2023 Jul 20.
Article in English | MEDLINE | ID: mdl-37280185

ABSTRACT

The three-dimensional structure of RNA molecules plays a critical role in a wide range of cellular processes encompassing functions from riboswitches to epigenetic regulation. These RNA structures are incredibly dynamic and can indeed be described aptly as an ensemble of structures that shifts in distribution depending on different cellular conditions. Thus, the computational prediction of RNA structure poses a unique challenge, even as computational protein folding has seen great advances. In this review, we focus on a variety of machine learning-based methods that have been developed to predict RNA molecules' secondary structure, as well as more complex tertiary structures. We survey commonly used modeling strategies, and how many are inspired by or incorporate thermodynamic principles. We discuss the shortcomings that various design decisions entail and propose future directions that could build off these methods to yield more robust, accurate RNA structure predictions.
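
Many secondary-structure predictors build on dynamic programming over base pairs. As a minimal, non-ML reference point, the classic Nussinov recursion below maximises the number of nested Watson-Crick and G-U wobble pairs. It is a simplification, because thermodynamic models score stacking and loop energies rather than pair counts. The minimum hairpin-loop length of 3 is a conventional assumption.

```python
from functools import lru_cache

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov-style dynamic programming)."""
    n = len(seq)
    if n == 0:
        return 0

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> int:
        if j - i <= min_loop:  # too short to close a hairpin
            return 0
        # Option 1: base j is unpaired
        score = best(i, j - 1)
        # Option 2: base j pairs with some k, leaving at least min_loop bases inside
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best(i, k - 1) if k > i else 0
                score = max(score, left + 1 + best(k + 1, j - 1))
        return score

    return best(0, n - 1)

print(nussinov("GGGAAAUCC"))  # G-C, G-C and G-U pairs closing an AAA loop
```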


Subject(s)
Epigenesis, Genetic; RNA; RNA/metabolism; Machine Learning; Protein Structure, Secondary; Computational Biology/methods
11.
Bioinformatics; 40(Supplement_1): i521-i528, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940132

ABSTRACT

MOTIVATION: Spatially resolved single-cell transcriptomics have provided unprecedented insights into gene expression in situ, particularly in the context of cell interactions or organization of tissues. However, current technologies for profiling spatial gene expression at single-cell resolution are generally limited to the measurement of a small number of genes. To address this limitation, several algorithms have been developed to impute or predict the expression of additional genes that were not present in the measured gene panel. Current algorithms do not leverage the rich spatial and gene relational information in spatial transcriptomics. To improve spatial gene expression predictions, we introduce Spatial Propagation and Reinforcement of Imputed Transcript Expression (SPRITE) as a meta-algorithm that processes predictions obtained from existing methods by propagating information across gene correlation networks and spatial neighborhood graphs. RESULTS: SPRITE improves spatial gene expression predictions across multiple spatial transcriptomics datasets. Furthermore, SPRITE-predicted spatial gene expression leads to improved clustering, visualization, and classification of cells. SPRITE can be used in spatial transcriptomics data analysis to improve inferences based on predicted gene expression. AVAILABILITY AND IMPLEMENTATION: The SPRITE software package is available at https://github.com/sunericd/SPRITE. Code for generating experiments and analyses in the manuscript is available at https://github.com/sunericd/sprite-figures-and-analyses.
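
The central idea of refining predicted expression by propagating information over a graph can be sketched as iterative smoothing over a spatial neighbourhood graph. The update X ← (1 − α)X₀ + α·W·X with a row-normalised adjacency W is a generic label-propagation-style rule assumed here for illustration. It is not the published SPRITE algorithm, which also propagates over gene correlation networks. The value of α, the iteration count and the toy graph are illustrative.

```python
import numpy as np

def row_normalise(adj):
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0  # isolated nodes keep a zero row
    return adj / deg

def propagate(pred, adj, alpha=0.5, n_iter=50):
    """Smooth predictions over a cell-cell graph while anchoring to the originals.

    pred: (cells, genes) predicted expression; adj: (cells, cells) adjacency.
    """
    w = row_normalise(adj)
    x0 = np.asarray(pred, dtype=float)
    x = x0.copy()
    for _ in range(n_iter):
        x = (1 - alpha) * x0 + alpha * (w @ x)
    return x

# Toy spatial graph: a chain of four cells, one gene with an outlying prediction
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
pred = np.array([[1.0], [1.0], [5.0], [1.0]])
print(propagate(pred, adj).ravel())
```

The (1 − α)X₀ term keeps the refined values anchored to the original predictions, so smoothing pulls outliers toward their neighbours without washing out the signal entirely.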


Subject(s)
Algorithms; Gene Expression Profiling; Gene Regulatory Networks; Software; Gene Expression Profiling/methods; Single-Cell Analysis/methods; Humans; Transcriptome
12.
Bioinformatics; 40(7), 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38913862

ABSTRACT

MOTIVATION: The emergence of large chemical repositories and combinatorial chemical spaces, coupled with high-throughput docking and generative AI, has greatly expanded the chemical diversity of small molecules for drug discovery. Selecting compounds for experimental validation requires filtering these molecules based on favourable druglike properties, such as Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET). RESULTS: We developed ADMET-AI, a machine learning platform that provides fast and accurate ADMET predictions both as a website and as a Python package. ADMET-AI has the highest average rank on the TDC ADMET Leaderboard, and it is currently the fastest web-based ADMET predictor, with a 45% reduction in time compared to the next fastest public ADMET web server. ADMET-AI can also be run locally with predictions for one million molecules taking just 3.1 h. AVAILABILITY AND IMPLEMENTATION: The ADMET-AI platform is freely available both as a web server at admet.ai.greenstonebio.com and as an open-source Python package for local batch prediction at github.com/swansonk14/admet_ai (also archived on Zenodo at doi.org/10.5281/zenodo.10372930). All data and models are archived on Zenodo at doi.org/10.5281/zenodo.10372418.
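
Filtering on favourable druglike properties can be as simple as thresholding computed or predicted descriptors. The sketch below applies Lipinski's rule of five to hypothetical precomputed values. The rule sets these limits: molecular weight ≤ 500 Da, logP ≤ 5, ≤ 5 hydrogen-bond donors and ≤ 10 hydrogen-bond acceptors, commonly allowing at most one violation. The sketch does not call ADMET-AI. In a real pipeline, values predicted by such a tool could replace the placeholder records, and the field names used here are assumptions.

```python
from typing import Dict, List

# Lipinski rule-of-five upper bounds
RULE_OF_FIVE = {"mol_weight": 500.0, "logp": 5.0, "h_donors": 5, "h_acceptors": 10}

def violations(props: Dict[str, float]) -> int:
    """Number of rule-of-five thresholds exceeded by one molecule."""
    return sum(props[key] > limit for key, limit in RULE_OF_FIVE.items())

def druglike(molecules: Dict[str, Dict[str, float]], max_violations: int = 1) -> List[str]:
    """Names of molecules with at most `max_violations` rule-of-five violations."""
    return [name for name, props in molecules.items() if violations(props) <= max_violations]

# Hypothetical descriptor values (not real predictions)
candidates = {
    "cmpd_A": {"mol_weight": 320.4, "logp": 2.1, "h_donors": 2, "h_acceptors": 5},
    "cmpd_B": {"mol_weight": 612.7, "logp": 6.3, "h_donors": 4, "h_acceptors": 9},
    "cmpd_C": {"mol_weight": 515.0, "logp": 4.2, "h_donors": 1, "h_acceptors": 7},
}
print(druglike(candidates))
```

A full ADMET screen would add further predicted endpoints (for example toxicity or solubility) as extra thresholds in the same dictionary-driven way.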


Subject(s)
Drug Discovery; Machine Learning; Software; Drug Discovery/methods; Small Molecule Libraries/chemistry
13.
Nature; 575(7781): 137-146, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31695204

ABSTRACT

The goal of sex and gender analysis is to promote rigorous, reproducible and responsible science. Incorporating sex and gender analysis into experimental design has enabled advancements across many disciplines, such as improved treatment of heart disease and insights into the societal impact of algorithmic bias. Here we discuss the potential for sex and gender analysis to foster scientific discovery, improve experimental efficiency and enable social equality. We provide a roadmap for sex and gender analysis across scientific disciplines and call on researchers, funding agencies, peer-reviewed journals and universities to coordinate efforts to implement robust methods of sex and gender analysis.


Subject(s)
Engineering/methods; Engineering/standards; Research Design/standards; Research Design/trends; Science/methods; Science/standards; Sex Characteristics; Sex Factors; Animals; Artificial Intelligence; Female; Humans; Male; Molecular Targeted Therapy; Reproducibility of Results; Sample Size
14.
Ann Intern Med; 177(2): 210-220, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38285984

ABSTRACT

Large language models (LLMs) are artificial intelligence models trained on vast text data to generate humanlike outputs. They have been applied to various tasks in health care, ranging from answering medical examination questions to generating clinical reports. With increasing institutional partnerships between companies producing LLMs and health systems, the real-world clinical application of these models is nearing realization. As these models gain traction, health care practitioners must understand what LLMs are, their development, their current and potential applications, and the associated pitfalls in a medical setting. This review, coupled with a tutorial, provides a comprehensive yet accessible overview of these areas with the aim of familiarizing health care professionals with the rapidly changing landscape of LLMs in medicine. Furthermore, the authors highlight active research areas in the field that promise to improve LLMs' usability in health care contexts.


Subject(s)
Artificial Intelligence; Medicine; Humans; Health Personnel; Language
15.
Proc Natl Acad Sci U S A; 118(15), 2021 Apr 13.
Article in English | MEDLINE | ID: mdl-33827925

ABSTRACT

Simultaneous profiling of multiomic modalities within a single cell is a grand challenge for single-cell biology. While there have been impressive technical innovations demonstrating feasibility (for example, generating paired measurements of single-cell transcriptome [single-cell RNA sequencing, scRNA-seq] and chromatin accessibility [single-cell assay for transposase-accessible chromatin using sequencing, scATAC-seq]), widespread application of joint profiling is challenging due to its experimental complexity, noise, and cost. Here, we introduce BABEL, a deep learning method that translates between the transcriptome and chromatin profiles of a single cell. Leveraging an interoperable neural network model, BABEL can predict single-cell expression directly from a cell's scATAC-seq and vice versa after training on relevant data. This makes it possible to computationally synthesize paired multiomic measurements when only one modality is experimentally available. Across several paired single-cell ATAC and gene expression datasets in human and mouse, we validate that BABEL accurately translates between these modalities for individual cells. BABEL also generalizes well to cell types within new biological contexts not seen during training. Starting from scATAC-seq of patient-derived basal cell carcinoma (BCC), BABEL generated single-cell expression that enabled fine-grained classification of complex cell states, despite having never seen BCC data. These predictions are comparable to analyses of experimental BCC scRNA-seq data for diverse cell types related to BABEL's training data. We further show that BABEL can incorporate additional single-cell data modalities, such as protein epitope profiling, thus enabling translation across chromatin, RNA, and protein. BABEL offers a powerful approach for data exploration and hypothesis generation.
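
BABEL learns a mapping between the chromatin-accessibility and expression profiles of the same cells. As a deliberately simple stand-in for its deep neural network, the sketch below fits a closed-form ridge regression from ATAC-like features to RNA-like features on paired training cells, then "translates" held-out cells. The synthetic data, the linear model and the regularisation strength are all assumptions. BABEL itself is nonlinear and also translates in the reverse direction.

```python
import numpy as np

def fit_ridge(x, y, lam=1.0):
    """Closed-form ridge regression W = (XᵀX + λI)⁻¹ XᵀY on centred features."""
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - x_mean, y - y_mean
    w = np.linalg.solve(xc.T @ xc + lam * np.eye(x.shape[1]), xc.T @ yc)
    return w, x_mean, y_mean

def translate(x_new, w, x_mean, y_mean):
    """Predict the second modality for new cells from the first."""
    return (x_new - x_mean) @ w + y_mean

rng = np.random.default_rng(2)
n_cells, n_peaks, n_genes = 500, 40, 10
atac = rng.normal(size=(n_cells, n_peaks))                          # synthetic accessibility
true_map = rng.normal(size=(n_peaks, n_genes))
rna = atac @ true_map + 0.1 * rng.normal(size=(n_cells, n_genes))  # synthetic paired expression

w, xm, ym = fit_ridge(atac[:400], rna[:400], lam=1.0)  # train on paired cells
pred = translate(atac[400:], w, xm, ym)                 # translate held-out cells
corr = np.corrcoef(pred.ravel(), rna[400:].ravel())[0, 1]
print(f"held-out correlation: {corr:.3f}")
```

The train-on-paired, predict-on-unpaired pattern is the part that carries over to BABEL. Real ATAC-to-RNA relationships are sparse and nonlinear, which is why a neural model is used there.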


Subject(s)
Carcinoma/genetics; Genomics/methods; Single-Cell Analysis/methods; Software; Animals; Carcinoma/metabolism; Deep Learning; Humans; Mice; Proteome/genetics; Proteome/metabolism; Transcriptome
16.
J Emerg Med; 66(2): 184-191, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38369413

ABSTRACT

BACKGROUND: The adoption of point-of-care ultrasound (POCUS) has greatly improved the ability to rapidly evaluate unstable emergency department (ED) patients at the bedside. One major use of POCUS is to obtain echocardiograms to assess cardiac function. OBJECTIVES: We developed EchoNet-POCUS, a novel deep learning system, to aid emergency physicians (EPs) in interpreting POCUS echocardiograms and to reduce operator-to-operator variability. METHODS: We collected a new dataset of POCUS echocardiogram videos obtained in the ED by EPs and annotated the cardiac function and quality of each video. Using this dataset, we trained EchoNet-POCUS to evaluate both cardiac function and video quality in POCUS echocardiograms. RESULTS: EchoNet-POCUS achieves an area under the receiver operating characteristic curve (AUROC) of 0.92 (0.89-0.94) for predicting whether cardiac function is abnormal and an AUROC of 0.81 (0.78-0.85) for predicting video quality. CONCLUSIONS: EchoNet-POCUS can be applied to bedside echocardiogram videos in real time using commodity hardware, as we demonstrate in a prospective pilot study.
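
An AUROC can be read as the probability that a randomly chosen positive case (here, abnormal cardiac function) receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal pairwise sketch on toy scores (not EchoNet-POCUS outputs); for large datasets a rank-based or library implementation would be preferable, since this version is quadratic in the number of cases.

```python
from typing import Sequence

def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the Mann-Whitney probability P(score_pos > score_neg), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]           # 1 = abnormal cardiac function (toy labels)
scores = [0.9, 0.8, 0.7, 0.85]  # toy model scores
print(auroc(labels, scores))
```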


Subject(s)
Echocardiography; Point-of-Care Systems; Humans; Prospective Studies; Pilot Projects; Ultrasonography; Emergency Service, Hospital
17.
Am J Hum Genet; 107(1): 72-82, 2020 Jul 02.
Article in English | MEDLINE | ID: mdl-32504544

ABSTRACT

Genetics researchers and clinical professionals rely on diversity measures such as race, ethnicity, and ancestry (REA) to stratify study participants and patients for a variety of applications in research and precision medicine. However, there are no comprehensive, widely accepted standards or guidelines for collecting and using such data in clinical genetics practice. Two NIH-funded research consortia, the Clinical Genome Resource (ClinGen) and Clinical Sequencing Evidence-generating Research (CSER), have partnered to address this issue and report how REA are currently collected, conceptualized, and used. Surveying clinical genetics professionals and researchers (n = 448), we found heterogeneity in the way REA are perceived, defined, and measured, with variation in the perceived importance of REA in both clinical and research settings. The majority of respondents (>55%) felt that REA are at least somewhat important for clinical variant interpretation, ordering genetic tests, and communicating results to patients. However, there was no consensus on the relevance of REA, including how each of these measures should be used in different scenarios and what information they can convey in the context of human genetics. A lack of common definitions and applications of REA across the precision medicine pipeline may contribute to inconsistencies in data collection, missing or inaccurate classifications, and misleading or inconclusive results. Thus, our findings support the need for standardization and harmonization of REA data collection and use in clinical genetics and precision health research.


Subject(s)
Data Collection/standards; Genetic Testing/standards; Adult; Child; Ethnicity; Female; Genetic Variation/genetics; Genomics/standards; Humans; Male; Precision Medicine/standards; Prohibitins; Surveys and Questionnaires
18.
Proc Natl Acad Sci U S A; 117(41): 25464-25475, 2020 Oct 13.
Article in English | MEDLINE | ID: mdl-32973096

ABSTRACT

Proteolysis is a major posttranslational regulator of biology inside and outside of cells. Broad identification of optimal cleavage sites and natural substrates of proteases is critical for drug discovery and to understand protease biology. Here, we present a method that employs two genetically encoded substrate phage display libraries coupled with next generation sequencing (SPD-NGS) that allows up to 10,000-fold deeper sequence coverage of the typical six- to eight-residue protease cleavage sites compared to state-of-the-art synthetic peptide libraries or proteomics. We applied SPD-NGS to two classes of proteases: the intracellular caspases and the ectodomains of the sheddases ADAMs 10 and 17. The first library (Lib 10AA) allowed us to identify 10^4 to 10^5 unique cleavage sites over a 1,000-fold dynamic range of NGS counts and produced consensus and optimal cleavage motifs based on position-specific scoring matrices. A second SPD-NGS library (Lib hP), which displayed virtually the entire human proteome tiled in contiguous 49 amino acid sequences with 25 amino acid overlaps, enabled us to identify candidate human proteome sequences. We identified up to 10^4 natural linear cut sites, depending on the protease, and captured most of the examples previously identified by proteomics and predicted 10- to 100-fold more. Structural bioinformatics was used to facilitate the identification of candidate natural protein substrates. SPD-NGS is rapid, reproducible, simple to perform and analyze, inexpensive, and renewable, with unprecedented depth of coverage for substrate sequences, and is an important tool for protease biologists interested in protease specificity for specific assays and inhibitors and to facilitate identification of natural protein substrates.
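
A position-specific scoring matrix (PSSM) summarises aligned cleavage-site sequences as per-position log-odds of each residue relative to a background frequency. The sketch below builds one from a handful of hypothetical caspase-like 5-mers using a uniform background over the 20 amino acids and a pseudocount of 1. Both choices are assumptions, and the published SPD-NGS analysis may weight sites by NGS counts or use a different background.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(sites: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-position log2-odds scores against a uniform amino-acid background."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to equal length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(sites) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sites)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def score(pssm: List[Dict[str, float]], peptide: str) -> float:
    """Sum of per-position scores for a peptide of the same length."""
    return sum(column[aa] for column, aa in zip(pssm, peptide))

# Hypothetical aligned cleavage sites (cut between positions 4 and 5)
sites = ["DEVDG", "DEVDA", "DQTDG", "IETDG", "DEVDS"]
pssm = build_pssm(sites)
print(round(score(pssm, "DEVDG"), 2), round(score(pssm, "AAAAA"), 2))
```

Positive scores mark residues enriched at a position relative to background. Summing them gives a simple way to rank candidate cleavage sites in other sequences.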


Subject(s)
Caspase 3/metabolism; Proteome; Caspase 3/genetics; Gene Expression Regulation, Enzymologic; Humans; Peptide Library; Substrate Specificity
19.
RNA; 26(7): 851-865, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32220894

ABSTRACT

Subcellular localization is essential to RNA biogenesis, processing, and function across the gene expression life cycle. However, the specific nucleotide sequence motifs that direct RNA localization are incompletely understood. Fortunately, new sequencing technologies have provided transcriptome-wide atlases of RNA localization, creating an opportunity to leverage computational modeling. Here we present RNA-GPS, a new machine learning model that uses nucleotide-level features to predict RNA localization across eight different subcellular locations, the first to provide such a wide range of predictions. RNA-GPS's design enables high-throughput sequence ablation and feature importance analyses to probe the sequence motifs that drive localization prediction. We find localization-informative motifs to be concentrated in 3'-UTRs and scattered along the coding sequence, and motifs related to splicing to be important drivers of predicted localization, even for cytotopic distinctions for membraneless bodies within the nucleus or for organelles within the cytoplasm. Overall, our results suggest transcript splicing is one of many elements influencing RNA subcellular localization.
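
A common form of nucleotide-level feature is a vector of k-mer frequencies. Sequence ablation can then be emulated by masking a window and measuring how a model's score changes. In the sketch below, the choice of k = 3, the "N" masking character, the window size and the linear "model" weights are hypothetical assumptions that show only the mechanics. The code is not RNA-GPS's trained model or featurisation.

```python
from itertools import product
from typing import Dict, List

def kmer_vocab(k: int) -> List[str]:
    return ["".join(p) for p in product("ACGU", repeat=k)]

def kmer_frequencies(seq: str, k: int = 3) -> Dict[str, float]:
    """Normalised counts of every ACGU k-mer; k-mers containing other letters are skipped."""
    counts = dict.fromkeys(kmer_vocab(k), 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return {kmer: c / total for kmer, c in counts.items()}

def linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    return sum(weights.get(kmer, 0.0) * value for kmer, value in features.items())

def ablation_effects(seq: str, weights: Dict[str, float], window: int = 4, k: int = 3) -> List[float]:
    """Score change when each window is masked with 'N' (negative = window supported the score)."""
    base = linear_score(kmer_frequencies(seq, k), weights)
    effects = []
    for start in range(len(seq) - window + 1):
        masked = seq[:start] + "N" * window + seq[start + window:]
        effects.append(linear_score(kmer_frequencies(masked, k), weights) - base)
    return effects

weights = {"AUU": 1.0, "UUA": 1.0}  # hypothetical "localisation" k-mers
seq = "GCGCAUUAGCGC"
print([round(e, 3) for e in ablation_effects(seq, weights)])
```

Because frequencies are renormalised after masking, removing uninformative sequence can slightly raise the score. Windows that remove the weighted k-mers produce the largest drops.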


Subject(s)
Alternative Splicing/genetics; RNA/genetics; 3' Untranslated Regions/genetics; Cell Line, Tumor; Cell Nucleus/genetics; Computational Biology/methods; Cytoplasm/genetics; HeLa Cells; Humans; K562 Cells; Sequence Analysis, RNA/methods; Transcriptome/genetics
20.
Nature; 536(7616): 285-291, 2016 Aug 18.
Article in English | MEDLINE | ID: mdl-27535533

ABSTRACT

Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
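
The idea of genes under strong selection against protein-truncating variants (PTVs) can be illustrated with an observed/expected ratio. If a gene carries far fewer PTVs than a neutral mutation model predicts, it is likely intolerant of loss of function. The gene names, counts and the 0.35 cutoff below are hypothetical. ExAC's published constraint metric (pLI) comes from a more elaborate statistical model than this simple ratio.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class GeneCounts:
    gene: str
    observed_ptv: int    # protein-truncating variants seen in the cohort
    expected_ptv: float  # expected count under a neutral mutation-rate model

    @property
    def obs_exp(self) -> float:
        return self.observed_ptv / self.expected_ptv if self.expected_ptv > 0 else float("nan")

def depleted(genes: List[GeneCounts], cutoff: float = 0.35) -> List[str]:
    """Genes whose observed/expected PTV ratio falls below a (hypothetical) cutoff."""
    return [g.gene for g in genes if g.expected_ptv > 0 and g.obs_exp < cutoff]

# Hypothetical counts for three genes
genes = [
    GeneCounts("GENE_A", observed_ptv=2, expected_ptv=40.0),
    GeneCounts("GENE_B", observed_ptv=18, expected_ptv=20.0),
    GeneCounts("GENE_C", observed_ptv=5, expected_ptv=10.0),
]
print(depleted(genes))
```

The ratio is only meaningful when the expected count is large enough. For genes with few expected PTVs, a statistical model that accounts for sampling noise is needed, which is one motivation for metrics such as pLI.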


Subject(s)
Exome/genetics; Genetic Variation/genetics; DNA Mutational Analysis; Datasets as Topic; Humans; Phenotype; Proteome/genetics; Rare Diseases/genetics; Sample Size