Results 1 - 20 of 1,741
1.
Nature ; 630(8016): 493-500, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38718835

ABSTRACT

The introduction of AlphaFold 2 [1] has spurred a revolution in modelling the structure of proteins and their interactions, enabling a huge range of applications in protein modelling and design [2-6]. Here we describe our AlphaFold 3 model with a substantially updated diffusion-based architecture that is capable of predicting the joint structure of complexes including proteins, nucleic acids, small molecules, ions and modified residues. The new AlphaFold model demonstrates substantially improved accuracy over many previous specialized tools: far greater accuracy for protein-ligand interactions compared with state-of-the-art docking tools, much higher accuracy for protein-nucleic acid interactions compared with nucleic-acid-specific predictors and substantially higher antibody-antigen prediction accuracy compared with AlphaFold-Multimer v.2.3 [7,8]. Together, these results show that high-accuracy modelling across biomolecular space is possible within a single unified deep-learning framework.


Subject(s)
Deep Learning , Ligands , Molecular Models , Proteins , Software , Humans , Antibodies/chemistry , Antibodies/metabolism , Antigens/metabolism , Antigens/chemistry , Deep Learning/standards , Ions/chemistry , Ions/metabolism , Molecular Docking Simulation , Nucleic Acids/chemistry , Nucleic Acids/metabolism , Protein Binding , Protein Conformation , Proteins/chemistry , Proteins/metabolism , Reproducibility of Results , Software/standards
2.
Nature ; 587(7833): 246-251, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33177663

ABSTRACT

New genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies [1-3]. For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database [4] increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. In addition to this influx of assemblies from different species, new human de novo assemblies [5] are being produced, which enable the analysis of not only small polymorphisms, but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution but also present challenges in how to adapt current analysis methods to meet the increased scale. Cactus [6], a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequences. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far.


Subject(s)
Genome/genetics , Genomics/methods , Sequence Alignment/methods , Software , Vertebrates/genetics , Amnion , Animals , Computer Simulation , Genomics/standards , Haplotypes , Humans , Quality Control , Sequence Alignment/standards , Software/standards
3.
Nature ; 580(7805): 663-668, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32152607

ABSTRACT

On average, an approved drug currently costs US$2-3 billion and takes more than 10 years to develop [1]. In part, this is due to expensive and time-consuming wet-laboratory experiments, poor initial hit compounds and the high attrition rates in the (pre-)clinical phases. Structure-based virtual screening has the potential to mitigate these problems. With structure-based virtual screening, the quality of the hits improves with the number of compounds screened [2]. However, despite the fact that large databases of compounds exist, the ability to carry out large-scale structure-based virtual screening on computer clusters in an accessible, efficient and flexible manner has remained difficult. Here we describe VirtualFlow, a highly automated and versatile open-source platform with perfect scaling behaviour that is able to prepare and efficiently screen ultra-large libraries of compounds. VirtualFlow is able to use a variety of the most powerful docking programs. Using VirtualFlow, we prepared one of the largest and freely available ready-to-dock ligand libraries, with more than 1.4 billion commercially available molecules. To demonstrate the power of VirtualFlow, we screened more than 1 billion compounds and identified a set of structurally diverse molecules that bind to KEAP1 with submicromolar affinity. One of the lead inhibitors (iKeap1) engages KEAP1 with nanomolar affinity (dissociation constant (Kd) = 114 nM) and disrupts the interaction between KEAP1 and the transcription factor NRF2. This illustrates the potential of VirtualFlow to access vast regions of the chemical space and identify molecules that bind with high affinity to target proteins.
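
To put the reported dissociation constant in thermodynamic terms, the following minimal sketch converts Kd = 114 nM into a standard binding free energy using ΔG° = RT ln(Kd / 1 M). The temperature (298.15 K), the 1 M standard state, and the function name `binding_free_energy_kj` are assumptions made for illustration; the abstract does not state the measurement conditions.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J/(mol*K)


def binding_free_energy_kj(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kJ/mol from a dissociation constant.

    Uses dG = R * T * ln(Kd / 1 M). The 1 M standard state and the default
    temperature are assumptions, not values taken from the abstract.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_GAS * temperature_k * math.log(kd_molar) / 1000.0


# Kd reported in the abstract for iKeap1 binding to KEAP1: 114 nM
dg_ikeap1 = binding_free_energy_kj(114e-9)
print(f"Estimated binding free energy: {dg_ikeap1:.1f} kJ/mol")
```

Under these assumptions the estimate comes out at roughly -39.6 kJ/mol (about -9.5 kcal/mol). A tighter binder (smaller Kd) gives a more negative value.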


Subject(s)
Drug Discovery/methods , Preclinical Drug Evaluation/methods , Molecular Docking Simulation/methods , Software , User-Computer Interface , Access to Information , Automation/methods , Automation/standards , Cloud Computing , Computer Simulation , Chemical Databases , Drug Discovery/standards , Preclinical Drug Evaluation/standards , Kelch-Like ECH-Associated Protein 1/antagonists & inhibitors , Kelch-Like ECH-Associated Protein 1/chemistry , Kelch-Like ECH-Associated Protein 1/metabolism , Ligands , Molecular Docking Simulation/standards , Molecular Targeted Therapy , NF-E2-Related Factor 2/metabolism , Reproducibility of Results , Software/standards , Thermodynamics
4.
Nature ; 588(7836): 83-88, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33049755

ABSTRACT

Training algorithms to computationally plan multistep organic syntheses has been a challenge for more than 50 years [1-7]. However, the field has progressed greatly since the development of early programs such as LHASA [1,7], for which reaction choices at each step were made by human operators. Multiple software platforms [6,8-14] are now capable of completely autonomous planning. But these programs 'think' only one step at a time and have so far been limited to relatively simple targets, the syntheses of which could arguably be designed by human chemists within minutes, without the help of a computer. Furthermore, no algorithm has yet been able to design plausible routes to complex natural products, for which much more far-sighted, multistep planning is necessary [15,16] and closely related literature precedents cannot be relied on. Here we demonstrate that such computational synthesis planning is possible, provided that the program's knowledge of organic chemistry and data-based artificial intelligence routines are augmented with causal relationships [17,18], allowing it to 'strategize' over multiple synthetic steps. Using a Turing-like test administered to synthesis experts, we show that the routes designed by such a program are largely indistinguishable from those designed by humans. We also successfully validated three computer-designed syntheses of natural products in the laboratory. Taken together, these results indicate that expert-level automated synthetic planning is feasible, pending continued improvements to the reaction knowledge base and further code optimization.


Subject(s)
Artificial Intelligence , Biological Products/chemical synthesis , Synthetic Chemistry Techniques/methods , Organic Chemistry/methods , Software , Artificial Intelligence/standards , Automation/methods , Automation/standards , Benzylisoquinolines/chemical synthesis , Benzylisoquinolines/chemistry , Synthetic Chemistry Techniques/standards , Organic Chemistry/standards , Indans/chemical synthesis , Indans/chemistry , Indole Alkaloids/chemical synthesis , Indole Alkaloids/chemistry , Knowledge Bases , Lactones/chemical synthesis , Lactones/chemistry , Macrolides/chemical synthesis , Macrolides/chemistry , Reproducibility of Results , Sesquiterpenes/chemical synthesis , Sesquiterpenes/chemistry , Software/standards , Tetrahydroisoquinolines/chemical synthesis , Tetrahydroisoquinolines/chemistry
5.
Nucleic Acids Res ; 52(6): 2821-2835, 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38348970

ABSTRACT

A key attribute of some long noncoding RNAs (lncRNAs) is their ability to regulate expression of neighbouring genes in cis. However, such 'cis-lncRNAs' are presently defined using ad hoc criteria that, we show, are prone to false-positive predictions. The resulting lack of cis-lncRNA catalogues hinders our understanding of their extent, characteristics and mechanisms. Here, we introduce TransCistor, a framework for defining and identifying cis-lncRNAs based on enrichment of targets amongst proximal genes. TransCistor's simple and conservative statistical models are compatible with functionally defined target gene maps generated by existing and future technologies. Using transcriptome-wide perturbation experiments for 268 human and 134 mouse lncRNAs, we provide the first large-scale survey of cis-lncRNAs. Known cis-lncRNAs are correctly identified, including XIST, LINC00240 and UMLILO, and predictions are consistent across analysis methods, perturbation types and independent experiments. We detect cis-activity in a minority of lncRNAs, primarily involving activators over repressors. Cis-lncRNAs are detected by both RNA interference and antisense oligonucleotide perturbations. Mechanistically, cis-lncRNA transcripts are observed to physically associate with their target genes and are weakly enriched with enhancer elements. In summary, TransCistor establishes a quantitative foundation for cis-lncRNAs, opening a path to elucidating their molecular mechanisms and biological significance.
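
The phrase "enrichment of targets amongst proximal genes" can be illustrated with a one-sided hypergeometric test, a common choice for this kind of over-representation question. The abstract does not say which statistical model TransCistor actually uses, so the sketch below is a generic illustration only; the function name, parameter names, and toy counts are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, targets: int,
                           proximal: int, proximal_targets: int) -> float:
    """One-sided hypergeometric p-value, P(X >= proximal_targets).

    population       -- all genes considered
    targets          -- genes that respond to the lncRNA perturbation
    proximal         -- genes located near the lncRNA locus
    proximal_targets -- proximal genes that are also targets (observed)
    """
    upper = min(targets, proximal)
    total = comb(population, proximal)
    tail = sum(
        comb(targets, k) * comb(population - targets, proximal - k)
        for k in range(proximal_targets, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 10 genes, 3 targets, 4 proximal genes,
# 2 of which are targets.
p_toy = hypergeom_enrichment_p(population=10, targets=3,
                               proximal=4, proximal_targets=2)
print(f"p = {p_toy:.4f}")
```

A small p-value would indicate that targets are over-represented near the locus relative to chance, which is the intuition behind calling a lncRNA cis-acting.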


Subject(s)
Computational Biology , Genetic Techniques , Long Noncoding RNA , Animals , Humans , Mice , Long Noncoding RNA/genetics , Long Noncoding RNA/isolation & purification , Transcription Factors/genetics , Transcriptome , Software/standards , Computational Biology/methods
6.
Nucleic Acids Res ; 52(6): 2836-2847, 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38412249

ABSTRACT

The field of synthetic nucleic acids with novel backbone structures [xenobiotic nucleic acids (XNAs)] has flourished due to the increased importance of XNA antisense oligonucleotides and aptamers in medicine, as well as the development of XNA processing enzymes and new XNA genetic materials. Molecular modeling of XNA structures can accelerate rational design in the field of XNAs as it contributes to understanding and predicting how changes in the sugar-phosphate backbone affect the complementation properties of the nucleic acids. To support the development of novel XNA polymers, we present a first-in-class open-source program (Ducque) to build duplexes of nucleic acid analogs with customizable chemistry. A detailed procedure is described to extend the Ducque library with new user-defined XNA fragments using quantum mechanics (QM) and to generate QM-based force field parameters for molecular dynamics simulations within standard packages such as AMBER. The tool was used within a molecular modeling workflow to accurately reproduce a selection of experimental structures for nucleic acid duplexes with ribose-based as well as non-ribose-based nucleosides. Additionally, it was challenged to build duplexes of morpholino nucleic acids bound to complementary RNA sequences.


Subject(s)
Molecular Dynamics Simulation , Morpholinos , Nucleic Acids , RNA , Software , Morpholinos/chemistry , Nucleic Acid Conformation , Nucleic Acids/chemistry , Oligonucleotides/chemistry , RNA/chemistry , Software/standards
7.
Nucleic Acids Res ; 52(6): e31, 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38364867

ABSTRACT

Proteins are crucial in regulating every aspect of RNA life, yet understanding their interactions with coding and noncoding RNAs remains limited. Experimental studies are typically restricted to a small number of cell lines and a limited set of RNA-binding proteins (RBPs). Although computational methods based on physico-chemical principles can predict protein-RNA interactions accurately, they often lack the ability to consider cell-type-specific gene expression and the broader context of gene regulatory networks (GRNs). Here, we assess the performance of several GRN inference algorithms in predicting protein-RNA interactions from single-cell transcriptomic data, and propose a pipeline, called scRAPID (single-cell transcriptomic-based RnA Protein Interaction Detection), that integrates these methods with the catRAPID algorithm, which can identify direct physical interactions between RBPs and RNA molecules. Our approach demonstrates that RBP-RNA interactions can be predicted from single-cell transcriptomic data, with performances comparable or superior to those achieved for the well-established task of inferring transcription factor-target interactions. The incorporation of catRAPID significantly enhances the accuracy of identifying interactions, particularly with long noncoding RNAs, and enables the identification of hub RBPs and RNAs. Additionally, we show that interactions between RBPs can be detected based on their inferred RNA targets. The software is freely available at https://github.com/tartaglialabIIT/scRAPID.


Subject(s)
RNA-Binding Proteins , RNA , Single-Cell Gene Expression Analysis , Software , Algorithms , RNA/genetics , RNA/metabolism , RNA-Binding Proteins/metabolism , Software/standards , Gene Regulatory Networks , Humans , Cell Line
8.
Plant Physiol ; 195(1): 378-394, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38298139

ABSTRACT

Automated guard cell detection and measurement are vital for understanding plant physiological performance and ecological functioning in global water and carbon cycles. Most current methods for measuring guard cells and stomata are laborious, time-consuming, prone to bias, and limited in scale. We developed StoManager1, a high-throughput tool utilizing geometrical, mathematical algorithms, and convolutional neural networks to automatically detect, count, and measure over 30 guard cell and stomatal metrics, including guard cell and stomatal area, length, width, stomatal aperture area/guard cell area, orientation, stomatal evenness, divergence, and aggregation index. Combined with leaf functional traits, some of these StoManager1-measured guard cell and stomatal metrics explained 90% and 82% of tree biomass and intrinsic water use efficiency (iWUE) variances in hardwoods, making them substantial factors in leaf physiology and tree growth. StoManager1 demonstrated exceptional precision and recall (mAP@0.5 over 0.96), effectively capturing diverse stomatal properties across over 100 species. StoManager1 facilitates the automation of measuring leaf stomatal and guard cells, enabling broader exploration of stomatal control in plant growth and adaptation to environmental stress and climate change. This has implications for global gross primary productivity (GPP) modeling and estimation, as integrating stomatal metrics can enhance predictions of plant growth and resource usage worldwide. Easily accessible open-source code and standalone Windows executable applications are available on a GitHub repository (https://github.com/JiaxinWang123/StoManager1) and Zenodo (https://doi.org/10.5281/zenodo.7686022).
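
The metric "mAP@0.5" counts a predicted box as a correct detection when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. The sketch below shows that matching criterion for axis-aligned boxes. It is a generic illustration, not code from StoManager1; the `(x_min, y_min, x_max, y_max)` box layout and the function names are assumptions.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(predicted: Box, truth: Box, threshold: float = 0.5) -> bool:
    """True when a prediction counts as a hit at the given IoU threshold."""
    return iou(predicted, truth) >= threshold


# A prediction shifted by half a stoma width overlaps too little to count.
print(is_match((0, 0, 2, 2), (1, 1, 3, 3)))      # IoU = 1/7
print(is_match((0, 0, 2, 2), (0, 0.2, 2, 2.2)))  # IoU = 3.6/4.4
```

Mean average precision then averages, over object classes, the area under the precision-recall curve built from such matches.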


Subject(s)
Botany , Cell Biology , Plant Cells , Plant Stomata , Software , Plant Stomata/cytology , Plant Stomata/growth & development , Plant Cells/physiology , Botany/instrumentation , Botany/methods , Cell Biology/instrumentation , Computer-Assisted Image Processing/standards , Algorithms , Plant Leaves/cytology , Computer Neural Networks , High-Throughput Screening Assays/instrumentation , High-Throughput Screening Assays/methods , High-Throughput Screening Assays/standards , Software/standards
13.
Brief Bioinform ; 22(1): 109-126, 2021 Jan 18.
Article in English | MEDLINE | ID: mdl-31813964

ABSTRACT

MOTIVATION: Biological systems function through dynamic interactions among genes and their products, regulatory circuits and metabolic networks. Our development of the Pathway Tools software was motivated by the need to construct biological knowledge resources that combine these many types of data, and that enable users to find and comprehend data of interest as quickly as possible through query and visualization tools. Further, we sought to support the development of metabolic flux models from pathway databases, and to use pathway information to leverage the interpretation of high-throughput data sets. RESULTS: In the past 4 years we have enhanced the already extensive Pathway Tools software in several respects. It can now support metabolic-model execution through the Web; it provides a more accurate gap filler for metabolic models; it supports development of models for organism communities distributed across a spatial grid; and model results may be visualized graphically. Pathway Tools supports several new omics-data analysis tools including the Omics Dashboard, multi-pathway diagrams called pathway collages, a pathway-covering algorithm for metabolomics data analysis and an algorithm for generating mechanistic explanations of multi-omics data. We have also improved the core pathway/genome databases management capabilities of the software, providing new multi-organism search tools for organism communities, improved graphics rendering, faster performance and re-designed gene and metabolite pages. AVAILABILITY: The software is free for academic use; a fee is required for commercial use. See http://pathwaytools.com. CONTACT: pkarp@ai.sri.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Briefings in Bioinformatics online.


Subject(s)
Genomics/methods , Metabolomics/methods , Software/standards , Systems Biology/methods , Animals , Humans
14.
Brief Bioinform ; 22(1): 146-163, 2021 Jan 18.
Article in English | MEDLINE | ID: mdl-31838514

ABSTRACT

MOTIVATION: Annotation tools are applied to build training and test corpora, which are essential for the development and evaluation of new natural language processing algorithms. Further, annotation tools are also used to extract new information for a particular use case. However, owing to the high number of existing annotation tools, finding the one that best fits particular needs is a demanding task that requires searching the scientific literature followed by installing and trying various tools. METHODS: We searched for annotation tools and selected a subset of them according to five requirements with which they should comply, such as being Web-based or supporting the definition of a schema. We installed the selected tools (when necessary), carried out hands-on experiments and evaluated them using 26 criteria that covered functional and technical aspects. We defined each criterion on three levels of matches and a score for the final evaluation of the tools. RESULTS: We evaluated 78 tools and selected the following 15 for a detailed evaluation: BioQRator, brat, Catma, Djangology, ezTag, FLAT, LightTag, MAT, MyMiner, PDFAnno, prodigy, tagtog, TextAE, WAT-SL and WebAnno. Full compliance with our 26 criteria ranged from only 9 up to 20 criteria, which demonstrated that some tools are comprehensive and mature enough to be used on most annotation projects. The highest score of 0.81 was obtained by WebAnno (of a maximum value of 1.0).


Subject(s)
Computational Biology/standards , Data Curation/standards , Computational Biology/methods , Data Curation/methods , Software/standards
15.
Brief Bioinform ; 22(1): 557-567, 2021 Jan 18.
Article in English | MEDLINE | ID: mdl-32031567

ABSTRACT

Microbiome samples are accumulating at an unprecedented speed. As a result, a massive number of samples has become available for mining the intrinsic patterns among them. However, due to the lack of advanced computational tools, fast yet accurate comparisons and searches among thousands to millions of samples are still urgently needed. In this work, we propose the Meta-Prism method for comparing and searching microbial community structures amongst tens of thousands of samples. Meta-Prism is at least 10 times faster than contemporary methods serving the same purpose and can provide very accurate search results. The method is based on three computational techniques: a dual-indexing approach for sample subgrouping, a refined scoring function that can scrutinize minute differences among samples, and parallel computation on CPU or GPU. The superiority of Meta-Prism in speed and accuracy for multiple sample searches is demonstrated by searching against ten thousand samples derived from both human and environmental sources. Therefore, Meta-Prism could facilitate similarity search and in-depth understanding among massive numbers of heterogeneous samples in the microbiome universe. The code of Meta-Prism is available at: https://github.com/HUST-NingKang-Lab/metaPrism.


Subject(s)
Metagenomics/methods , Microbiota , Humans , Metagenomics/standards , 16S Ribosomal RNA/genetics , Sensitivity and Specificity , Software/standards
16.
Brief Bioinform ; 22(1): 416-427, 2021 Jan 18.
Article in English | MEDLINE | ID: mdl-31925417

ABSTRACT

Recent advances in single-cell RNA sequencing (scRNA-seq) enable characterization of transcriptomic profiles with single-cell resolution and circumvent averaging artifacts associated with traditional bulk RNA sequencing (RNA-seq) data. Here, we propose SCDC, a deconvolution method for bulk RNA-seq that leverages cell-type specific gene expression profiles from multiple scRNA-seq reference datasets. SCDC adopts an ENSEMBLE method to integrate deconvolution results from different scRNA-seq datasets that are produced in different laboratories and at different times, implicitly addressing the problem of batch-effect confounding. SCDC is benchmarked against existing methods using both in silico generated pseudo-bulk samples and experimentally mixed cell lines, whose known cell-type compositions serve as ground truths. We show that SCDC outperforms existing methods with improved accuracy of cell-type decomposition under both settings. To illustrate how the ENSEMBLE framework performs in complex tissues under different scenarios, we further apply our method to a human pancreatic islet dataset and a mouse mammary gland dataset. SCDC returns results that are more consistent with experimental designs and that reproduce more significant associations between cell-type proportions and measured phenotypes.


Subject(s)
RNA-Seq/methods , Single-Cell Analysis/methods , Software/standards , Animals , Female , Neoplastic Gene Expression Regulation , Humans , Islets of Langerhans/metabolism , MCF-7 Cells , Animal Mammary Glands/metabolism , Mice , RNA-Seq/standards , Reference Standards , Single-Cell Analysis/standards
17.
Eur Radiol ; 33(5): 3501-3509, 2023 May.
Article in English | MEDLINE | ID: mdl-36624227

ABSTRACT

OBJECTIVES: To externally validate the performance of a commercial AI software program for interpreting CXRs in a large, consecutive, real-world cohort from primary healthcare centres. METHODS: A total of 3047 CXRs were collected from two primary healthcare centres, characterised by low disease prevalence, between January and December 2018. All CXRs were labelled as normal or abnormal according to CT findings. Four radiology residents read all CXRs twice with and without AI assistance. The performances of the AI and readers with and without AI assistance were measured in terms of area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity. RESULTS: The prevalence of clinically significant lesions was 2.2% (68 of 3047). The AUROC, sensitivity, and specificity of the AI were 0.648 (95% confidence interval [CI] 0.630-0.665), 35.3% (CI, 24.7-47.8), and 94.2% (CI, 93.3-95.0), respectively. AI detected 12 of 41 pneumonia, 3 of 5 tuberculosis, and 9 of 22 tumours. AI-undetected lesions tended to be smaller than true-positive lesions. The readers' AUROCs ranged from 0.534-0.676 without AI and 0.571-0.688 with AI (all p values < 0.05). For all readers, the mean reading time was 2.96-10.27 s longer with AI assistance (all p values < 0.05). CONCLUSIONS: The performance of commercial AI in these high-volume, low-prevalence settings was poorer than expected, although it modestly boosted the performance of less-experienced readers. The technical prowess of AI demonstrated in experimental settings and approved by regulatory bodies may not directly translate to real-world practice, especially where the demand for AI assistance is highest. KEY POINTS: • This study shows the limited applicability of commercial AI software for detecting abnormalities in CXRs in a health screening population. 
• When using AI software in a specific clinical setting that differs from the training setting, it is necessary to adjust the threshold or perform additional training with data that reflect this environment well.
• Prospective test accuracy studies, randomised controlled trials, or cohort studies are needed to examine AI software to be implemented in real clinical practice.
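
The headline figures in this abstract can be re-derived from the counts it reports, which also shows why low prevalence is so punishing for a screening tool. In the sketch below, prevalence and sensitivity come directly from the stated counts; the positive predictive value (PPV) is only an approximation, because the abstract gives specificity as a rounded percentage rather than a false-positive count. Variable names are illustrative.

```python
# Counts stated in the abstract
total_cxrs = 3047
lesion_cases = 68                # clinically significant lesions
detected_by_ai = 12 + 3 + 9      # pneumonia + tuberculosis + tumours found by AI

prevalence = lesion_cases / total_cxrs
sensitivity = detected_by_ai / lesion_cases

# Approximation: back out false positives from the rounded specificity (94.2%).
reported_specificity = 0.942
non_lesion_cases = total_cxrs - lesion_cases
estimated_false_positives = (1 - reported_specificity) * non_lesion_cases
estimated_ppv = detected_by_ai / (detected_by_ai + estimated_false_positives)

print(f"Prevalence:  {prevalence:.1%}")
print(f"Sensitivity: {sensitivity:.1%}")
print(f"Estimated PPV (approximate): {estimated_ppv:.1%}")
```

The first two values reproduce the 2.2% prevalence and 35.3% sensitivity given in the abstract. The estimated PPV of roughly 12% is a derived figure, not one the authors report, and suggests that most AI-flagged radiographs in this cohort would have been false alarms.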


Subject(s)
Artificial Intelligence , Lung Diseases , Thoracic Radiography , Software , Humans , Prevalence , Software/standards , Thoracic Radiography/methods , Thoracic Radiography/standards , Reproducibility of Results , Lung/diagnostic imaging , Lung Diseases/diagnostic imaging , Cohort Studies , Male , Female , Adult , Middle Aged , Aged
18.
Nature ; 602(7895): 172-173, 2022 Feb.
Article in English | MEDLINE | ID: mdl-35102330