1.
Nature ; 630(8016): 493-500, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38718835

ABSTRACT

The introduction of AlphaFold 2 [1] has spurred a revolution in modelling the structure of proteins and their interactions, enabling a huge range of applications in protein modelling and design [2-6]. Here we describe our AlphaFold 3 model with a substantially updated diffusion-based architecture that is capable of predicting the joint structure of complexes including proteins, nucleic acids, small molecules, ions and modified residues. The new AlphaFold model demonstrates substantially improved accuracy over many previous specialized tools: far greater accuracy for protein-ligand interactions compared with state-of-the-art docking tools, much higher accuracy for protein-nucleic acid interactions compared with nucleic-acid-specific predictors and substantially higher antibody-antigen prediction accuracy compared with AlphaFold-Multimer v.2.3 [7,8]. Together, these results show that high-accuracy modelling across biomolecular space is possible within a single unified deep-learning framework.


Subjects
Deep Learning , Ligands , Models, Molecular , Proteins , Software , Humans , Antibodies/chemistry , Antibodies/metabolism , Antigens/metabolism , Antigens/chemistry , Deep Learning/standards , Ions/chemistry , Ions/metabolism , Molecular Docking Simulation , Nucleic Acids/chemistry , Nucleic Acids/metabolism , Protein Binding , Protein Conformation , Proteins/chemistry , Proteins/metabolism , Reproducibility of Results , Software/standards
2.
Nature ; 587(7833): 246-251, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33177663

ABSTRACT

New genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies [1-3]. For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database [4] increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. In addition to this influx of assemblies from different species, new human de novo assemblies [5] are being produced, which enable the analysis of not only small polymorphisms, but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution but also present challenges in how to adapt current analysis methods to meet the increased scale. Cactus [6], a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequences. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far.


Subjects
Genome/genetics , Genomics/methods , Sequence Alignment/methods , Software , Vertebrates/genetics , Amnion , Animals , Computer Simulation , Genomics/standards , Haplotypes , Humans , Quality Control , Sequence Alignment/standards , Software/standards
3.
Nature ; 580(7805): 663-668, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32152607

ABSTRACT

On average, an approved drug currently costs US$2-3 billion and takes more than 10 years to develop [1]. In part, this is due to expensive and time-consuming wet-laboratory experiments, poor initial hit compounds and the high attrition rates in the (pre-)clinical phases. Structure-based virtual screening has the potential to mitigate these problems. With structure-based virtual screening, the quality of the hits improves with the number of compounds screened [2]. However, despite the fact that large databases of compounds exist, the ability to carry out large-scale structure-based virtual screening on computer clusters in an accessible, efficient and flexible manner has remained difficult. Here we describe VirtualFlow, a highly automated and versatile open-source platform with perfect scaling behaviour that is able to prepare and efficiently screen ultra-large libraries of compounds. VirtualFlow is able to use a variety of the most powerful docking programs. Using VirtualFlow, we prepared one of the largest and freely available ready-to-dock ligand libraries, with more than 1.4 billion commercially available molecules. To demonstrate the power of VirtualFlow, we screened more than 1 billion compounds and identified a set of structurally diverse molecules that bind to KEAP1 with submicromolar affinity. One of the lead inhibitors (iKeap1) engages KEAP1 with nanomolar affinity (dissociation constant (Kd) = 114 nM) and disrupts the interaction between KEAP1 and the transcription factor NRF2. This illustrates the potential of VirtualFlow to access vast regions of the chemical space and identify molecules that bind with high affinity to target proteins.
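
To put the reported dissociation constant in energetic terms, the following minimal sketch converts Kd = 114 nM into a standard binding free energy using the textbook relation ΔG° = RT ln(Kd / 1 M). The temperature (298.15 K) and the function name are assumptions made for illustration; the abstract does not state the conditions under which Kd was measured.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)
ASSUMED_T = 298.15  # assumed temperature in kelvin (25 °C); not given in the abstract


def binding_free_energy_kj(kd_molar: float, temperature: float = ASSUMED_T) -> float:
    """Return the standard binding free energy in kJ/mol.

    Uses ΔG° = RT ln(Kd / 1 M); a smaller Kd gives a more negative ΔG°.
    """
    return R * temperature * math.log(kd_molar) / 1000.0


# Kd reported for iKeap1 binding to KEAP1: 114 nM
dg_ikeap1 = binding_free_energy_kj(114e-9)
print(f"{dg_ikeap1:.1f} kJ/mol")  # roughly -39.6 kJ/mol at the assumed temperature
```

Under these assumptions, each tenfold improvement in Kd corresponds to about 5.7 kJ/mol of additional binding free energy.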


Subjects
Drug Discovery/methods , Drug Evaluation, Preclinical/methods , Molecular Docking Simulation/methods , Software , User-Computer Interface , Access to Information , Automation/methods , Automation/standards , Cloud Computing , Computer Simulation , Databases, Chemical , Drug Discovery/standards , Drug Evaluation, Preclinical/standards , Kelch-Like ECH-Associated Protein 1/antagonists & inhibitors , Kelch-Like ECH-Associated Protein 1/chemistry , Kelch-Like ECH-Associated Protein 1/metabolism , Ligands , Molecular Docking Simulation/standards , Molecular Targeted Therapy , NF-E2-Related Factor 2/metabolism , Reproducibility of Results , Software/standards , Thermodynamics
4.
Nature ; 588(7836): 83-88, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33049755

ABSTRACT

Training algorithms to computationally plan multistep organic syntheses has been a challenge for more than 50 years [1-7]. However, the field has progressed greatly since the development of early programs such as LHASA [1,7], for which reaction choices at each step were made by human operators. Multiple software platforms [6,8-14] are now capable of completely autonomous planning. But these programs 'think' only one step at a time and have so far been limited to relatively simple targets, the syntheses of which could arguably be designed by human chemists within minutes, without the help of a computer. Furthermore, no algorithm has yet been able to design plausible routes to complex natural products, for which much more far-sighted, multistep planning is necessary [15,16] and closely related literature precedents cannot be relied on. Here we demonstrate that such computational synthesis planning is possible, provided that the program's knowledge of organic chemistry and data-based artificial intelligence routines are augmented with causal relationships [17,18], allowing it to 'strategize' over multiple synthetic steps. Using a Turing-like test administered to synthesis experts, we show that the routes designed by such a program are largely indistinguishable from those designed by humans. We also successfully validated three computer-designed syntheses of natural products in the laboratory. Taken together, these results indicate that expert-level automated synthetic planning is feasible, pending continued improvements to the reaction knowledge base and further code optimization.


Subjects
Artificial Intelligence , Biological Products/chemical synthesis , Chemistry Techniques, Synthetic/methods , Chemistry, Organic/methods , Software , Artificial Intelligence/standards , Automation/methods , Automation/standards , Benzylisoquinolines/chemical synthesis , Benzylisoquinolines/chemistry , Chemistry Techniques, Synthetic/standards , Chemistry, Organic/standards , Indans/chemical synthesis , Indans/chemistry , Indole Alkaloids/chemical synthesis , Indole Alkaloids/chemistry , Knowledge Bases , Lactones/chemical synthesis , Lactones/chemistry , Macrolides/chemical synthesis , Macrolides/chemistry , Reproducibility of Results , Sesquiterpenes/chemical synthesis , Sesquiterpenes/chemistry , Software/standards , Tetrahydroisoquinolines/chemical synthesis , Tetrahydroisoquinolines/chemistry
5.
Nucleic Acids Res ; 52(6): 2821-2835, 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38348970

ABSTRACT

A key attribute of some long noncoding RNAs (lncRNAs) is their ability to regulate expression of neighbouring genes in cis. However, such 'cis-lncRNAs' are presently defined using ad hoc criteria that, we show, are prone to false-positive predictions. The resulting lack of cis-lncRNA catalogues hinders our understanding of their extent, characteristics and mechanisms. Here, we introduce TransCistor, a framework for defining and identifying cis-lncRNAs based on enrichment of targets amongst proximal genes. TransCistor's simple and conservative statistical models are compatible with functionally defined target gene maps generated by existing and future technologies. Using transcriptome-wide perturbation experiments for 268 human and 134 mouse lncRNAs, we provide the first large-scale survey of cis-lncRNAs. Known cis-lncRNAs are correctly identified, including XIST, LINC00240 and UMLILO, and predictions are consistent across analysis methods, perturbation types and independent experiments. We detect cis-activity in a minority of lncRNAs, primarily involving activators over repressors. Cis-lncRNAs are detected by both RNA interference and antisense oligonucleotide perturbations. Mechanistically, cis-lncRNA transcripts are observed to physically associate with their target genes and are weakly enriched with enhancer elements. In summary, TransCistor establishes a quantitative foundation for cis-lncRNAs, opening a path to elucidating their molecular mechanisms and biological significance.
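
The idea of testing for "enrichment of targets amongst proximal genes" can be illustrated with a one-sided hypergeometric test. This is only a sketch under stated assumptions: the abstract does not specify TransCistor's statistical models, so the hypergeometric test is a stand-in chosen here, and the function name and example counts are hypothetical.

```python
from math import comb


def proximal_enrichment_p(n_genes: int, n_targets: int,
                          n_proximal: int, n_proximal_targets: int) -> float:
    """One-sided hypergeometric p-value (hypothetical helper).

    Probability of seeing at least `n_proximal_targets` targets among the
    `n_proximal` genes near the lncRNA, if the `n_targets` target genes were
    scattered at random among all `n_genes` genes.
    """
    total = comb(n_genes, n_proximal)
    upper = min(n_targets, n_proximal)
    tail = sum(
        comb(n_targets, k) * comb(n_genes - n_targets, n_proximal - k)
        for k in range(n_proximal_targets, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes, 5 perturbation targets, 4 proximal genes,
# of which 3 are targets.
p_value = proximal_enrichment_p(20, 5, 4, 3)
print(p_value)
```

A small p-value would suggest that targets cluster near the lncRNA more than chance predicts, which is the intuition behind calling it cis-acting.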


Subjects
Computational Biology , Genetic Techniques , RNA, Long Noncoding , Animals , Humans , Mice , RNA, Long Noncoding/genetics , RNA, Long Noncoding/isolation & purification , Transcription Factors/genetics , Transcriptome , Software/standards , Computational Biology/methods
6.
Nucleic Acids Res ; 52(6): 2836-2847, 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38412249

ABSTRACT

The field of synthetic nucleic acids with novel backbone structures [xenobiotic nucleic acids (XNAs)] has flourished due to the increased importance of XNA antisense oligonucleotides and aptamers in medicine, as well as the development of XNA processing enzymes and new XNA genetic materials. Molecular modeling on XNA structures can accelerate rational design in the field of XNAs as it contributes in understanding and predicting how changes in the sugar-phosphate backbone impact on the complementation properties of the nucleic acids. To support the development of novel XNA polymers, we present a first-in-class open-source program (Ducque) to build duplexes of nucleic acid analogs with customizable chemistry. A detailed procedure is described to extend the Ducque library with new user-defined XNA fragments using quantum mechanics (QM) and to generate QM-based force field parameters for molecular dynamics simulations within standard packages such as AMBER. The tool was used within a molecular modeling workflow to accurately reproduce a selection of experimental structures for nucleic acid duplexes with ribose-based as well as non-ribose-based nucleosides. Additionally, it was challenged to build duplexes of morpholino nucleic acids bound to complementary RNA sequences.


Subjects
Molecular Dynamics Simulation , Morpholinos , Nucleic Acids , RNA , Software , Morpholinos/chemistry , Nucleic Acid Conformation , Nucleic Acids/chemistry , Oligonucleotides/chemistry , RNA/chemistry , Software/standards
7.
Nucleic Acids Res ; 52(6): e31, 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38364867

ABSTRACT

Proteins are crucial in regulating every aspect of RNA life, yet understanding their interactions with coding and noncoding RNAs remains limited. Experimental studies are typically restricted to a small number of cell lines and a limited set of RNA-binding proteins (RBPs). Although computational methods based on physico-chemical principles can predict protein-RNA interactions accurately, they often lack the ability to consider cell-type-specific gene expression and the broader context of gene regulatory networks (GRNs). Here, we assess the performance of several GRN inference algorithms in predicting protein-RNA interactions from single-cell transcriptomic data, and propose a pipeline, called scRAPID (single-cell transcriptomic-based RnA Protein Interaction Detection), that integrates these methods with the catRAPID algorithm, which can identify direct physical interactions between RBPs and RNA molecules. Our approach demonstrates that RBP-RNA interactions can be predicted from single-cell transcriptomic data, with performances comparable or superior to those achieved for the well-established task of inferring transcription factor-target interactions. The incorporation of catRAPID significantly enhances the accuracy of identifying interactions, particularly with long noncoding RNAs, and enables the identification of hub RBPs and RNAs. Additionally, we show that interactions between RBPs can be detected based on their inferred RNA targets. The software is freely available at https://github.com/tartaglialabIIT/scRAPID.


Subjects
RNA-Binding Proteins , RNA , Single-Cell Gene Expression Analysis , Software , Algorithms , RNA/genetics , RNA/metabolism , RNA-Binding Proteins/metabolism , Software/standards , Gene Regulatory Networks , Humans , Cell Line
8.
Plant Physiol ; 195(1): 378-394, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38298139

ABSTRACT

Automated guard cell detection and measurement are vital for understanding plant physiological performance and ecological functioning in global water and carbon cycles. Most current methods for measuring guard cells and stomata are laborious, time-consuming, prone to bias, and limited in scale. We developed StoManager1, a high-throughput tool utilizing geometrical, mathematical algorithms, and convolutional neural networks to automatically detect, count, and measure over 30 guard cell and stomatal metrics, including guard cell and stomatal area, length, width, stomatal aperture area/guard cell area, orientation, stomatal evenness, divergence, and aggregation index. Combined with leaf functional traits, some of these StoManager1-measured guard cell and stomatal metrics explained 90% and 82% of tree biomass and intrinsic water use efficiency (iWUE) variances in hardwoods, making them substantial factors in leaf physiology and tree growth. StoManager1 demonstrated exceptional precision and recall (mAP@0.5 over 0.96), effectively capturing diverse stomatal properties across over 100 species. StoManager1 facilitates the automation of measuring leaf stomatal and guard cells, enabling broader exploration of stomatal control in plant growth and adaptation to environmental stress and climate change. This has implications for global gross primary productivity (GPP) modeling and estimation, as integrating stomatal metrics can enhance predictions of plant growth and resource usage worldwide. Easily accessible open-source code and standalone Windows executable applications are available on a GitHub repository (https://github.com/JiaxinWang123/StoManager1) and Zenodo (https://doi.org/10.5281/zenodo.7686022).
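
The "mAP@0.5" figure refers to mean average precision computed with an intersection-over-union (IoU) threshold of 0.5: a predicted detection counts as correct only if it overlaps a ground-truth object with IoU of at least 0.5. The sketch below shows that matching criterion for axis-aligned boxes. The box format `(x1, y1, x2, y2)` and the function names are assumptions for illustration, not StoManager1's actual code.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def matches_at_threshold(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """True if the prediction counts as a correct detection at this IoU threshold."""
    return iou(pred, truth) >= threshold


# A prediction shifted by half a box width overlaps with IoU = 1/3, so it
# would not count as a correct detection at the 0.5 threshold.
print(matches_at_threshold((0, 0, 10, 10), (5, 0, 15, 10)))
```

Average precision is then obtained per class from the precision-recall curve of detections ranked by confidence, and mAP is the mean over classes.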


Subjects
Botany , Cell Biology , Plant Cells , Plant Stomata , Software , Plant Stomata/cytology , Plant Stomata/growth & development , Plant Cells/physiology , Botany/instrumentation , Botany/methods , Cell Biology/instrumentation , Image Processing, Computer-Assisted/standards , Algorithms , Plant Leaves/cytology , Neural Networks, Computer , High-Throughput Screening Assays/instrumentation , High-Throughput Screening Assays/methods , High-Throughput Screening Assays/standards , Software/standards
13.
Brief Bioinform ; 22(1): 109-126, 2021 Jan 18.
Article in English | MEDLINE | ID: mdl-31813964

ABSTRACT

MOTIVATION: Biological systems function through dynamic interactions among genes and their products, regulatory circuits and metabolic networks. Our development of the Pathway Tools software was motivated by the need to construct biological knowledge resources that combine these many types of data, and that enable users to find and comprehend data of interest as quickly as possible through query and visualization tools. Further, we sought to support the development of metabolic flux models from pathway databases, and to use pathway information to leverage the interpretation of high-throughput data sets. RESULTS: In the past 4 years we have enhanced the already extensive Pathway Tools software in several respects. It can now support metabolic-model execution through the Web, it provides a more accurate gap filler for metabolic models; it supports development of models for organism communities distributed across a spatial grid; and model results may be visualized graphically. Pathway Tools supports several new omics-data analysis tools including the Omics Dashboard, multi-pathway diagrams called pathway collages, a pathway-covering algorithm for metabolomics data analysis and an algorithm for generating mechanistic explanations of multi-omics data. We have also improved the core pathway/genome databases management capabilities of the software, providing new multi-organism search tools for organism communities, improved graphics rendering, faster performance and re-designed gene and metabolite pages. AVAILABILITY: The software is free for academic use; a fee is required for commercial use. See http://pathwaytools.com. CONTACT: pkarp@ai.sri.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Briefings in Bioinformatics online.


Subjects
Genomics/methods , Metabolomics/methods , Software/standards , Systems Biology/methods , Animals , Humans
14.
Brief Bioinform ; 22(1): 146-163, 2021 Jan 18.
Article in English | MEDLINE | ID: mdl-31838514

ABSTRACT

MOTIVATION: Annotation tools are applied to build training and test corpora, which are essential for the development and evaluation of new natural language processing algorithms. Further, annotation tools are also used to extract new information for a particular use case. However, owing to the high number of existing annotation tools, finding the one that best fits particular needs is a demanding task that requires searching the scientific literature followed by installing and trying various tools. METHODS: We searched for annotation tools and selected a subset of them according to five requirements with which they should comply, such as being Web-based or supporting the definition of a schema. We installed the selected tools (when necessary), carried out hands-on experiments and evaluated them using 26 criteria that covered functional and technical aspects. We defined each criterion on three levels of matches and a score for the final evaluation of the tools. RESULTS: We evaluated 78 tools and selected the following 15 for a detailed evaluation: BioQRator, brat, Catma, Djangology, ezTag, FLAT, LightTag, MAT, MyMiner, PDFAnno, prodigy, tagtog, TextAE, WAT-SL and WebAnno. Full compliance with our 26 criteria ranged from only 9 up to 20 criteria, which demonstrated that some tools are comprehensive and mature enough to be used on most annotation projects. The highest score of 0.81 was obtained by WebAnno (of a maximum value of 1.0).


Subjects
Computational Biology/standards , Data Curation/standards , Computational Biology/methods , Data Curation/methods , Software/standards
15.
Brief Bioinform ; 22(1): 557-567, 2021 Jan 18.
Article in English | MEDLINE | ID: mdl-32031567

ABSTRACT

Microbiome samples are accumulating at an unprecedented speed. As a result, a massive amount of samples have become available for the mining of the intrinsic patterns among them. However, due to the lack of advanced computational tools, fast yet accurate comparisons and searches among thousands to millions of samples are still in urgent need. In this work, we proposed the Meta-Prism method for comparing and searching the microbial community structures amongst tens of thousands of samples. Meta-Prism is at least 10 times faster than contemporary methods serving the same purpose and can provide very accurate search results. The method is based on three computational techniques: dual-indexing approach for sample subgrouping, refined scoring function that could scrutinize the minute differences among samples, and parallel computation on CPU or GPU. The superiority of Meta-Prism on speed and accuracy for multiple sample searches is proven based on searching against ten thousand samples derived from both human and environments. Therefore, Meta-Prism could facilitate similarity search and in-depth understanding among massive number of heterogenous samples in the microbiome universe. The codes of Meta-Prism are available at: https://github.com/HUST-NingKang-Lab/metaPrism.


Subjects
Metagenomics/methods , Microbiota , Humans , Metagenomics/standards , RNA, Ribosomal, 16S/genetics , Sensitivity and Specificity , Software/standards
16.
Brief Bioinform ; 22(1): 416-427, 2021 Jan 18.
Article in English | MEDLINE | ID: mdl-31925417

ABSTRACT

Recent advances in single-cell RNA sequencing (scRNA-seq) enable characterization of transcriptomic profiles with single-cell resolution and circumvent averaging artifacts associated with traditional bulk RNA sequencing (RNA-seq) data. Here, we propose SCDC, a deconvolution method for bulk RNA-seq that leverages cell-type specific gene expression profiles from multiple scRNA-seq reference datasets. SCDC adopts an ENSEMBLE method to integrate deconvolution results from different scRNA-seq datasets that are produced in different laboratories and at different times, implicitly addressing the problem of batch-effect confounding. SCDC is benchmarked against existing methods using both in silico generated pseudo-bulk samples and experimentally mixed cell lines, whose known cell-type compositions serve as ground truths. We show that SCDC outperforms existing methods with improved accuracy of cell-type decomposition under both settings. To illustrate how the ENSEMBLE framework performs in complex tissues under different scenarios, we further apply our method to a human pancreatic islet dataset and a mouse mammary gland dataset. SCDC returns results that are more consistent with experimental designs and that reproduce more significant associations between cell-type proportions and measured phenotypes.


Subjects
RNA-Seq/methods , Single-Cell Analysis/methods , Software/standards , Animals , Female , Gene Expression Regulation, Neoplastic , Humans , Islets of Langerhans/metabolism , MCF-7 Cells , Mammary Glands, Animal/metabolism , Mice , RNA-Seq/standards , Reference Standards , Single-Cell Analysis/standards
17.
Eur Radiol ; 33(5): 3501-3509, 2023 May.
Article in English | MEDLINE | ID: mdl-36624227

ABSTRACT

OBJECTIVES: To externally validate the performance of a commercial AI software program for interpreting CXRs in a large, consecutive, real-world cohort from primary healthcare centres. METHODS: A total of 3047 CXRs were collected from two primary healthcare centres, characterised by low disease prevalence, between January and December 2018. All CXRs were labelled as normal or abnormal according to CT findings. Four radiology residents read all CXRs twice with and without AI assistance. The performances of the AI and readers with and without AI assistance were measured in terms of area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity. RESULTS: The prevalence of clinically significant lesions was 2.2% (68 of 3047). The AUROC, sensitivity, and specificity of the AI were 0.648 (95% confidence interval [CI] 0.630-0.665), 35.3% (CI, 24.7-47.8), and 94.2% (CI, 93.3-95.0), respectively. AI detected 12 of 41 pneumonia, 3 of 5 tuberculosis, and 9 of 22 tumours. AI-undetected lesions tended to be smaller than true-positive lesions. The readers' AUROCs ranged from 0.534-0.676 without AI and 0.571-0.688 with AI (all p values < 0.05). For all readers, the mean reading time was 2.96-10.27 s longer with AI assistance (all p values < 0.05). CONCLUSIONS: The performance of commercial AI in these high-volume, low-prevalence settings was poorer than expected, although it modestly boosted the performance of less-experienced readers. The technical prowess of AI demonstrated in experimental settings and approved by regulatory bodies may not directly translate to real-world practice, especially where the demand for AI assistance is highest. KEY POINTS:
• This study shows the limited applicability of commercial AI software for detecting abnormalities in CXRs in a health screening population.
• When using AI software in a specific clinical setting that differs from the training setting, it is necessary to adjust the threshold or perform additional training with such data that reflects this environment well.
• Prospective test accuracy studies, randomised controlled trials, or cohort studies are needed to examine AI software to be implemented in real clinical practice.
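
The prevalence and sensitivity reported in the RESULTS can be re-derived from the counts given in the abstract, as the short check below shows. Variable names are illustrative. Specificity is not recomputed because the abstract does not give the number of true negatives.

```python
# Counts taken from the abstract
total_cxrs = 3047
lesions = 68  # CXRs with clinically significant lesions

# (detected by AI, total) per lesion type, as reported
detected_by_type = {
    "pneumonia": (12, 41),
    "tuberculosis": (3, 5),
    "tumour": (9, 22),
}

true_positives = sum(hit for hit, _ in detected_by_type.values())
positives = sum(total for _, total in detected_by_type.values())

prevalence = lesions / total_cxrs
sensitivity = true_positives / positives

print(f"prevalence  = {100 * prevalence:.1f}%")   # 2.2%
print(f"sensitivity = {100 * sensitivity:.1f}%")  # 35.3% (24 of 68)
```

The per-type counts sum to the 68 lesions, and 24 detected of 68 reproduces the reported 35.3% sensitivity.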


Subjects
Artificial Intelligence , Lung Diseases , Radiography, Thoracic , Software , Humans , Prevalence , Software/standards , Radiography, Thoracic/methods , Radiography, Thoracic/standards , Reproducibility of Results , Lung/diagnostic imaging , Lung Diseases/diagnostic imaging , Cohort Studies , Male , Female , Adult , Middle Aged , Aged
18.
Nature ; 602(7895): 172-173, 2022 Feb.
Article in English | MEDLINE | ID: mdl-35102330