Pesquisa | Secretaria de Estado da Saúde

1.

Benchmarking multi-omics integration algorithms across single-cell RNA and ATAC data.

Xiao, Chuxi; Chen, Yixin; Meng, Qiuchen; Wei, Lei; Zhang, Xuegong.

Brief Bioinform ; 25(2)2024 Jan 22.

Artigo em Inglês | MEDLINE | ID: mdl-38493343

RESUMO

Recent advancements in single-cell sequencing technologies have generated extensive omics data in various modalities and revolutionized cell research, especially in the single-cell RNA and ATAC data. The joint analysis across scRNA-seq data and scATAC-seq data has paved the way to comprehending the cellular heterogeneity and complex cellular regulatory networks. Multi-omics integration is gaining attention as an important step in joint analysis, and the number of computational tools in this field is growing rapidly. In this paper, we benchmarked 12 multi-omics integration methods on three integration tasks via qualitative visualization and quantitative metrics, considering six main aspects that matter in multi-omics data analysis. Overall, we found that different methods have their own advantages on different aspects, while some methods outperformed other methods in most aspects. We therefore provided guidelines for selecting appropriate methods for specific scenarios and tasks to help obtain meaningful insights from multi-omics data integration.

Assuntos

Benchmarking , Multiômica , Algoritmos , Ciclo Celular , RNA

2.

Performance assessment of computational tools to detect microsatellite instability.

Anthony, Harrison; Seoighe, Cathal.

Brief Bioinform ; 25(5)2024 Jul 25.

Artigo em Inglês | MEDLINE | ID: mdl-39129364

RESUMO

Microsatellite instability (MSI) is a phenomenon seen in several cancer types, which can be used as a biomarker to help guide immune checkpoint inhibitor treatment. To facilitate this, researchers have developed computational tools to categorize samples as having high microsatellite instability, or as being microsatellite stable using next-generation sequencing data. Most of these tools were published with unclear scope and usage, and they have yet to be independently benchmarked. To address these issues, we assessed the performance of eight leading MSI tools across several unique datasets that encompass a wide variety of sequencing methods. While we were able to replicate the original findings of each tool on whole exome sequencing data, most tools had worse receiver operating characteristic and precision-recall area under the curve values on whole genome sequencing data. We also found that they lacked agreement with one another and with commercial MSI software on gene panel data, and that optimal threshold cut-offs vary by sequencing type. Lastly, we tested tools made specifically for RNA sequencing data and found they were outperformed by tools designed for use with DNA sequencing data. Out of all, two tools (MSIsensor2, MANTIS) performed well across nearly all datasets, but when all datasets were combined, their precision decreased. Our results caution that MSI tools can have much lower performance on datasets other than those on which they were originally evaluated, and in the case of RNA sequencing tools, can even perform poorly on the type of data for which they were created.

Assuntos

Biologia Computacional , Instabilidade de Microssatélites , Software , Humanos , Biologia Computacional/métodos , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Neoplasias/genética , Sequenciamento do Exoma/métodos

3.

Assessing computational predictions of antimicrobial resistance phenotypes from microbial genomes.

Hu, Kaixin; Meyer, Fernando; Deng, Zhi-Luo; Asgari, Ehsaneddin; Kuo, Tzu-Hao; Münch, Philipp C; McHardy, Alice C.

Brief Bioinform ; 25(3)2024 Mar 27.

Artigo em Inglês | MEDLINE | ID: mdl-38706320

RESUMO

The advent of rapid whole-genome sequencing has created new opportunities for computational prediction of antimicrobial resistance (AMR) phenotypes from genomic data. Both rule-based and machine learning (ML) approaches have been explored for this task, but systematic benchmarking is still needed. Here, we evaluated four state-of-the-art ML methods (Kover, PhenotypeSeeker, Seq2Geno2Pheno and Aytan-Aktug), an ML baseline and the rule-based ResFinder by training and testing each of them across 78 species-antibiotic datasets, using a rigorous benchmarking workflow that integrates three evaluation approaches, each paired with three distinct sample splitting methods. Our analysis revealed considerable variation in the performance across techniques and datasets. Whereas ML methods generally excelled for closely related strains, ResFinder excelled for handling divergent genomes. Overall, Kover most frequently ranked top among the ML approaches, followed by PhenotypeSeeker and Seq2Geno2Pheno. AMR phenotypes for antibiotic classes such as macrolides and sulfonamides were predicted with the highest accuracies. The quality of predictions varied substantially across species-antibiotic combinations, particularly for beta-lactams; across species, resistance phenotyping of the beta-lactams compound, aztreonam, amoxicillin/clavulanic acid, cefoxitin, ceftazidime and piperacillin/tazobactam, alongside tetracyclines demonstrated more variable performance than the other benchmarked antibiotics. By organism, Campylobacter jejuni and Enterococcus faecium phenotypes were more robustly predicted than those of Escherichia coli, Staphylococcus aureus, Salmonella enterica, Neisseria gonorrhoeae, Klebsiella pneumoniae, Pseudomonas aeruginosa, Acinetobacter baumannii, Streptococcus pneumoniae and Mycobacterium tuberculosis. In addition, our study provides software recommendations for each species-antibiotic combination. It furthermore highlights the need for optimization for robust clinical applications, particularly for strains that diverge substantially from those used for training.

Assuntos

Antibacterianos , Fenótipo , Antibacterianos/farmacologia , Aprendizado de Máquina , Farmacorresistência Bacteriana/genética , Biologia Computacional/métodos , Genoma Bacteriano , Genoma Microbiano , Humanos , Bactérias/genética , Bactérias/efeitos dos fármacos

4.

Benchmarking of computational methods for m6A profiling with Nanopore direct RNA sequencing.

Maestri, Simone; Furlan, Mattia; Mulroney, Logan; Coscujuela Tarrero, Lucia; Ugolini, Camilla; Dalla Pozza, Fabio; Leonardi, Tommaso; Birney, Ewan; Nicassio, Francesco; Pelizzola, Mattia.

Brief Bioinform ; 25(2)2024 Jan 22.

Artigo em Inglês | MEDLINE | ID: mdl-38279646

RESUMO

N6-methyladenosine (m6A) is the most abundant internal eukaryotic mRNA modification, and is involved in the regulation of various biological processes. Direct Nanopore sequencing of native RNA (dRNA-seq) emerged as a leading approach for its identification. Several software were published for m6A detection and there is a strong need for independent studies benchmarking their performance on data from different species, and against various reference datasets. Moreover, a computational workflow is needed to streamline the execution of tools whose installation and execution remains complicated. We developed NanOlympicsMod, a Nextflow pipeline exploiting containerized technology for comparing 14 tools for m6A detection on dRNA-seq data. NanOlympicsMod was tested on dRNA-seq data generated from in vitro (un)modified synthetic oligos. The m6A hits returned by each tool were compared to the m6A position known by design of the oligos. In addition, NanOlympicsMod was used on dRNA-seq datasets from wild-type and m6A-depleted yeast, mouse and human, and each tool's hits were compared to reference m6A sets generated by leading orthogonal methods. The performance of the tools markedly differed across datasets, and methods adopting different approaches showed different preferences in terms of precision and recall. Changing the stringency cut-offs allowed for tuning the precision-recall trade-off towards user preferences. Finally, we determined that precision and recall of tools are markedly influenced by sequencing depth, and that additional sequencing would likely reveal additional m6A sites. Thanks to the possibility of including novel tools, NanOlympicsMod will streamline the benchmarking of m6A detection tools on dRNA-seq data, improving future RNA modification characterization.

Assuntos

Adenina/análogos & derivados , Sequenciamento por Nanoporos , Nanoporos , Humanos , Animais , Camundongos , RNA/genética , Benchmarking , Análise de Sequência de RNA/métodos

5.

Optimizing in silico drug discovery: simulation of connected differential expression signatures and applications to benchmarking.

Gonzalez Gomez, Catalina; Rosa-Calatrava, Manuel; Fouret, Julien.

Brief Bioinform ; 25(4)2024 May 23.

Artigo em Inglês | MEDLINE | ID: mdl-38935068

RESUMO

BACKGROUND: We present a novel simulation method for generating connected differential expression signatures. Traditional methods have struggled with the lack of reliable benchmarking data and biases in drug-disease pair labeling, limiting the rigorous benchmarking of connectivity-based approaches. OBJECTIVE: Our aim is to develop a simulation method based on a statistical framework that allows for adjustable levels of parametrization, especially the connectivity, to generate a pair of interconnected differential signatures. This could help to address the issue of benchmarking data availability for connectivity-based drug repurposing approaches. METHODS: We first detailed the simulation process and how it reflected real biological variability and the interconnectedness of gene expression signatures. Then, we generated several datasets to enable the evaluation of different existing algorithms that compare differential expression signatures, providing insights into their performance and limitations. RESULTS: Our findings demonstrate the ability of our simulation to produce realistic data, as evidenced by correlation analyses and the log2 fold-change distribution of deregulated genes. Benchmarking reveals that methods like extreme cosine similarity and Pearson correlation outperform others in identifying connected signatures. CONCLUSION: Overall, our method provides a reliable tool for simulating differential expression signatures. The data simulated by our tool encompass a wide spectrum of possibilities to challenge and evaluate existing methods to estimate connectivity scores. This may represent a critical gap in connectivity-based drug repurposing research because reliable benchmarking data are essential for assessing and advancing in the development of new algorithms. The simulation tool is available as a R package (General Public License (GPL) license) at https://github.com/cgonzalez-gomez/cosimu.

Assuntos

Algoritmos , Benchmarking , Simulação por Computador , Descoberta de Drogas , Descoberta de Drogas/métodos , Humanos , Perfilação da Expressão Gênica/métodos , Biologia Computacional/métodos , Reposicionamento de Medicamentos/métodos , Transcriptoma

6.

Mutational signatures of colorectal cancers according to distinct computational workflows.

Battuello, Paolo; Corti, Giorgio; Bartolini, Alice; Lorenzato, Annalisa; Sogari, Alberto; Russo, Mariangela; Di Nicolantonio, Federica; Bardelli, Alberto; Crisafulli, Giovanni.

Brief Bioinform ; 25(4)2024 May 23.

Artigo em Inglês | MEDLINE | ID: mdl-38783705

RESUMO

Tumor mutational signatures have gained prominence in cancer research, yet the lack of standardized methods hinders reproducibility and robustness. Leveraging colorectal cancer (CRC) as a model, we explored the influence of computational parameters on mutational signature analyses across 230 CRC cell lines and 152 CRC patients. Results were validated in three independent datasets: 483 endometrial cancer patients stratified by mismatch repair (MMR) status, 35 lung cancer patients by smoking status and 12 patient-derived organoids (PDOs) annotated for colibactin exposure. Assessing various bioinformatic tools, reference datasets and input data sizes including whole genome sequencing, whole exome sequencing and a pan-cancer gene panel, we demonstrated significant variability in the results. We report that the use of distinct algorithms and references led to statistically different results, highlighting how arbitrary choices may induce variability in the mutational signature contributions. Furthermore, we found a differential contribution of mutational signatures between coding and intergenic regions and defined the minimum number of somatic variants required for reliable mutational signature assignment. To facilitate the identification of the most suitable workflows, we developed Comparative Mutational Signature analysis on Coding and Extragenic Regions (CoMSCER), a bioinformatic tool which allows researchers to easily perform comparative mutational signature analysis by coupling the results from several tools and public reference datasets and to assess mutational signature contributions in coding and non-coding genomic regions. In conclusion, our study provides a comparative framework to elucidate the impact of distinct computational workflows on mutational signatures.

Assuntos

Neoplasias Colorretais , Biologia Computacional , Mutação , Humanos , Neoplasias Colorretais/genética , Neoplasias Colorretais/patologia , Biologia Computacional/métodos , Fluxo de Trabalho , Linhagem Celular Tumoral , Sequenciamento do Exoma/métodos , Feminino , Algoritmos

7.

Assessing next-generation sequencing-based computational methods for predicting transcriptional regulators with query gene sets.

Lu, Zeyu; Xiao, Xue; Zheng, Qiang; Wang, Xinlei; Xu, Lin.

Brief Bioinform ; 25(5)2024 Jul 25.

Artigo em Inglês | MEDLINE | ID: mdl-39082650

RESUMO

This article provides an in-depth review of computational methods for predicting transcriptional regulators (TRs) with query gene sets. Identification of TRs is of utmost importance in many biological applications, including but not limited to elucidating biological development mechanisms, identifying key disease genes, and predicting therapeutic targets. Various computational methods based on next-generation sequencing (NGS) data have been developed in the past decade, yet no systematic evaluation of NGS-based methods has been offered. We classified these methods into two categories based on shared characteristics, namely library-based and region-based methods. We further conducted benchmark studies to evaluate the accuracy, sensitivity, coverage, and usability of NGS-based methods with molecular experimental datasets. Results show that BART, ChIP-Atlas, and Lisa have relatively better performance. Besides, we point out the limitations of NGS-based methods and explore potential directions for further improvement.

Assuntos

Biologia Computacional , Sequenciamento de Nucleotídeos em Larga Escala , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Biologia Computacional/métodos , Humanos , Fatores de Transcrição/genética , Fatores de Transcrição/metabolismo , Regulação da Expressão Gênica

8.

Model-based evaluation of spatiotemporal data reduction methods with unknown ground truth through optimal visualization and interpretability metrics.

Atitey, Komlan; Motsinger-Reif, Alison A; Anchang, Benedict.

Brief Bioinform ; 25(1)2023 11 22.

Artigo em Inglês | MEDLINE | ID: mdl-38113074

RESUMO

Optimizing and benchmarking data reduction methods for dynamic or spatial visualization and interpretation (DSVI) face challenges due to many factors, including data complexity, lack of ground truth, time-dependent metrics, dimensionality bias and different visual mappings of the same data. Current studies often focus on independent static visualization or interpretability metrics that require ground truth. To overcome this limitation, we propose the MIBCOVIS framework, a comprehensive and interpretable benchmarking and computational approach. MIBCOVIS enhances the visualization and interpretability of high-dimensional data without relying on ground truth by integrating five robust metrics, including a novel time-ordered Markov-based structural metric, into a semi-supervised hierarchical Bayesian model. The framework assesses method accuracy and considers interaction effects among metric features. We apply MIBCOVIS using linear and nonlinear dimensionality reduction methods to evaluate optimal DSVI for four distinct dynamic and spatial biological processes captured by three single-cell data modalities: CyTOF, scRNA-seq and CODEX. These data vary in complexity based on feature dimensionality, unknown cell types and dynamic or spatial differences. Unlike traditional single-summary score approaches, MIBCOVIS compares accuracy distributions across methods. Our findings underscore the joint evaluation of visualization and interpretability, rather than relying on separate metrics. We reveal that prioritizing average performance can obscure method feature performance. Additionally, we explore the impact of data complexity on visualization and interpretability. Specifically, we provide optimal parameters and features and recommend methods, like the optimized variational contractive autoencoder, for targeted DSVI for various data complexities. MIBCOVIS shows promise for evaluating dynamic single-cell atlases and spatiotemporal data reduction models.

Assuntos

Benchmarking , Análise de Célula Única , Teorema de Bayes , Análise de Célula Única/métodos

9.

Systematic discovery of protein interaction interfaces using AlphaFold and experimental validation.

Lee, Chop Yan; Hubrich, Dalmira; Varga, Julia K; Schäfer, Christian; Welzel, Mareen; Schumbera, Eric; Djokic, Milena; Strom, Joelle M; Schönfeld, Jonas; Geist, Johanna L; Polat, Feyza; Gibson, Toby J; Keller Valsecchi, Claudia Isabelle; Kumar, Manjeet; Schueler-Furman, Ora; Luck, Katja.

Mol Syst Biol ; 20(2): 75-97, 2024 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-38225382

RESUMO

Structural resolution of protein interactions enables mechanistic and functional studies as well as interpretation of disease variants. However, structural data is still missing for most protein interactions because we lack computational and experimental tools at scale. This is particularly true for interactions mediated by short linear motifs occurring in disordered regions of proteins. We find that AlphaFold-Multimer predicts with high sensitivity but limited specificity structures of domain-motif interactions when using small protein fragments as input. Sensitivity decreased substantially when using long protein fragments or full length proteins. We delineated a protein fragmentation strategy particularly suited for the prediction of domain-motif interfaces and applied it to interactions between human proteins associated with neurodevelopmental disorders. This enabled the prediction of highly confident and likely disease-related novel interfaces, which we further experimentally corroborated for FBXO23-STX1B, STX1B-VAMP2, ESRRG-PSMC5, PEX3-PEX19, PEX3-PEX16, and SNRPB-GIGYF1 providing novel molecular insights for diverse biological processes. Our work highlights exciting perspectives, but also reveals clear limitations and the need for future developments to maximize the power of Alphafold-Multimer for interface predictions.

Assuntos

Proteínas de Transporte , Proteínas , Humanos , Proteínas/metabolismo , Proteínas de Membrana/metabolismo

10.

Dyport: dynamic importance-based biomedical hypothesis generation benchmarking technique.

Tyagin, Ilya; Safro, Ilya.

BMC Bioinformatics ; 25(1): 213, 2024 Jun 13.

Artigo em Inglês | MEDLINE | ID: mdl-38872097

RESUMO

BACKGROUND: Automated hypothesis generation (HG) focuses on uncovering hidden connections within the extensive information that is publicly available. This domain has become increasingly popular, thanks to modern machine learning algorithms. However, the automated evaluation of HG systems is still an open problem, especially on a larger scale. RESULTS: This paper presents a novel benchmarking framework Dyport for evaluating biomedical hypothesis generation systems. Utilizing curated datasets, our approach tests these systems under realistic conditions, enhancing the relevance of our evaluations. We integrate knowledge from the curated databases into a dynamic graph, accompanied by a method to quantify discovery importance. This not only assesses hypotheses accuracy but also their potential impact in biomedical research which significantly extends traditional link prediction benchmarks. Applicability of our benchmarking process is demonstrated on several link prediction systems applied on biomedical semantic knowledge graphs. Being flexible, our benchmarking system is designed for broad application in hypothesis generation quality verification, aiming to expand the scope of scientific discovery within the biomedical research community. CONCLUSIONS: Dyport is an open-source benchmarking framework designed for biomedical hypothesis generation systems evaluation, which takes into account knowledge dynamics, semantics and impact. All code and datasets are available at: https://github.com/IlyaTyagin/Dyport .

Assuntos

Benchmarking , Benchmarking/métodos , Algoritmos , Pesquisa Biomédica/métodos , Software , Aprendizado de Máquina , Bases de Dados Factuais , Biologia Computacional/métodos , Semântica

11.

Benefit of In Silico Predicted Spectral Libraries in Data-Independent Acquisition Data Analysis Workflows.

Staes, An; Mendes Maia, Teresa; Dufour, Sara; Bouwmeester, Robbin; Gabriels, Ralf; Martens, Lennart; Gevaert, Kris; Impens, Francis; Devos, Simon.

J Proteome Res ; 23(6): 2078-2089, 2024 Jun 07.

Artigo em Inglês | MEDLINE | ID: mdl-38666436

RESUMO

Data-independent acquisition (DIA) has become a well-established method for MS-based proteomics. However, the list of options to analyze this type of data is quite extensive, and the use of spectral libraries has become an important factor in DIA data analysis. More specifically the use of in silico predicted libraries is gaining more interest. By working with a differential spike-in of human standard proteins (UPS2) in a constant yeast tryptic digest background, we evaluated the sensitivity, precision, and accuracy of the use of in silico predicted libraries in data DIA data analysis workflows compared to more established workflows. Three commonly used DIA software tools, DIA-NN, EncyclopeDIA, and Spectronaut, were each tested in spectral library mode and spectral library-free mode. In spectral library mode, we used independent spectral library prediction tools PROSIT and MS2PIP together with DeepLC, next to classical data-dependent acquisition (DDA)-based spectral libraries. In total, we benchmarked 12 computational workflows for DIA. Our comparison showed that DIA-NN reached the highest sensitivity while maintaining a good compromise on the reproducibility and accuracy levels in either library-free mode or using in silico predicted libraries pointing to a general benefit in using in silico predicted libraries.

Assuntos

Simulação por Computador , Proteômica , Software , Fluxo de Trabalho , Proteômica/métodos , Proteômica/estatística & dados numéricos , Humanos , Reprodutibilidade dos Testes , Análise de Dados , Biblioteca de Peptídeos

12.

Improving Proteomic Identification Using Narrow Isolation Windows with Zeno SWATH Data-Independent Acquisition.

Gu, Kongxin; Kumabe, Haruka; Yamamoto, Takumi; Tashiro, Naoto; Masuda, Takeshi; Ito, Shingo; Ohtsuki, Sumio.

J Proteome Res ; 23(8): 3484-3495, 2024 Aug 02.

Artigo em Inglês | MEDLINE | ID: mdl-38978496

RESUMO

Data-independent acquisition (DIA) techniques such as sequential window acquisition of all theoretical mass spectra (SWATH) acquisition have emerged as the preferred strategies for proteomic analyses. Our study optimized the SWATH-DIA method using a narrow isolation window placement approach, improving its proteomic performance. We optimized the acquisition parameter combinations of narrow isolation windows with different widths (1.9 and 2.9 Da) on a ZenoTOF 7600 (Sciex); the acquired data were analyzed using DIA-NN (version 1.8.1). Narrow SWATH (nSWATH) identified 5916 and 7719 protein groups on the digested peptides, corresponding to 400 ng of protein from mouse liver and HEK293T cells, respectively, improving identification by 7.52 and 4.99%, respectively, compared to conventional SWATH. The median coefficient of variation of the quantified values was less than 6%. We further analyzed 200 ng of benchmark samples comprising peptides from known ratios ofEscherichia coli, yeast, and human peptides using nSWATH. Consequently, it achieved accuracy and precision comparable to those of conventional SWATH, identifying an average of 95,456 precursors and 9342 protein groups across three benchmark samples, representing 12.6 and 9.63% improved identification compared to conventional SWATH. The nSWATH method improved identification at various loading amounts of benchmark samples, identifying 40.7% more protein groups at 25 ng. These results demonstrate the improved performance of nSWATH, contributing to the acquisition of deeper proteomic data from complex biological samples.

Assuntos

Proteômica , Proteômica/métodos , Humanos , Animais , Camundongos , Células HEK293 , Fígado/metabolismo , Fígado/química , Peptídeos/química , Peptídeos/análise , Peptídeos/isolamento & purificação , Proteoma/análise , Escherichia coli/metabolismo , Escherichia coli/genética , Espectrometria de Massas em Tandem/métodos , Espectrometria de Massas/métodos

13.

MAGICIAN: MAG simulation for investigating criteria for bioinformatic analysis.

Steinke, Kat; Pamp, Sünje J; Munk, Patrick.

BMC Genomics ; 25(1): 55, 2024 Jan 12.

Artigo em Inglês | MEDLINE | ID: mdl-38216924

RESUMO

BACKGROUND: The possibility of recovering metagenome-assembled genomes (MAGs) from sequence reads allows for further insights into microbial communities and their members, possibly even analyzing such sequences with tools designed for single-isolate genomes. As result quality depends on sequence quality, performance of tools for single-isolate genomes on MAGs should be tested beforehand. Bioinformatics can be leveraged to quickly create varied synthetic test sets with known composition for this purpose. RESULTS: We present MAGICIAN, a flexible, user-friendly pipeline for the simulation of MAGs. MAGICIAN combines a synthetic metagenome simulator with a metagenomic assembly and binning pipeline to simulate MAGs based on user-supplied input genomes, allowing users to test performance of tools on MAGs while having a ground truth to compare results to. Using MAGICIAN, we found that even very slight (1%) changes in depth of coverage can drastically affect whether a genome can be recovered. We also demonstrate the use of simulated MAGs by evaluating the suitability of such genomes obtained with MAGICIAN's current default pipeline for analysis with the antimicrobial resistance gene identification tool ResFinder. CONCLUSIONS: Using MAGICIAN, it is possible to simulate MAGs which, while generally high in quality, reflect issues encountered with real-world data, thus providing realistic best-case data. Evaluating the results of ResFinder analysis of these genomes revealed a risk for plausible-looking false positives, which underlines the need for pipeline validation so that researchers are aware of the potential issues when interpreting real-world data. Furthermore, the effects of fluctuations in depth of coverage on genome recovery in our simulated "random sequencing" warrant further investigation and indicate random subsampling of reads may affect discovery of more genomes.

Assuntos

Metagenoma , Microbiota , Simulação por Computador , Microbiota/genética , Metagenômica/métodos , Biologia Computacional

14.

Benchmarking short and long read polishing tools for nanopore assemblies: achieving near-perfect genomes for outbreak isolates.

Luan, Tu; Commichaux, Seth; Hoffmann, Maria; Jayeola, Victor; Jang, Jae Hee; Pop, Mihai; Rand, Hugh; Luo, Yan.

BMC Genomics ; 25(1): 679, 2024 Jul 08.

Artigo em Inglês | MEDLINE | ID: mdl-38978005

RESUMO

BACKGROUND: Oxford Nanopore provides high throughput sequencing platforms able to reconstruct complete bacterial genomes with 99.95% accuracy. However, even small levels of error can obscure the phylogenetic relationships between closely related isolates. Polishing tools have been developed to correct these errors, but it is uncertain if they obtain the accuracy needed for the high-resolution source tracking of foodborne illness outbreaks. RESULTS: We tested 132 combinations of assembly and short- and long-read polishing tools to assess their accuracy for reconstructing the genome sequences of 15 highly similar Salmonella enterica serovar Newport isolates from a 2020 onion outbreak. While long-read polishing alone improved accuracy, near perfect accuracy (99.9999% accuracy or ~ 5 nucleotide errors across the 4.8 Mbp genome, excluding low confidence regions) was only obtained by pipelines that combined both long- and short-read polishing tools. Notably, medaka was a more accurate and efficient long-read polisher than Racon. Among short-read polishers, NextPolish showed the highest accuracy, but Pilon, Polypolish, and POLCA performed similarly. Among the 5 best performing pipelines, polishing with medaka followed by NextPolish was the most common combination. Importantly, the order of polishing tools mattered i.e., using less accurate tools after more accurate ones introduced errors. Indels in homopolymers and repetitive regions, where the short reads could not be uniquely mapped, remained the most challenging errors to correct. CONCLUSIONS: Short reads are still needed to correct errors in nanopore sequenced assemblies to obtain the accuracy required for source tracking investigations. Our granular assessment of the performance of the polishing pipelines allowed us to suggest best practices for tool users and areas for improvement for tool developers.

Assuntos

Benchmarking , Surtos de Doenças , Genoma Bacteriano , Nanoporos , Sequenciamento por Nanoporos/métodos , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Salmonella enterica/genética , Salmonella enterica/isolamento & purificação , Humanos , Filogenia

15.

Sequencing by binding rivals SMOR error-corrected sequencing by synthesis technology for accurate detection and quantification of minor (< 0.1%) subpopulation variants.

Allender, Christopher J; Wike, Candice L; Porter, W Tanner; Ellis, Dean; Lemmer, Darrin; Pond, Stephanie J K; Engelthaler, David M.

BMC Genomics ; 25(1): 789, 2024 Aug 19.

Artigo em Inglês | MEDLINE | ID: mdl-39160478

RESUMO

BACKGROUND: Detecting very minor (< 1%) subpopulations using next-generation sequencing is a critical need for multiple applications, including the detection of drug resistant pathogens and somatic variant detection in oncology. A recently available sequencing approach termed 'sequencing by binding (SBB)' claims to have higher base calling accuracy data "out of the box." This paper evaluates the utility of using SBB for the detection of ultra-rare drug resistant subpopulations in Mycobacterium tuberculosis (Mtb) using a targeted amplicon assay and compares the performance of SBB to single molecule overlapping reads (SMOR) error corrected sequencing by synthesis (SBS) data. RESULTS: SBS displayed an elevated error rate when compared to SMOR error-corrected SBS and SBB techniques. SMOR error-corrected SBS and SBB technologies performed similarly within the linear range studies and error rate studies. CONCLUSIONS: With lower sequencing error rates within SBB sequencing, this technique looks promising for both targeted and unbiased whole genome sequencing, leading to the identification of minor (< 1%) subpopulations without the need for error correction methods.

Assuntos

Sequenciamento de Nucleotídeos em Larga Escala , Mycobacterium tuberculosis , Mycobacterium tuberculosis/genética , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Análise de Sequência de DNA/métodos , Humanos , Sequenciamento Completo do Genoma/métodos

16.

Antipsychotic drugs in first-episode psychosis: a target trial emulation in the FEP-CAUSAL Collaboration.

Szmulewicz, Alejandro G; Martínez-Alés, Gonzalo; Logan, Roger; Ferrara, Maria; Kelly, Christian; Fredrikson, Diane; Gago, Juan; Conderino, Sarah; Díaz-Caneja, Covadonga M; Galvañ, Joaquín; Thorpe, Lorna; Srihari, Vinod; Yatham, Lakshmi; Sarpal, Deepak K; Shinn, Ann K; Arango, Celso; Öngür, Dost; Hernán, Miguel A; Fep-Causal Collaboration, On Behalf Of The.

Am J Epidemiol ; 193(8): 1081-1087, 2024 Aug 05.

Artigo em Inglês | MEDLINE | ID: mdl-38576166

RESUMO

Good adherence to antipsychotic therapy helps prevent relapses in first-episode psychosis (FEP). We used data from the FEP-CAUSAL Collaboration, an international consortium of observational cohorts, to emulate a target trial comparing antipsychotics, with treatment discontinuation as the primary outcome. Other outcomes included all-cause hospitalization. We benchmarked our results to estimates from the European First Episode Schizophrenia Trial, a randomized trial conducted in the 2000s. We included 1097 patients with a psychotic disorder and less than 2 years since psychosis onset. Inverse-probability weighting was used to control for confounding. The estimated 12-month risks of discontinuation for aripiprazole, first-generation agents, olanzapine, paliperidone, quetiapine, and risperidone were 61.5% (95% CI, 52.5-70.6), 73.5% (95% CI, 60.5-84.9), 76.8% (95% CI, 67.2-85.3), 58.4% (95% CI, 40.4-77.4), 76.5% (95% CI, 62.1-88.5), and 74.4% (95% CI, 67.0-81.2), respectively. Compared with aripiprazole, the 12-month risk differences were -15.3% (95% CI, -30.0 to 0.0) for olanzapine, -12.8% (95% CI, -25.7 to -1.0) for risperidone, and 3.0% (95% CI, -21.5 to 30.8) for paliperidone. The 12-month risks of hospitalization were similar between agents. Our estimates support use of aripiprazole and paliperidone as first-line therapies for FEP. Benchmarking yielded similar results for discontinuation and absolute risks of hospitalization as in the original trial, suggesting that data from the FEP-CAUSAL Collaboration sufficed to remove confounding for these clinical questions. This article is part of a Special Collection on Mental Health.

Assuntos

Antipsicóticos , Transtornos Psicóticos , Humanos , Antipsicóticos/uso terapêutico , Feminino , Masculino , Transtornos Psicóticos/tratamento farmacológico , Adulto , Aripiprazol/uso terapêutico , Risperidona/uso terapêutico , Adulto Jovem , Hospitalização/estatística & dados numéricos , Olanzapina/uso terapêutico , Esquizofrenia/tratamento farmacológico , Adesão à Medicação/estatística & dados numéricos , Adolescente , Fumarato de Quetiapina/uso terapêutico

17.

Cardiorenal effects of Angiotensin-converting enzyme inhibitors and Angiotensin receptor blockers in people underrepresented in trials: analysis of routinely collected data with emulation of a reference trial (ONTARGET).

Baptiste, Paris J; Wong, Angel Y S; Schultze, Anna; Clase, Catherine M; Leyrat, Clémence; Williamson, Elizabeth; Powell, Emma; Mann, Johannes F E; Cunnington, Marianne; Teo, Koon; Bangdiwala, Shrikant I; Gao, Peggy; Tomlinson, Laurie; Wing, Kevin.

Am J Epidemiol ; 2024 Jun 18.

Artigo em Inglês | MEDLINE | ID: mdl-38896054

RESUMO

Cardiovascular disease (CVD) is a leading cause of death globally. Angiotensin-converting enzyme inhibitors (ACEi) and angiotensin receptor blockers (ARB), compared in the ONTARGET trial, each prevent CVD. However, trial results may not be generalisable and their effectiveness in underrepresented groups is unclear. Using trial emulation methods within routine-care data to validate findings, we explored generalisability of ONTARGET results. For people prescribed an ACEi/ARB in the UK Clinical Practice Research Datalink GOLD from 1/1/2001-31/7/2019, we applied trial criteria and propensity-score methods to create an ONTARGET trial-eligible cohort. Comparing ARB to ACEi, we estimated hazard ratios for the primary composite trial outcome (cardiovascular death, myocardial infarction, stroke, or hospitalisation for heart failure), and secondary outcomes. As the pre-specified criteria were met confirming trial emulation, we then explored treatment heterogeneity among three trial-underrepresented subgroups: females, those aged ≥75 years and those with chronic kidney disease (CKD). In the trial-eligible population (n=137,155), results for the primary outcome demonstrated similar effects of ARB and ACEi, (HR 0.97 [95% CI: 0.93, 1.01]), meeting the pre-specified validation criteria. When extending this outcome to trial-underrepresented groups, similar treatment effects were observed by sex, age and CKD. This suggests that ONTARGET trial findings are generalisable to trial-underrepresented subgroups.

18.

Brain-age prediction: Systematic evaluation of site effects, and sample age range and size.

Yu, Yuetong; Cui, Hao-Qi; Haas, Shalaila S; New, Faye; Sanford, Nicole; Yu, Kevin; Zhan, Denghuang; Yang, Guoyuan; Gao, Jia-Hong; Wei, Dongtao; Qiu, Jiang; Banaj, Nerisa; Boomsma, Dorret I; Breier, Alan; Brodaty, Henry; Buckner, Randy L; Buitelaar, Jan K; Cannon, Dara M; Caseras, Xavier; Clark, Vincent P; Conrod, Patricia J; Crivello, Fabrice; Crone, Eveline A; Dannlowski, Udo; Davey, Christopher G; de Haan, Lieuwe; de Zubicaray, Greig I; Di Giorgio, Annabella; Fisch, Lukas; Fisher, Simon E; Franke, Barbara; Glahn, David C; Grotegerd, Dominik; Gruber, Oliver; Gur, Raquel E; Gur, Ruben C; Hahn, Tim; Harrison, Ben J; Hatton, Sean; Hickie, Ian B; Hulshoff Pol, Hilleke E; Jamieson, Alec J; Jernigan, Terry L; Jiang, Jiyang; Kalnin, Andrew J; Kang, Sim; Kochan, Nicole A; Kraus, Anna; Lagopoulos, Jim; Lazaro, Luisa.

Hum Brain Mapp ; 45(10): e26768, 2024 Jul 15.

Artigo em Inglês | MEDLINE | ID: mdl-38949537

RESUMO

Structural neuroimaging data have been used to compute an estimate of the biological age of the brain (brain-age) which has been associated with other biologically and behaviorally meaningful measures of brain development and aging. The ongoing research interest in brain-age has highlighted the need for robust and publicly available brain-age models pre-trained on data from large samples of healthy individuals. To address this need we have previously released a developmental brain-age model. Here we expand this work to develop, empirically validate, and disseminate a pre-trained brain-age model to cover most of the human lifespan. To achieve this, we selected the best-performing model after systematically examining the impact of seven site harmonization strategies, age range, and sample size on brain-age prediction in a discovery sample of brain morphometric measures from 35,683 healthy individuals (age range: 5-90 years; 53.59% female). The pre-trained models were tested for cross-dataset generalizability in an independent sample comprising 2101 healthy individuals (age range: 8-80 years; 55.35% female) and for longitudinal consistency in a further sample comprising 377 healthy individuals (age range: 9-25 years; 49.87% female). This empirical examination yielded the following findings: (1) the accuracy of age prediction from morphometry data was higher when no site harmonization was applied; (2) dividing the discovery sample into two age-bins (5-40 and 40-90 years) provided a better balance between model accuracy and explained age variance than other alternatives; (3) model accuracy for brain-age prediction plateaued at a sample size exceeding 1600 participants. These findings have been incorporated into CentileBrain (https://centilebrain.org/#/brainAGE2), an open-science, web-based platform for individualized neuroimaging metrics.

Assuntos

Envelhecimento , Encéfalo , Imageamento por Ressonância Magnética , Humanos , Adolescente , Feminino , Idoso , Adulto , Criança , Adulto Jovem , Masculino , Encéfalo/diagnóstico por imagem , Encéfalo/anatomia & histologia , Encéfalo/crescimento & desenvolvimento , Idoso de 80 Anos ou mais , Pré-Escolar , Pessoa de Meia-Idade , Envelhecimento/fisiologia , Imageamento por Ressonância Magnética/métodos , Neuroimagem/métodos , Neuroimagem/normas , Tamanho da Amostra

19.

How do spin-scaled double hybrids designed for excitation energies perform for noncovalent excited-state interactions? An investigation on aromatic excimer models.

Hancock, Amy C; Giudici, Erica; Goerigk, Lars.

J Comput Chem ; 45(19): 1667-1681, 2024 Jul 15.

Artigo em Inglês | MEDLINE | ID: mdl-38553847

RESUMO

Time-dependent double hybrids with spin-component or spin-opposite scaling to their second-order perturbative correlation correction have demonstrated competitive robustness in the computation of electronic excitation energies. Some of the most robust are those recently published by our group (M. Casanova-Páez, L. Goerigk, J. Chem. Theory Comput. 2021, 20, 5165). So far, the implementation of these functionals has not allowed correctly calculating their ground-state total energies. Herein, we define their correct spin-scaled ground-state energy expressions which enables us to test our methods on the noncovalent excited-state interaction energies of four aromatic excimers. A range of 22 double hybrids with and without spin scaling are compared to the reasonably accurate wavefunction reference from our previous work (A. C. Hancock, L. Goerigk, RSC Adv. 2023, 13, 35964). The impact of spin scaling is highly dependent on the underlying functional expression, however, the smallest overall errors belong to spin-scaled functionals with range separation: SCS- and SOS- ω PBEPP86, and SCS-RSX-QIDH. We additionally determine parameters for DFT-D3(BJ)/D4 ground-state dispersion corrections of these functionals, which reduce errors in most cases. We highlight the necessity of dispersion corrections for even the most robust TD-DFT methods but also point out that ground-state based corrections are insufficient to completely capture dispersion effects for excited-state interaction energies.

20.

Phylogenomic data exploration with increased sampling provides new insights into the higher-level relationships of butterflies and moths (Lepidoptera).

Chen, Qi; Deng, Min; Dai, Xuan; Wang, Wei; Wang, Xing; Chen, Liu-Sheng; Huang, Guo-Hua.

Mol Phylogenet Evol ; 197: 108113, 2024 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-38796071

RESUMO

A robust and stable phylogenetic framework is a fundamental goal of evolutionary biology. As the third largest insect order in the world following Coleoptera and Diptera, Lepidoptera (butterflies and moths) play a central role in almost every terrestrial ecosystem as indicators of environmental change and serve as important models for biologists exploring questions related to ecology and evolutionary biology. However, for such a charismatic insect group, the higher-level phylogenetic relationships among its superfamilies are still poorly resolved. Compared to earlier phylogenomic studies, we increased taxon sampling among Lepidoptera (37 superfamilies and 68 families containing 263 taxa) and acquired a series of large amino-acid datasets from 69,680 to 400,330 for phylogenomic reconstructions. Using these datasets, we explored the effect of different taxon sampling with significant increases in the number of included genes on tree topology by considering a series of systematic errors using maximum-likelihood (ML) and Bayesian inference (BI) methods. Moreover, we also tested the effectiveness in topology robustness among the three ML-based models. The results showed that taxon sampling is an important determinant in tree robustness of accurate lepidopteran phylogenetic estimation. Long-branch attraction (LBA) caused by site-wise heterogeneity is a significant source of bias giving rise to unstable positions of ditrysian groups in phylogenomic reconstruction. Phylogenetic inference showed the most comprehensive framework to reveal the relationships among lepidopteran superfamilies, and presented some newly relationships with strong supports (Papilionoidea was sister to Gelechioidea and Immoidea was sister to Galacticoidea, respectively), but limited by taxon sampling, the relationships within the species-rich and relatively rapid radiation Ditrysia and especially Apoditrysia remain poorly resolved, which need to increase taxon sampling for further phylogenomic reconstruction. The present study demonstrates that taxon sampling is an important determinant for an accurate lepidopteran tree of life and provides some essential insights for future lepidopteran phylogenomic studies.

Assuntos

Teorema de Bayes , Borboletas , Mariposas , Filogenia , Animais , Mariposas/genética , Mariposas/classificação , Funções Verossimilhança , Borboletas/genética , Borboletas/classificação , Modelos Genéticos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

Detalhe da pesquisa