Results 1 - 20 of 94
1.
Gut Pathog ; 16(1): 27, 2024 May 12.
Article in English | MEDLINE | ID: mdl-38735967

ABSTRACT

BACKGROUND: Enhancing our understanding of the underlying influences of medical interventions on the microbiome, resistome and mycobiome of preterm born infants holds significant potential for advancing infection prevention and treatment strategies. We conducted a prospective quasi-intervention study to better understand how antibiotics, probiotics, and other medical factors influence the gut development of preterm infants. A controlled neonatal mouse model study was conducted in parallel, designed to closely reflect and predict exposures. Preterm infants and neonatal mice were stratified into four groups: antibiotics only, probiotics only, antibiotics followed by probiotics, and none of these interventions. Stool samples from both preterm infants and neonatal mice were collected at varying time points and analyzed by 16S rRNA amplicon sequencing, ITS amplicon sequencing and whole genome shotgun sequencing. RESULTS: The human infant microbiomes showed an unexpectedly high degree of heterogeneity. Little impact from medical exposure (antibiotics/probiotics) was observed on the strain patterns; however, Bifidobacterium bifidum was found to be more abundant after exposure to probiotics, regardless of prior antibiotic administration. Twenty-seven antibiotic resistance genes were identified in the resistome. High intra-variability was evident within the different treatment groups. Lastly, we found significant effects of antibiotics and probiotics on the mycobiome but not on the microbiome and resistome of preterm infants. CONCLUSIONS: Although our analyses showed transient effects, these results provide positive motivation to continue the research on the effects of medical interventions on the microbiome, resistome and mycobiome of preterm infants.

2.
NPJ Digit Med ; 7(1): 139, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38789620

ABSTRACT

The 2019 German Digital Healthcare Act introduced the Digital Health Application program, known in German as 'Digitale Gesundheitsanwendungen' (DiGA). The program has established a pioneering model for integrating Digital Therapeutics (DTx) into a healthcare system with scalable and effective reimbursement strategies. To date, the continuous upward trend enabled by this framework has resulted in more than 374,000 DiGA prescriptions, increasingly cementing its role in the German healthcare system. This perspective provides a synthesis of the DiGA program's evolution since its inception three years ago, highlighting trends regarding prescriptions and pricing as well as criticisms and identified shortcomings. It further discusses forthcoming legislative amendments, including the anticipated integration of higher-risk medical devices, which have the potential to significantly transform the program. Despite encountering challenges related to effectiveness, evidence requirements, and integration within the healthcare system, the DiGA program continues to evolve and serves as a seminal example for the integration of DTx, offering valuable insights for healthcare systems globally.

3.
EBioMedicine ; 104: 105171, 2024 May 28.
Article in English | MEDLINE | ID: mdl-38810562

ABSTRACT

BACKGROUND: The increasing volume and intricacy of sequencing data, along with other clinical and diagnostic data, like drug responses and measurable residual disease, creates challenges for efficient clinical comprehension and interpretation. Using paediatric B-cell precursor acute lymphoblastic leukaemia (BCP-ALL) as a use case, we present an artificial intelligence (AI)-assisted clinical framework clinALL that integrates genomic and clinical data into a user-friendly interface to support routine diagnostics and reveal translational insights for hematologic neoplasia. METHODS: We performed targeted RNA sequencing in 1365 cases with haematological neoplasms, primarily paediatric B-cell precursor acute lymphoblastic leukaemia (BCP-ALL) from the AIEOP-BFM ALL study. We carried out fluorescence in situ hybridization (FISH), karyotyping and arrayCGH as part of the routine diagnostics. The analysis results of these assays as well as additional clinical information were integrated into an interactive web interface using Bokeh, where the main graph is based on Uniform Manifold Approximation and Projection (UMAP) analysis of the gene expression data. At the backend of the clinALL, we built both shallow machine learning models and a deep neural network using Scikit-learn and PyTorch respectively. FINDINGS: By applying clinALL, 78% of undetermined patients under the current diagnostic protocol were stratified, and ambiguous cases were investigated. Translational insights were discovered, including IKZF1plus status dependent subpopulations of BCR::ABL1 positive patients, and a subpopulation within ETV6::RUNX1 positive patients that has a high relapse frequency. Our best machine learning models, LDA and PASNET-like neural network models, achieve F1 scores above 97% in predicting patients' subgroups. 
INTERPRETATION: An AI-assisted clinical framework that integrates both genomic and clinical data can take full advantage of the available data, improve point-of-care decision-making and reveal clinically relevant insights promptly. Such a lightweight and easily transferable framework works for both whole transcriptome data as well as the cost-effective targeted RNA-seq, enabling efficient and equitable delivery of personalized medicine in small clinics in developing countries. FUNDING: German Ministry of Education and Research (BMBF), German Research Foundation (DFG) and Foundation for Polish Science.

4.
mSystems ; 9(3): e0094523, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38376263

ABSTRACT

Bacterial plasmids play a major role in the spread of antibiotic resistance genes. However, their characterization via DNA sequencing suffers from the low abundance of plasmid DNA in those samples. Although sample preparation methods can enrich the proportion of plasmid DNA before sequencing, these methods are expensive and laborious, and they might introduce a bias by enriching only for specific plasmid DNA sequences. Nanopore adaptive sampling could overcome these issues by rejecting uninteresting DNA molecules during the sequencing process. In this study, we assess the application of adaptive sampling for the enrichment of low-abundant plasmids in known bacterial isolates using two different adaptive sampling tools. We show that a significant enrichment can be achieved even on expired flow cells. By applying adaptive sampling, we also improve the quality of de novo plasmid assemblies and reduce the sequencing time. However, our experiments also highlight issues with adaptive sampling if target and non-target sequences span similar regions. IMPORTANCE: Antimicrobial resistance causes millions of deaths every year. Mobile genetic elements like bacterial plasmids are key drivers for the dissemination of antimicrobial resistance genes. This makes the characterization of plasmids via DNA sequencing an important tool for clinical microbiologists. Since plasmids are often underrepresented in bacterial samples, plasmid sequencing can be challenging and laborious. To accelerate the sequencing process, we evaluate nanopore adaptive sampling as an in silico method for the enrichment of low-abundant plasmids. Our results show the potential of this cost-efficient method for future plasmid research but also indicate issues that arise from using reference sequences.


Subjects
Anti-Infective Agents , Nanopores , Plasmids/genetics , Bacteria/genetics , DNA
5.
PeerJ Comput Sci ; 9: e1291, 2023.
Article in English | MEDLINE | ID: mdl-37346513

ABSTRACT

The detection of communities in graph datasets provides insight about a graph's underlying structure and is an important tool for various domains such as social sciences, marketing, traffic forecast, and drug discovery. While most existing algorithms provide fast approaches for community detection, their results usually contain strictly separated communities. However, most datasets would semantically allow for or even require overlapping communities that can only be determined at much higher computational cost. We build on an efficient algorithm, Fox, that detects such overlapping communities. Fox measures the closeness of a node to a community by approximating the count of triangles which that node forms with that community. We propose LazyFox, a multi-threaded adaptation of the Fox algorithm, which provides even faster detection without an impact on community quality. This allows for the analyses of significantly larger and more complex datasets. LazyFox enables overlapping community detection on complex graph datasets with millions of nodes and billions of edges in days instead of weeks. As part of this work, LazyFox's implementation was published and is available as a tool under an MIT licence at https://github.com/TimGarrels/LazyFox.
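To make the triangle-based closeness measure concrete, the following minimal Python sketch counts exactly the triangles that a node forms with the members of a community. It is an illustration only: Fox and LazyFox approximate this count rather than enumerating it, and the function name, the adjacency-set representation and the toy graph are assumptions made for this example, not taken from the LazyFox code base.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected graph as adjacency sets


def triangles_with_community(graph: Graph, node: str, community: Set[str]) -> int:
    """Count triangles made of `node` and two mutually adjacent community members."""
    # Neighbours of the node that belong to the community (the node itself excluded).
    inside = sorted((graph[node] & community) - {node})
    count = 0
    for i, u in enumerate(inside):
        for v in inside[i + 1:]:
            # u and v are both adjacent to `node`; an edge u-v closes a triangle.
            if v in graph[u]:
                count += 1
    return count


# Hypothetical toy graph.
graph: Graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}
community = {"b", "c", "d"}

print(triangles_with_community(graph, "a", community))  # 2 (a-b-c and a-c-d)
print(triangles_with_community(graph, "e", community))  # 0
```

A node with a higher count is more tightly connected to the community; in an overlapping setting the same node may score highly for several communities at once.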

6.
Nucleic Acids Res ; 51(W1): W331-W337, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37167010

ABSTRACT

The mpox virus (MPXV) is mutating at an exceptional rate for a DNA virus and its global spread is concerning, making genomic surveillance a necessity. With MpoxRadar, we provide an interactive dashboard to track virus variants at the mutation level worldwide. MpoxRadar allows users to select among different genomes as reference for comparison. The occurrence of mutation profiles based on the selected reference is indicated on an interactive world map that shows the respective geographic sampling site in customizable time ranges to easily follow the frequency or trend of defined mutations. Furthermore, the user can filter for specific mutations, genes, countries, genome types, and sequencing protocols and download the filtered data directly from MpoxRadar. On the server, we automatically download all MPXV genomes and metadata from the National Center for Biotechnology Information (NCBI) on a daily basis, align them to the different reference genomes, and generate mutation profiles, which are stored and linked to the available metainformation in a database. This makes MpoxRadar a practical tool for the genomic surveillance of MPXV, supporting users with limited computational resources. MpoxRadar is open-source and freely accessible at https://MpoxRadar.net.
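The step of turning an aligned genome into a mutation profile can be sketched as follows. This is a simplified illustration, not MpoxRadar's pipeline: it assumes the sample has already been aligned to the reference without gaps in the reference (so insertions are not handled), and the labels such as "G3A" and "del:7" are a hypothetical notation chosen for this example.

```python
from typing import List


def mutation_profile(reference: str, aligned: str) -> List[str]:
    """List substitutions and deletions of an aligned sequence relative to a reference."""
    if len(reference) != len(aligned):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for pos, (ref, alt) in enumerate(zip(reference, aligned), start=1):
        if ref == alt or alt == "N":
            # Identical base or ambiguous call: nothing to report.
            continue
        if alt == "-":
            profile.append(f"del:{pos}")
        else:
            profile.append(f"{ref}{pos}{alt}")
    return profile


print(mutation_profile("ACGTACGT", "ACATAC-T"))  # ['G3A', 'del:7']
```

Storing one such list per genome, together with sampling date and country, is enough to count how often a given mutation occurs in a chosen region and time range.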


Subjects
Viral Genome , Genomics , Monkeypox virus , Software , Factual Databases , Monkeypox virus/genetics
7.
Mol Cell Proteomics ; 22(3): 100509, 2023 03.
Article in English | MEDLINE | ID: mdl-36791992

ABSTRACT

Lysosomes, the main degradative organelles of mammalian cells, play a key role in the regulation of metabolism. It is becoming more and more apparent that they are highly active, diverse, and involved in a large variety of processes. The essential role of lysosomes is exemplified by the detrimental consequences of their malfunction, which can result in lysosomal storage disorders, neurodegenerative diseases, and cancer. Using lysosome enrichment and mass spectrometry, we investigated the lysosomal proteomes of HEK293, HeLa, HuH-7, SH-SY5Y, MEF, and NIH3T3 cells. We provide evidence on a large scale for cell type-specific differences of lysosomes, showing that levels of distinct lysosomal proteins are highly variable within one cell type, while expression of others is highly conserved across several cell lines. Using differentially stable isotope-labeled cells and bimodal distribution analysis, we furthermore identify a high confidence population of lysosomal proteins for each cell line. Multi-cell line correlation of these data reveals potential novel lysosomal proteins, and we confirm lysosomal localization for six candidates. All data are available via ProteomeXchange with identifier PXD020600.


Subjects
Neuroblastoma , Proteome , Mice , Animals , Humans , Proteome/metabolism , HEK293 Cells , NIH 3T3 Cells , Neuroblastoma/metabolism , Lysosomes/metabolism , Mammals/metabolism
8.
Lancet Digit Health ; 5(2): e93-e101, 2023 02.
Article in English | MEDLINE | ID: mdl-36707190

ABSTRACT

Substantial opportunities for global health intelligence and research arise from the combined and optimised use of secondary data within data ecosystems. Secondary data are information being used for purposes other than those intended when they were collected. These data can be gathered from sources on the verge of widespread use such as the internet, wearables, mobile phone apps, electronic health records, or genome sequencing. To utilise their full potential, we offer guidance by outlining available sources and approaches for the processing of secondary data. Furthermore, in addition to indicators for the regulatory and ethical evaluation of strategies for the best use of secondary data, we also propose criteria for assessing reusability. This overview supports more precise and effective policy decision making leading to earlier detection and better prevention of emerging health threats than is currently the case.


Subjects
Cell Phone , Mobile Applications , Ecosystem , Global Health , Internet
9.
Brief Bioinform ; 24(1), 2023 01 19.
Article in English | MEDLINE | ID: mdl-36545804

ABSTRACT

Monoclonal antibodies are biotechnologically produced proteins with various applications in research, therapeutics and diagnostics. Their ability to recognize and bind to specific molecule structures makes them essential research tools and therapeutic agents. Sequence information of antibodies is helpful for understanding antibody-antigen interactions and ensuring their affinity and specificity. De novo protein sequencing based on mass spectrometry is a valuable method to obtain the amino acid sequence of peptides and proteins without a priori knowledge. In this study, we evaluated six recently developed de novo peptide sequencing algorithms (Novor, pNovo 3, DeepNovo, SMSNet, PointNovo and Casanovo), which were not specifically designed for antibody data. We validated their ability to identify and assemble antibody sequences on three multi-enzymatic data sets. The deep learning-based tools Casanovo and PointNovo showed an increased peptide recall across different enzymes and data sets compared with spectrum-graph-based approaches. We evaluated different error types of de novo peptide sequencing tools and their performance for different numbers of missing cleavage sites, noisy spectra and peptides of various lengths. We achieved a sequence coverage of 97.69-99.53% on the light chains of three different antibody data sets using the de Bruijn assembler ALPS and the predictions from Casanovo. However, low sequence coverage and accuracy on the heavy chains demonstrate that complete de novo protein sequencing remains a challenging issue in proteomics that requires improved de novo error correction, alternative digestion strategies and hybrid approaches such as homology search to achieve high accuracy on long protein sequences.
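The idea behind de Bruijn assembly of overlapping peptide predictions can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions (error-free peptides, a single unambiguous path, no repeated (k-1)-mers); it is not the ALPS assembler, and the function name, the value of k and the example peptides are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def de_bruijn_assemble(peptides: Iterable[str], k: int = 4) -> str:
    """Assemble overlapping peptides by walking a de Bruijn graph of k-mers."""
    edges: Dict[str, Set[str]] = defaultdict(set)  # (k-1)-mer -> following (k-1)-mers
    has_incoming: Set[str] = set()
    for pep in peptides:
        for i in range(len(pep) - k + 1):
            kmer = pep[i:i + k]
            edges[kmer[:-1]].add(kmer[1:])
            has_incoming.add(kmer[1:])

    # The contig starts at the only node that no k-mer leads into.
    starts = [node for node in edges if node not in has_incoming]
    if len(starts) != 1:
        raise ValueError("expected exactly one start node")

    node = starts[0]
    contig = node
    visited = {node}
    # Extend while the path is unambiguous (exactly one outgoing edge).
    while len(edges.get(node, ())) == 1:
        (nxt,) = edges[node]
        if nxt in visited:  # stop on a cycle
            break
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


# Hypothetical overlapping peptides from different digests.
print(de_bruijn_assemble(["DIQMTQ", "QMTQSP", "TQSPSS"]))  # DIQMTQSPSS
```

Sequencing errors in the predicted peptides create branches and dead ends in the graph, which is one reason why longer chains are harder to reconstruct completely.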


Subjects
Monoclonal Antibodies , Peptides , Amino Acid Sequence , Monoclonal Antibodies/genetics , Peptides/genetics , Peptides/chemistry , Algorithms , Protein Sequence Analysis/methods
10.
Bioinformatics ; 38(Suppl_2): ii113-ii119, 2022 09 16.
Article in English | MEDLINE | ID: mdl-36124784

ABSTRACT

MOTIVATION: While it has been well established that drugs affect and help patients differently, personalized drug response predictions remain challenging. Solutions based on single omics measurements have been proposed, and networks provide means to incorporate molecular interactions into reasoning. However, how to integrate the wealth of information contained in multiple omics layers still poses a complex problem. RESULTS: We present DrDimont, Drug response prediction from Differential analysis of multi-omics networks. It allows for comparative conclusions between two conditions and translates them into differential drug response predictions. DrDimont focuses on molecular interactions. It establishes condition-specific networks from correlation within an omics layer that are then reduced and combined into heterogeneous, multi-omics molecular networks. A novel semi-local, path-based integration step ensures integrative conclusions. Differential predictions are derived from comparing the condition-specific integrated networks. DrDimont's predictions are explainable, i.e. molecular differences that are the source of high differential drug scores can be retrieved. We predict differential drug response in breast cancer using transcriptomics, proteomics, phosphosite and metabolomics measurements and contrast estrogen receptor positive and receptor negative patients. DrDimont performs better than drug prediction based on differential protein expression or PageRank when evaluating it on ground truth data from cancer cell lines. We find proteomic and phosphosite layers to carry most information for distinguishing drug response. AVAILABILITY AND IMPLEMENTATION: DrDimont is available on CRAN: https://cran.r-project.org/package=DrDimont. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Breast Neoplasms , Software , Breast Neoplasms/drug therapy , Female , Humans , Proteomics , Estrogen Receptors , Transcriptome
11.
Bioinformatics ; 38(Suppl_2): ii168-ii174, 2022 09 16.
Article in English | MEDLINE | ID: mdl-36124807

ABSTRACT

BACKGROUND: Emerging pathogens are a growing threat, but large data collections and approaches for predicting the risk associated with novel agents are limited to bacteria and viruses. Pathogenic fungi, which also pose a constant threat to public health, remain understudied. Relevant data remain comparatively scarce and scattered among many different sources, hindering the development of sequencing-based detection workflows for novel fungal pathogens. No prediction method working for agents across all three groups is available, even though the cause of an infection is often difficult to identify from symptoms alone. RESULTS: We present a curated collection of fungal host range data, comprising records on human, animal and plant pathogens, as well as other plant-associated fungi, linked to publicly available genomes. We show that it can be used to predict the pathogenic potential of novel fungal species directly from DNA sequences with either sequence homology or deep learning. We develop learned, numerical representations of the collected genomes and visualize the landscape of fungal pathogenicity. Finally, we train multi-class models predicting if next-generation sequencing reads originate from novel fungal, bacterial or viral threats. CONCLUSIONS: The neural networks trained using our data collection enable accurate detection of novel fungal pathogens. A curated set of over 1400 genomes with host and pathogenicity metadata supports training of machine-learning models and sequence comparison, not limited to the pathogen detection task. AVAILABILITY AND IMPLEMENTATION: The data, models and code are hosted at https://zenodo.org/record/5846345, https://zenodo.org/record/5711877 and https://gitlab.com/dacs-hpi/deepac. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
DNA , Fungi , Animals , Bacteria/genetics , Data Collection , Fungi/genetics , Humans , Machine Learning , Computer Neural Networks
12.
Life (Basel) ; 12(9), 2022 Aug 30.
Article in English | MEDLINE | ID: mdl-36143382

ABSTRACT

Over the past years, NGS has become a crucial workhorse for open-view pathogen diagnostics. Yet, long turnaround times result from using massively parallel high-throughput technologies as the analysis can only be performed after sequencing has finished. The interpretation of results can further be challenged by contaminations, clinically irrelevant sequences, and the sheer amount and complexity of the data. We implemented PathoLive, a real-time diagnostics pipeline for the detection of pathogens from clinical samples hours before sequencing has finished. Based on real-time alignment with HiLive2, mappings are scored with respect to common contaminations, low-entropy areas, and sequences of widespread, non-pathogenic organisms. The results are visualized using an interactive taxonomic tree that provides an easily interpretable overview of the relevance of hits. For a human plasma sample that was spiked in vitro with six pathogenic viruses, all agents were clearly detected after only 40 of 200 sequencing cycles. For a real-world sample from Sudan, the results correctly indicated the presence of Crimean-Congo hemorrhagic fever virus. In a second real-world dataset from the 2019 SARS-CoV-2 outbreak in Wuhan, we found the presence of a SARS coronavirus as the most relevant hit without the novel virus reference genome being included in the database. For all samples, clinically irrelevant hits were correctly de-emphasized. Our approach is valuable to obtain fast and accurate NGS-based pathogen identifications and correctly prioritize and visualize them based on their clinical significance. PathoLive is open source and available on GitLab and BioConda.

13.
Bioinformatics ; 38(17): 4223-4225, 2022 09 02.
Article in English | MEDLINE | ID: mdl-35799354

ABSTRACT

SUMMARY: The ongoing pandemic caused by SARS-CoV-2 emphasizes the importance of genomic surveillance to understand the evolution of the virus, to monitor the viral population, and to plan epidemiological responses. Detailed analysis, easy visualization and intuitive filtering of the latest viral sequences are powerful for this purpose. We present CovRadar, a tool for genomic surveillance of the SARS-CoV-2 Spike protein. CovRadar consists of an analytical pipeline and a web application that enable the analysis and visualization of hundreds of thousands of sequences. First, CovRadar extracts the regions of interest using local alignment, then builds a multiple sequence alignment, infers variants and consensus and finally presents the results in an interactive app, making accessing and reporting simple, flexible and fast. AVAILABILITY AND IMPLEMENTATION: CovRadar is freely accessible at https://covradar.net; its open-source code is available at https://gitlab.com/dacs-hpi/covradar. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
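The consensus step that follows the multiple sequence alignment can be illustrated with a short Python sketch. It assumes an already computed alignment with equal-length rows and takes a simple per-column majority vote; the function name and the tie-breaking rule (alphabetically first symbol) are assumptions for this example and do not describe CovRadar's actual implementation.

```python
from collections import Counter
from typing import List


def consensus(alignment: List[str]) -> str:
    """Per-column majority consensus of an alignment with equal-length rows."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    result = []
    for column in zip(*alignment):
        counts = Counter(column)
        # Sorting first makes ties deterministic: the alphabetically first symbol wins.
        symbol, _ = max(sorted(counts.items()), key=lambda item: item[1])
        result.append(symbol)
    return "".join(result)


print(consensus(["ATGC", "ATGA", "ACGA"]))  # ATGA
```

Comparing each aligned sequence against such a consensus, column by column, yields the variant calls that a dashboard can then aggregate by date or location.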


Subjects
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , Genomics , Mutation
14.
Bioinformatics ; 38(Suppl 1): i153-i160, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758774

ABSTRACT

MOTIVATION: Nanopore sequencers allow targeted sequencing of interesting nucleotide sequences by rejecting other sequences from individual pores. This feature facilitates the enrichment of low-abundant sequences by depleting overrepresented ones in silico. Existing tools for adaptive sampling either apply signal alignment, which cannot handle human-sized reference sequences, or apply read mapping in sequence space relying on fast graphics processing unit (GPU) base callers for real-time read rejection. Using nanopore long-read mapping tools is also not optimal when mapping shorter reads as usually analyzed in adaptive sampling applications. RESULTS: Here, we present ReadBouncer, a new approach for nanopore adaptive sampling that combines fast CPU and GPU base calling with read classification based on Interleaved Bloom Filters. ReadBouncer improves the potential enrichment of low abundance sequences by its high read classification sensitivity and specificity, outperforming existing tools in the field. It robustly removes even reads belonging to large reference sequences while running on commodity hardware without GPUs, making adaptive sampling accessible for in-field researchers. ReadBouncer also provides a user-friendly interface and installer files for end-users without a bioinformatics background. AVAILABILITY AND IMPLEMENTATION: The C++ source code is available at https://gitlab.com/dacs-hpi/readbouncer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
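How k-mer lookups in a Bloom filter can drive a read-rejection decision is sketched below in Python. Note the simplifications: this uses one plain Bloom filter rather than the Interleaved Bloom Filter described in the abstract, and the class, the function names, the k-mer size and the 0.8 threshold are all hypothetical choices for illustration, not ReadBouncer's parameters or API.

```python
import hashlib
from typing import Iterator, List


class BloomFilter:
    """Plain Bloom filter over strings: no false negatives, rare false positives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several hash positions by prefixing the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def kmers(seq: str, k: int) -> List[str]:
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def fraction_matching(read: str, bloom: BloomFilter, k: int) -> float:
    """Fraction of the read's k-mers that the filter reports as present."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return sum(kmer in bloom for kmer in read_kmers) / len(read_kmers)


def should_reject(read: str, depletion: BloomFilter, k: int = 5, threshold: float = 0.8) -> bool:
    """Reject a read when enough of its k-mers hit the depletion reference."""
    return fraction_matching(read, depletion, k) >= threshold


# Hypothetical depletion reference and a read prefix drawn from it.
reference = "ACGTTGCAAGGCTTACCGATAGGCTA"
depletion = BloomFilter()
for kmer in kmers(reference, 5):
    depletion.add(kmer)

print(should_reject(reference[5:20], depletion))  # True: every k-mer was inserted
```

Because membership tests need only a few hash computations per k-mer, this kind of classification can run on a CPU while the molecule is still in the pore.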


Subjects
Nanopore Sequencing , Nanopores , High-Throughput Nucleotide Sequencing , Humans , DNA Sequence Analysis , Software
15.
J Proteome Res ; 21(4): 899-909, 2022 04 01.
Article in English | MEDLINE | ID: mdl-35086334

ABSTRACT

In liquid-chromatography-tandem-mass-spectrometry-based proteomics, information about the presence and stoichiometry of protein modifications is not readily available. To overcome this problem, we developed multiFLEX-LF, a computational tool that builds upon FLEXIQuant, which detects modified peptide precursors and quantifies their modification extent by monitoring the differences between observed and expected intensities of the unmodified precursors. multiFLEX-LF relies on robust linear regression to calculate the modification extent of a given precursor relative to a within-study reference. multiFLEX-LF can analyze entire label-free discovery proteomics data sets in a precursor-centric manner without preselecting a protein of interest. To analyze modification dynamics and coregulated modifications, we hierarchically clustered the precursors of all proteins based on their computed relative modification scores. We applied multiFLEX-LF to a data-independent-acquisition-based data set acquired using the anaphase-promoting complex/cyclosome (APC/C) isolated at various time points during mitosis. The clustering of the precursors allows for identifying varying modification dynamics and ordering the modification events. Overall, multiFLEX-LF enables the fast identification of potentially differentially modified peptide precursors and the quantification of their differential modification extent in large data sets using a personal computer. Additionally, multiFLEX-LF can drive the large-scale investigation of the modification dynamics of peptide precursors in time-series and case-control studies. multiFLEX-LF is available at https://gitlab.com/SteenOmicsLab/multiflex-lf.
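The principle of comparing observed with expected precursor intensities can be sketched as follows. This is a deliberately simplified stand-in, not the multiFLEX-LF algorithm: instead of the robust linear regression named in the abstract, it uses the median of sample-to-reference intensity ratios as a simple robust slope through the origin, and the function names and intensity values are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def robust_slope(reference: Sequence[float], sample: Sequence[float]) -> float:
    """Median of sample/reference ratios: a simple outlier-resistant slope estimate."""
    if len(reference) != len(sample) or not reference:
        raise ValueError("need two non-empty intensity lists of equal length")
    if any(r <= 0 for r in reference):
        raise ValueError("reference intensities must be positive")
    return median(s / r for r, s in zip(reference, sample))


def modification_extent(reference: Sequence[float], sample: Sequence[float]) -> List[float]:
    """Estimate, per precursor, the fraction missing from the expected unmodified intensity."""
    slope = robust_slope(reference, sample)
    extents = []
    for r, s in zip(reference, sample):
        expected = slope * r  # intensity expected if the precursor were unmodified
        extents.append(max(0.0, 1.0 - s / expected))
    return extents


# Hypothetical unmodified-precursor intensities of one protein.
reference = [100.0, 200.0, 400.0, 800.0, 1000.0]
sample = [200.0, 400.0, 800.0, 800.0, 2000.0]

print(modification_extent(reference, sample))  # [0.0, 0.0, 0.0, 0.5, 0.0]
```

In this toy example four precursors follow the common 2:1 trend, while the fourth reaches only half of its expected intensity, which the sketch reports as a modification extent of 0.5.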


Subjects
Proteins , Proteomics , Liquid Chromatography , Mass Spectrometry , Peptides
16.
Gigascience ; 12, 2022 12 28.
Article in English | MEDLINE | ID: mdl-36994872

ABSTRACT

BACKGROUND: Contamination detection is an important step that should be carefully considered in early stages when designing and performing microbiome studies to avoid biased outcomes. Detecting and removing true contaminants is challenging, especially in low-biomass samples or in studies lacking proper controls. Interactive visualizations and analysis platforms are crucial to better guide this step, to help to identify and detect noisy patterns that could potentially be contamination. Additionally, external evidence, like aggregation of several contamination detection methods and the use of common contaminants reported in the literature, could help to discover and mitigate contamination. RESULTS: We propose GRIMER, a tool that performs automated analyses and generates a portable and interactive dashboard integrating annotation, taxonomy, and metadata. It unifies several sources of evidence to help detect contamination. GRIMER is independent of quantification methods and directly analyzes contingency tables to create an interactive and offline report. Reports can be created in seconds and are accessible for nonspecialists, providing an intuitive set of charts to explore data distribution among observations and samples and its connections with external sources. Further, we compiled and used an extensive list of possible external contaminant taxa and common contaminants with 210 genera and 627 species reported in 22 published articles. CONCLUSION: GRIMER enables visual data exploration and analysis, supporting contamination detection in microbiome studies. The tool and data presented are open source and available at https://gitlab.com/dacs-hpi/grimer.


Subjects
Microbiota , Biomass , Metadata
17.
Nat Commun ; 12(1): 7305, 2021 12 15.
Article in English | MEDLINE | ID: mdl-34911965

ABSTRACT

Metaproteomics has matured into a powerful tool to assess functional interactions in microbial communities. While many metaproteomic workflows are available, the impact of method choice on results remains unclear. Here, we carry out a community-driven, multi-laboratory comparison in metaproteomics: the critical assessment of metaproteome investigation study (CAMPI). Based on well-established workflows, we evaluate the effect of sample preparation, mass spectrometry, and bioinformatic analysis using two samples: a simplified, laboratory-assembled human intestinal model and a human fecal sample. We observe that variability at the peptide level is predominantly due to sample processing workflows, with a smaller contribution of bioinformatic pipelines. These peptide-level differences largely disappear at the protein group level. While differences are observed for predicted community composition, similar functional profiles are obtained across workflows. CAMPI demonstrates the robustness of present-day metaproteomics research, serves as a template for multi-laboratory studies in metaproteomics, and provides publicly available data sets for benchmarking future developments.


Subjects
Bacteria/genetics , Bacterial Proteins/chemistry , Feces/microbiology , Proteomics/methods , Adult , Bacteria/classification , Bacteria/isolation & purification , Bacterial Proteins/genetics , Female , Gastrointestinal Microbiome , Humans , Intestines/microbiology , Laboratories , Mass Spectrometry , Peptides/chemistry , Workflow
18.
Brief Bioinform ; 22(6), 2021 11 05.
Article in English | MEDLINE | ID: mdl-34297793

ABSTRACT

Novel pathogens evolve quickly and may emerge rapidly, causing dangerous outbreaks or even global pandemics. Next-generation sequencing is the state of the art in open-view pathogen detection, and one of the few methods available at the earliest stages of an epidemic, even when the biological threat is unknown. Analyzing the samples as the sequencer is running can greatly reduce the turnaround time, but existing tools rely on close matches to lists of known pathogens and perform poorly on novel species. Machine learning approaches can predict if single reads originate from more distant, unknown pathogens but require relatively long input sequences and processed data from a finished sequencing run. Incomplete sequences contain less information, leading to a trade-off between sequencing time and detection accuracy. Using a workflow for real-time pathogenic potential prediction, we investigate which subsequences already allow accurate inference. We train deep neural networks to classify Illumina and Nanopore reads and integrate the models with HiLive2, a real-time Illumina mapper. This approach outperforms alternatives based on machine learning and sequence alignment on simulated and real data, including SARS-CoV-2 sequencing runs. After just 50 Illumina cycles, we observe an 80-fold sensitivity increase compared to real-time mapping. The first 250 bp of Nanopore reads, corresponding to 0.5 s of sequencing time, are enough to yield predictions more accurate than mapping the finished long reads. The approach could also be used for screening synthetic sequences against biosecurity threats.


Subjects
COVID-19/genetics , SARS-CoV-2/isolation & purification , COVID-19/virology , Deep Learning , High-Throughput Nucleotide Sequencing , Humans , Nanopores , Computer Neural Networks , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity , Sequence Alignment
19.
J Proteome Res ; 20(4): 2083-2088, 2021 04 02.
Article in English | MEDLINE | ID: mdl-33661648

ABSTRACT

The study of microbiomes has gained in importance over the past few years and has led to the emergence of the fields of metagenomics, metatranscriptomics, and metaproteomics. While initially focused on the study of biodiversity within these communities, the emphasis has increasingly shifted to the study of (changes in) the complete set of functions available in these communities. A key tool to study this functional complement of a microbiome is Gene Ontology (GO) term analysis. However, comparing large sets of GO terms is not an easy task due to the deeply branched nature of GO, which limits the utility of exact term matching. To solve this problem, we here present MegaGO, a user-friendly tool that relies on semantic similarity between GO terms to compute the functional similarity between multiple data sets. MegaGO is high performing: Each set can contain thousands of GO terms, and results are calculated in a matter of seconds. MegaGO is available as a web application at https://megago.ugent.be and is installable via pip as a standalone command line tool and reusable software library. All code is open source under the MIT license and is available at https://github.com/MEGA-GO/.
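Why semantic similarity helps where exact term matching fails can be shown with a small Python sketch. It is not MegaGO's method or API: the toy ontology and its term names are hypothetical, the term similarity is a simple overlap (Jaccard index) of ancestor sets used as a stand-in for an established semantic similarity measure, and the set-level score is a best-match average.

```python
from typing import Dict, Set

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "metabolism": {"root"},
    "transport": {"root"},
    "lipid_metabolism": {"metabolism"},
    "sugar_metabolism": {"metabolism"},
}


def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its ancestors in the ontology."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two ancestor sets (1.0 for identical terms)."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def set_similarity(terms_a: Set[str], terms_b: Set[str]) -> float:
    """Best-match average: each term is paired with its most similar counterpart."""
    if not terms_a or not terms_b:
        raise ValueError("both term sets must be non-empty")

    def best(src: Set[str], dst: Set[str]) -> float:
        return sum(max(term_similarity(s, d) for d in dst) for s in src) / len(src)

    return (best(terms_a, terms_b) + best(terms_b, terms_a)) / 2


print(term_similarity("lipid_metabolism", "sugar_metabolism"))  # 0.5
print(set_similarity({"lipid_metabolism", "transport"},
                     {"sugar_metabolism", "transport"}))  # 0.75
```

Exact matching would treat the two sets as sharing only one of two terms, whereas the similarity-based score also credits the two related metabolism terms for their common ancestry.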


Subjects
Microbiota , Software , Computational Biology , Gene Ontology , Metagenomics , Semantics
20.
NAR Genom Bioinform ; 3(1): lqab004, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33554119

ABSTRACT

Viruses evolve extremely quickly, so reliable methods for viral host prediction are necessary to safeguard biosecurity and biosafety alike. Novel human-infecting viruses are difficult to detect with standard bioinformatics workflows. Here, we predict whether a virus can infect humans directly from next-generation sequencing reads. We show that deep neural architectures significantly outperform both shallow machine learning and standard, homology-based algorithms, cutting the error rates in half and generalizing to taxonomic units distant from those presented during training. Further, we develop a suite of interpretability tools and show that it can be applied also to other models beyond the host prediction task. We propose a new approach for convolutional filter visualization to disentangle the information content of each nucleotide from its contribution to the final classification decision. Nucleotide-resolution maps of the learned associations between pathogen genomes and the infectious phenotype can be used to detect regions of interest in novel agents, for example, the SARS-CoV-2 coronavirus, unknown before it caused a COVID-19 pandemic in 2020. All methods presented here are implemented as easy-to-install packages not only enabling analysis of NGS datasets without requiring any deep learning skills, but also allowing advanced users to easily train and explain new models for genomics.
