Results 1 - 20 of 122
2.
Sci Rep ; 10(1): 6100, 2020 Apr 08.
Article in English | MEDLINE | ID: mdl-32269255

ABSTRACT

Previous studies of the association between parity and long-term cognitive changes have primarily focused on women and have shown conflicting results. We investigated this association by analyzing data collected on 303,196 subjects from the UK Biobank. We found that in both females and males, having offspring was associated with a faster response time and fewer mistakes made in the visual memory task. Subjects with two or three children had the largest differences relative to those who were childless, with greater effects observed in men. We further analyzed the association between parity and relative brain age (n = 13,584), a brain image-based biomarker indicating how old one's brain structure appears relative to peers. We found that in both sexes, subjects with two or three offspring had significantly reduced brain age compared to those without offspring, corroborating our cognitive function results. Our findings suggest that lifestyle factors accompanying having offspring, rather than the physical process of pregnancy experienced only by females, contribute to these associations and underscore the importance of studying such factors, particularly in the context of sex.

3.
Sci Rep ; 10(1): 10, 2020 Jan 30.
Article in English | MEDLINE | ID: mdl-32001736

ABSTRACT

Brain age is a metric that quantifies the degree of aging of a brain based on whole-brain anatomical characteristics. While associations between individual human brain regions and environmental or genetic factors have been investigated, how brain age is associated with those factors remains unclear. We investigated these associations using UK Biobank data. We first trained a statistical model for obtaining relative brain age (RBA), a metric describing a subject's brain age relative to peers, based on whole-brain anatomical measurements, from training set subjects (n = 5,193). We then applied this model to evaluation set subjects (n = 12,115) and tested the association of RBA with tobacco smoking, alcohol consumption, and genetic variants. We found that daily or almost daily consumption of tobacco and alcohol were both significantly associated with increased RBA (P < 0.001). We also found SNPs significantly associated with RBA (p-value < 5E-8). The SNP most significantly associated with RBA is located in MAPT gene. Our results suggest that both environmental and genetic factors are associated with structural brain aging.

4.
J Transl Med ; 18(1): 5, 2020 Jan 06.
Article in English | MEDLINE | ID: mdl-31906978

ABSTRACT

BACKGROUND: Sepsis remains a major challenge in intensive care units, causing unacceptably high mortality rates due to the lack of rapid diagnostic tools with sufficient sensitivity. Therefore, there is an urgent need to replace time-consuming blood cultures with a new method. Ideally, such a method also provides comprehensive profiling of pathogenic bacteria to facilitate the treatment decision. METHODS: We developed a Random Forest with balanced subsampling to screen for pathogenic bacteria and diagnose sepsis based on cell-free DNA (cfDNA) sequencing data in a small blood sample. In addition, we constructed a bacterial co-occurrence network, based on a set of normal and sepsis samples, to infer unobserved bacteria. RESULTS: Based solely on cfDNA sequencing information from three independent datasets of sepsis, we distinguish sepsis from healthy samples with a satisfactory performance. This strategy also provides comprehensive bacteria profiling, permitting doctors to choose the best treatment strategy for a sepsis case. CONCLUSIONS: The combination of sepsis identification and bacteria-inferring strategies is a success for noninvasive cfDNA-based diagnosis, which has the potential to greatly enhance efficiency in disease detection and provide a comprehensive understanding of pathogens. For comparison, where a culture-based analysis of pathogens takes up to 5 days and is effective for only a third to a half of patients, cfDNA sequencing can be completed in just 1 day and our method can identify the majority of pathogens in all patients.

5.
Genome Biol ; 20(1): 266, 2019 Dec 04.
Article in English | MEDLINE | ID: mdl-31801606

ABSTRACT

Alignment-free methods, more time and memory efficient than alignment-based methods, have been widely used for comparing genome sequences or raw sequencing samples without assembly. However, in this study, we show that alignment-free dissimilarity calculated based on sequencing samples can be overestimated compared with the dissimilarity calculated based on their genomes, and this bias can significantly decrease the performance of the alignment-free analysis. Here, we introduce a new alignment-free tool, Alignment-Free methods Adjusted by Neural Network (Afann) that successfully adjusts this bias and achieves excellent performance on various independent datasets. Afann is freely available at https://github.com/GeniusTang/Afann.

6.
Front Genet ; 10: 1156, 2019.
Article in English | MEDLINE | ID: mdl-31824565

ABSTRACT

Comparing metagenomic samples is a critical step in understanding the relationships among microbial communities. Recently, next-generation sequencing (NGS) technologies have produced a massive amount of short-read data for microbial communities from different environments. The assembly of these short reads can, however, be time-consuming and challenging. In addition, alignment-based methods for metagenome comparison are limited by incomplete genome and/or pathway databases. In contrast, alignment-free methods for metagenome comparison do not depend on the completeness of genome or pathway databases. Still, the existing alignment-free methods, d2S and d2*, which model k-tuple patterns using only one Markov chain for each sample, neglect the heterogeneity within metagenomic data wherein potentially thousands of types of microorganisms are sequenced. To address this imperfection in d2S and d2*, we organized NGS sequences into different reads bins and constructed several corresponding Markov models. Next, we modified the definition of our previous alignment-free methods, d2S and d2*, to make them more compatible with a scheme of analysis which uses the proposed reads bins. We then used two simulated and three real metagenomic datasets to test the effect of the k-tuple size and Markov orders of background sequences on the performance of these de novo alignment-free methods. For dependable comparison of metagenomic samples, our newly developed alignment-free methods with reads binning outperformed alignment-free methods without reads binning in detecting the relationship among microbial communities, including whether they form groups or change according to some environmental gradients.
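For orientation, the following is a minimal sketch of the baseline d2S dissimilarity without reads binning, the variant the abstract says the new method improves on. The abstract does not give the formula; the definition used here is the one commonly stated in the alignment-free literature, and an order-0 Markov background (independent bases) is assumed for simplicity. The function names `kmer_counts`, `centered_counts`, and `d2s` are illustrative and not taken from the authors' software.

```python
from collections import Counter
from itertools import product
from math import prod, sqrt

BASES = "ACGT"


def kmer_counts(seq, k):
    """Count every overlapping k-tuple in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def centered_counts(seq, k):
    """Observed k-tuple count minus its expectation under an order-0
    background estimated from the sequence itself (an assumption here)."""
    n = len(seq)
    base_freq = {b: seq.count(b) / n for b in BASES}
    counts = kmer_counts(seq, k)
    n_words = n - k + 1
    centered = {}
    for letters in product(BASES, repeat=k):
        word = "".join(letters)
        expected = n_words * prod(base_freq[b] for b in letters)
        centered[word] = counts[word] - expected
    return centered


def d2s(seq_x, seq_y, k=3):
    """d2S dissimilarity in [0, 1]; 0 means identical centered profiles."""
    x = centered_counts(seq_x, k)
    y = centered_counts(seq_y, k)
    num = sx = sy = 0.0
    for word in x:
        norm = sqrt(x[word] ** 2 + y[word] ** 2)
        if norm == 0:
            continue  # word contributes nothing to either sum
        num += x[word] * y[word] / norm
        sx += x[word] ** 2 / norm
        sy += y[word] ** 2 / norm
    if sx == 0 or sy == 0:
        return 0.0  # degenerate case: no signal in at least one profile
    return 0.5 * (1 - num / (sqrt(sx) * sqrt(sy)))
```

A reads-binning variant in the spirit of the abstract would estimate a separate background model per bin instead of one per sample; that step is not shown here.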

7.
Genome Biol ; 20(1): 214, 2019 Oct 22.
Article in English | MEDLINE | ID: mdl-31640754

ABSTRACT

Following publication of the original paper [1], Dr. Nayfach kindly pointed out an error and the authors would like to report the following correction.

8.
Genome Biol ; 20(1): 154, 2019 Aug 06.
Article in English | MEDLINE | ID: mdl-31387630

ABSTRACT

We develop a metagenomic data analysis pipeline, MicroPro, that takes into account all reads from known and unknown microbial organisms and associates viruses with complex diseases. We utilize MicroPro to analyze four metagenomic datasets relating to colorectal cancer, type 2 diabetes, and liver cirrhosis and show that including reads from unknown organisms significantly increases the prediction accuracy of the disease status for three of the four datasets. We identify new microbial organisms associated with these diseases and show viruses play important prediction roles in colorectal cancer and liver cirrhosis, but not in type 2 diabetes. MicroPro is freely available at https://github.com/zifanzhu/MicroPro .


Subjects
Disease , Metagenomics/methods , Microbiota/genetics , Software , Virus Physiological Phenomena , Colorectal Neoplasms/virology , Diabetes Mellitus, Type 2/virology , High-Throughput Nucleotide Sequencing , Humans , Liver Cirrhosis/virology
9.
Genome Biol ; 20(1): 144, 2019 Jul 25.
Article in English | MEDLINE | ID: mdl-31345254

ABSTRACT

BACKGROUND: Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. Hence, many AF procedures have been proposed in recent years, but a lack of a clearly defined benchmarking consensus hampers their performance assessment. RESULTS: Here, we present a community resource (http://afproject.org) to establish standards for comparing alignment-free approaches across different areas of sequence-based research. We characterize 74 AF methods available in 24 software tools for five research applications, namely, protein sequence classification, gene tree inference, regulatory element detection, genome-based phylogenetic inference, and reconstruction of species trees under horizontal gene transfer and recombination events. CONCLUSION: The interactive web service allows researchers to explore the performance of alignment-free tools relevant to their data types and analytical goals. It also allows method developers to assess their own algorithms and compare them with current state-of-the-art tools, accelerating the development of new, more accurate AF solutions.


Subjects
Sequence Analysis , Benchmarking , Gene Transfer, Horizontal , Internet , Phylogeny , Regulatory Sequences, Nucleic Acid , Sequence Alignment , Sequence Analysis, Protein , Software
10.
Nucleic Acids Res ; 47(W1): W379-W387, 2019 Jul 02.
Article in English | MEDLINE | ID: mdl-31106361

ABSTRACT

Automated function prediction (AFP) of proteins is of great significance in biology. AFP can be regarded as a large-scale multi-label classification problem in which a protein can be associated with multiple gene ontology terms as its labels. Based on our GOLabeler, a state-of-the-art method in the third critical assessment of functional annotation (CAFA3), in this paper we propose NetGO, a web server that is able to further improve the performance of large-scale AFP by incorporating massive protein-protein network information. Specifically, the advantages of NetGO in using network information are threefold: (i) NetGO relies on a powerful learning-to-rank framework from machine learning to effectively integrate both sequence and network information of proteins; (ii) NetGO uses the massive network information of all species (>2000) in STRING (rather than only some specific species) and (iii) NetGO can still use network information to annotate a protein by homology transfer, even if it is not contained in STRING. Separating training and testing data with the same time-delayed settings as CAFA, we comprehensively examined the performance of NetGO. Experimental results have clearly demonstrated that NetGO significantly outperforms GOLabeler and other competing methods. The NetGO web server is freely available at http://issubmission.sjtu.edu.cn/netgo/.

11.
Bioinformatics ; 35(21): 4229-4238, 2019 Nov 01.
Article in English | MEDLINE | ID: mdl-30977806

ABSTRACT

MOTIVATION: Metagenomic contig binning is an important computational problem in metagenomic research, which aims to cluster contigs from the same genome into the same group. Unlike the classical clustering problem, contig binning can utilize known relationships among some of the contigs or the taxonomic identity of some contigs. However, the current state-of-the-art contig binning methods do not make full use of additional biological information beyond the coverage and sequence composition of the contigs. RESULTS: We developed a novel contig binning method, Semi-supervised Spectral Normalized Cut for Binning (SolidBin), based on semi-supervised spectral clustering. Using sequence feature similarity and/or additional biological information, such as the reliable taxonomy assignments of some contigs, SolidBin constructs two types of prior information: must-link and cannot-link constraints. Must-link constraints mean that the pair of contigs should be clustered into the same group, while cannot-link constraints mean that the pair of contigs should be clustered into different groups. These constraints are then integrated into a classical spectral clustering approach, normalized cut, for improved contig binning. The performance of SolidBin is compared with five state-of-the-art genome binners, CONCOCT, COCACOLA, MaxBin, MetaBAT and BMC3C, on five next-generation sequencing benchmark datasets including simulated multi- and single-sample datasets and real multi-sample datasets. The experimental results show that SolidBin achieved the best performance in terms of F-score, Adjusted Rand Index and Normalized Mutual Information, especially on the real datasets and the single-sample dataset. AVAILABILITY AND IMPLEMENTATION: https://github.com/sufforest/SolidBin. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

12.
Bioinformatics ; 35(22): 4596-4606, 2019 Nov 01.
Article in English | MEDLINE | ID: mdl-30993316

ABSTRACT

MOTIVATION: Detecting sequences containing repetitive regions is a basic bioinformatics task with many applications. Several methods have been developed for various types of repeat detection tasks. An efficient generic method for detecting most types of repetitive sequences is still desirable. Inspired by the excellent properties and successful applications of the D2 family of statistics in comparative analyses of genomic sequences, we developed a new statistic D2R that can efficiently discriminate sequences with or without repetitive regions. RESULTS: Using the statistic, we developed an algorithm of linear time and space complexity for detecting most types of repetitive sequences in multiple scenarios, including finding candidate clustered regularly interspaced short palindromic repeats regions from bacterial genomic or metagenomics sequences. Simulation and real data experiments show that the method works well on both assembled sequences and unassembled short reads. AVAILABILITY AND IMPLEMENTATION: The codes are available at https://github.com/XuegongLab/D2R_codes under GPL 3.0 license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

13.
Brief Bioinform ; 2019 Mar 11.
Article in English | MEDLINE | ID: mdl-30860572

ABSTRACT

In metagenomic studies of microbial communities, the short reads come from mixtures of genomes. Read assembly is usually an essential first step for the follow-up studies in metagenomic research. Understanding the power and limitations of various read assembly programs in practice is important for researchers to choose which programs to use in their investigations. Many studies evaluating different assembly programs used either simulated metagenomes or real metagenomes with unknown genome compositions. However, the simulated datasets may not reflect the real complexities of metagenomic samples and the estimated assembly accuracy could be misleading due to the unknown genomes in real metagenomes. Therefore, hybrid strategies are required to evaluate the various read assemblers for metagenomic studies. In this paper, we benchmark the metagenomic read assemblers by mixing reads from real metagenomic datasets with reads from known genomes and evaluating the integrity, contiguity and accuracy of the assembly using the reads from the known genomes. We selected four advanced metagenome assemblers, MEGAHIT, MetaSPAdes, IDBA-UD and Faucet, for evaluation. We showed the strengths and weaknesses of these assemblers in terms of integrity, contiguity and accuracy for different variables, including the genetic difference of the real genomes with the genome sequences in the real metagenomic datasets and the sequencing depth of the simulated datasets. Overall, MetaSPAdes performs best in terms of integrity and continuity at the species-level, followed by MEGAHIT. Faucet performs best in terms of accuracy at the cost of worst integrity and continuity, especially at low sequencing depth. MEGAHIT has the highest genome fractions at the strain-level and MetaSPAdes has the overall best performance at the strain-level. MEGAHIT is the most efficient in our experiments. Availability: The source code is available at https://github.com/ziyewang/MetaAssemblyEval.

14.
BMC Bioinformatics ; 20(1): 53, 2019 Jan 28.
Article in English | MEDLINE | ID: mdl-30691412

ABSTRACT

BACKGROUND: Local similarity analysis (LSA) of time series data has been extensively used to investigate the dynamics of biological systems in a wide range of environments. Recently, a theoretical method was proposed to approximately calculate the statistical significance of local similarity (LS) scores. However, the method assumes that the time series data are independent and identically distributed, an assumption that is violated in many problems. RESULTS: In this paper, we develop a novel approach to accurately approximate the statistical significance of LSA for dependent time series data using a nonparametric kernel-estimated long-run variance. We also investigate an alternative method for LSA statistical significance approximation by computing the local similarity score of the residuals based on a predefined statistical model. We show by simulations that both methods have controllable type I errors for dependent time series, while other approaches for statistical significance can be grossly oversized. We apply both methods to human and marine microbial datasets, where most of the possible significant associations are captured and false positives are efficiently controlled. CONCLUSIONS: Our methods provide fast and effective approaches for evaluating the statistical significance of dependent time series data with controllable type I error. They can be applied to a variety of time series data to reveal inherent relationships among the different factors.
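As background for the quantity whose significance is being assessed, here is a minimal sketch of a local similarity (LS) score for two equal-length series. The abstract does not define the score; this sketch assumes the usual dynamic-programming formulation from the LSA literature (best-scoring contiguous run of products, allowing a bounded time delay), and assumes the inputs have already been normalized. The name `local_similarity` and the parameter `max_delay` are illustrative only.

```python
def local_similarity(x, y, max_delay=0):
    """Signed LS score: largest positive or negative run of x[i] * y[j]
    along any diagonal with |i - j| <= max_delay, divided by the length."""
    n = len(x)
    if n != len(y):
        raise ValueError("series must have the same length")
    # pos[i][j] / neg[i][j]: best run ending at x[i-1], y[j-1]
    pos = [[0.0] * (n + 1) for _ in range(n + 1)]
    neg = [[0.0] * (n + 1) for _ in range(n + 1)]
    best_pos = best_neg = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if abs(i - j) > max_delay:
                continue  # outside the allowed time shift
            p = x[i - 1] * y[j - 1]
            pos[i][j] = max(0.0, pos[i - 1][j - 1] + p)
            neg[i][j] = max(0.0, neg[i - 1][j - 1] - p)
            best_pos = max(best_pos, pos[i][j])
            best_neg = max(best_neg, neg[i][j])
    if best_pos >= best_neg:
        return best_pos / n
    return -best_neg / n
```

The significance methods described in the abstract concern the null distribution of a score like this one when the series are serially dependent; that part is not sketched here.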


Subjects
Algorithms , Models, Statistical , Aquatic Organisms/microbiology , Databases as Topic , Female , Humans , Male , Microbiota , Time Factors
15.
BMC Genomics ; 19(1): 896, 2018 Dec 10.
Article in English | MEDLINE | ID: mdl-30526482

ABSTRACT

BACKGROUND: The application of genomic data and bioinformatics for the identification of restricted or illegally-sourced natural products is urgently needed. The taxonomic identity and geographic provenance of raw and processed materials have implications in sustainable-use commercial practices, and relevance to the enforcement of laws that regulate or restrict illegally harvested materials, such as timber. Improvements in genomics make it possible to capture and sequence partial-to-complete genomes from challenging tissues, such as wood and wood products. RESULTS: In this paper, we report the success of an alignment-free genome comparison method, [Formula: see text] that differentiates different geographic sources of white oak (Quercus) species with a high level of accuracy with very small amount of genomic data. The method is robust to sequencing errors, different sequencing laboratories and sequencing platforms. CONCLUSIONS: This method offers an approach based on genome-scale data, rather than panels of pre-selected markers for specific taxa. The method provides a generalizable platform for the identification and sourcing of materials using a unified next generation sequencing and analysis framework.


Subjects
DNA, Plant/genetics , Genome, Plant , Geography , Quercus/genetics , Sequence Alignment/methods , Algorithms , High-Throughput Nucleotide Sequencing , Principal Component Analysis
16.
Sci Rep ; 8(1): 10032, 2018 Jul 03.
Article in English | MEDLINE | ID: mdl-29968780

ABSTRACT

Predicting the hosts of newly discovered viruses is important for pandemic surveillance of infectious diseases. We investigated the use of alignment-based and alignment-free methods and support vector machine using mononucleotide frequency and dinucleotide bias to predict the hosts of viruses, and applied these approaches to three datasets: rabies virus, coronavirus, and influenza A virus. For coronavirus, we used the spike gene sequences, while for rabies and influenza A viruses, we used the more conserved nucleoprotein gene sequences. We compared the three methods under different scenarios and showed that their performances are highly correlated with the variability of sequences and sample size. For conserved genes like the nucleoprotein gene, longer k-mers than mono- and dinucleotides are needed to better distinguish the sequences. We also showed that both alignment-based and alignment-free methods can accurately predict the hosts of viruses. When alignment is difficult to achieve or highly time-consuming, alignment-free methods can be a promising substitute to predict the hosts of new viruses.
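To make the feature set concrete, the sketch below computes mononucleotide frequencies and a dinucleotide bias for one sequence. The abstract does not define "dinucleotide bias"; this sketch assumes the common odds-ratio form, observed dinucleotide frequency divided by the product of the two mononucleotide frequencies. The function name `composition_features` is hypothetical, and the classifier itself (a support vector machine in the abstract) is not shown.

```python
from itertools import product

BASES = "ACGT"


def composition_features(seq):
    """Return 4 mononucleotide frequencies followed by 16 dinucleotide
    odds ratios f_xy / (f_x * f_y), in ACGT x ACGT order."""
    n = len(seq)
    mono = {b: seq.count(b) / n for b in BASES}
    pairs = [seq[i:i + 2] for i in range(n - 1)]
    features = [mono[b] for b in BASES]
    for x, y in product(BASES, repeat=2):
        f_xy = pairs.count(x + y) / len(pairs)
        expected = mono[x] * mono[y]
        # A base that never occurs gives an undefined ratio; report 0.0.
        features.append(f_xy / expected if expected > 0 else 0.0)
    return features
```

Each sequence then becomes a 20-dimensional vector that could be passed to any standard classifier.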


Subjects
DNA, Viral/analysis , Host Microbial Interactions/genetics , Sequence Analysis, DNA/methods , Coronavirus/genetics , Genome, Viral/genetics , Host Microbial Interactions/physiology , Influenza A virus/genetics , Middle East Respiratory Syndrome Coronavirus/genetics , Models, Theoretical , Pandemics , Phylogeny , Rabies virus/genetics , Sequence Alignment/methods , Spike Glycoprotein, Coronavirus/genetics , Support Vector Machine
17.
FEMS Microbiol Ecol ; 94(9), 2018 Sep 01.
Article in English | MEDLINE | ID: mdl-30010747

ABSTRACT

Ulcerative colitis is a chronic inflammatory disease of the colon that carries a significant disease burden in children. Therefore, new therapeutic approaches are being explored to help children living with this disease. Fecal microbiota transplantation (FMT) has been successful in some children with ulcerative colitis. However, the mechanism of its therapeutic effect in this patient population is not well understood. To characterize changes in gut microbial and metabolomic profiles after FMT, we performed 16S rRNA gene sequencing, shotgun metagenomic sequencing, virome analysis and untargeted metabolomics by gas chromatography-time of flight-mass spectrometry on stool samples collected before and after FMT from four children with ulcerative colitis who responded to this treatment. Alpha diversity of the gut microbiota increased after intervention, with species richness rising from 251 (S.D. 125) to 358 (S.D. 27). In responders, the mean relative abundance of bacteria in the class Clostridia shifted toward donor levels, increasing from 33% (S.D. 11%) to 54% (S.D. 16%). Patient metabolomic and viromic profiles exhibited a similar but less pronounced shift toward donor profiles after FMT. The fecal concentrations of several metabolites were altered after FMT, correlating with clinical improvement. Larger studies using a similar multi-omics approach may suggest novel strategies for the treatment of pediatric ulcerative colitis.


Subjects
Clostridiaceae/isolation & purification , Colitis, Ulcerative/microbiology , Colitis, Ulcerative/therapy , Fecal Microbiota Transplantation , Gastrointestinal Microbiome/physiology , Child , Clostridiaceae/classification , Clostridiaceae/genetics , Feces/microbiology , Female , Humans , Male , Metabolomics , Metagenomics , RNA, Ribosomal, 16S/genetics
18.
Front Microbiol ; 9: 711, 2018.
Article in English | MEDLINE | ID: mdl-29713314

ABSTRACT

Horizontal gene transfer (HGT) plays an important role in the evolution of microbial organisms including bacteria. Alignment-free methods based on single genome compositional information have been used to detect HGT. Currently, Manhattan and Euclidean distances based on tetranucleotide frequencies are the most commonly used alignment-free dissimilarity measures to detect HGT. By testing on simulated bacterial sequences and real data sets with known horizontal transferred genomic regions, we found that more advanced alignment-free dissimilarity measures such as CVTree and [Formula: see text] that take into account the background Markov sequences can solve HGT detection problems with significantly improved performance. We also studied the influence of different factors such as evolutionary distance between host and donor sequences, size of sliding window, and host genome composition on the performances of alignment-free methods to detect HGT. Our study showed that alignment-free methods can predict HGT accurately when host and donor genomes are in different order levels. Among all methods, CVTree with word length of 3, [Formula: see text] with word length 3, Markov order 1 and [Formula: see text] with word length 4, Markov order 1 outperform others in terms of their highest F1-score and their robustness under the influence of different factors.
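The simplest measure named above, Manhattan distance on tetranucleotide frequencies over a sliding window, can be sketched as follows. This is only an illustration of that baseline, not of CVTree or the Markov-background measures; the function names and the window and step sizes are assumptions made for the example.

```python
from collections import Counter


def kmer_freq(seq, k=4):
    """Relative frequency of each overlapping k-mer in seq."""
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(total))
    return {word: c / total for word, c in counts.items()}


def manhattan(p, q):
    """Manhattan (L1) distance between two sparse frequency profiles."""
    return sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in set(p) | set(q))


def window_scores(genome, window, step, k=4):
    """Distance of each window's k-mer profile from the whole-genome
    profile; unusually distant windows are candidate transferred regions."""
    host = kmer_freq(genome, k)
    return [
        (start, manhattan(kmer_freq(genome[start:start + window], k), host))
        for start in range(0, len(genome) - window + 1, step)
    ]
```

On a toy sequence with a compositionally distinct insert, the window that lies entirely inside the insert receives the largest distance. Real genomes would need a threshold chosen from the score distribution, which is not addressed here.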

19.
Neurobiol Aging ; 68: 151-158, 2018 Aug.
Article in English | MEDLINE | ID: mdl-29784544

ABSTRACT

A long-standing question is how to best use brain morphometric and genetic data to distinguish Alzheimer's disease (AD) patients from cognitively normal (CN) subjects and to predict those who will progress from mild cognitive impairment (MCI) to AD. Here, we use a neural network (NN) framework on both magnetic resonance imaging-derived quantitative structural brain measures and genetic data to address this question. We tested the effectiveness of NN models in classifying and predicting AD. We further performed a novel analysis of the NN model to gain insight into the most predictive imaging and genetics features and to identify possible interactions between features that affect AD risk. Data were obtained from the AD Neuroimaging Initiative cohort and included baseline structural MRI data and single nucleotide polymorphism (SNP) data for 138 AD patients, 225 CN subjects, and 358 MCI patients. We found that NN models with both brain and SNP features as predictors perform significantly better than models with either alone in classifying AD and CN subjects, with an area under the receiver operating characteristic curve (AUC) of 0.992, and in predicting the progression from MCI to AD (AUC=0.835). The most important predictors in the NN model were the left middle temporal gyrus volume, the left hippocampus volume, the right entorhinal cortex volume, and the APOE (a gene that encodes apolipoprotein E) ɛ4 risk allele. Furthermore, we identified interactions between the right parahippocampal gyrus and the right lateral occipital gyrus, the right banks of the superior temporal sulcus and the left posterior cingulate, and SNP rs10838725 and the left lateral occipital gyrus. Our work shows the ability of NN models to not only classify and predict AD occurrence but also to identify important AD risk factors and interactions among them.


Subjects
Alzheimer Disease/classification , Alzheimer Disease/diagnostic imaging , Brain/diagnostic imaging , Brain/pathology , Databases, Genetic , Nerve Net/diagnostic imaging , Nerve Net/pathology , Aged , Aged, 80 and over , Alzheimer Disease/genetics , Apolipoproteins E/genetics , Cognitive Dysfunction , Cohort Studies , Disease Progression , Female , Humans , Magnetic Resonance Imaging , Male , Neuroimaging , Organ Size , ROC Curve , Risk
20.
Front Microbiol ; 9: 872, 2018.
Article in English | MEDLINE | ID: mdl-29774017

ABSTRACT

Comparing metagenomic samples is crucial for understanding microbial communities. For different groups of microbial communities, such as human gut metagenomic samples from patients with a certain disease and healthy controls, identifying group-specific sequences offers essential information for potential biomarker discovery. A sequence that is present, or rich, in one group, but absent, or scarce, in another group is considered "group-specific" in our study. Our main purpose is to discover group-specific sequence regions between control and case groups as disease-associated markers. We developed a long k-mer (k ≥ 30 bp)-based computational pipeline to detect group-specific sequences at strain resolution free from reference sequences, sequence alignments, and metagenome-wide de novo assembly. We called our method MetaGO: Group-specific oligonucleotide analysis for metagenomic samples. An open-source pipeline on Apache Spark was developed with parallel computing. We applied MetaGO to one simulated and three real metagenomic datasets to evaluate the discriminative capability of identified group-specific markers. In the simulated dataset, 99.11% of group-specific logical 40-mers covered 98.89% disease-specific regions from the disease-associated strain. In addition, 97.90% of group-specific numerical 40-mers covered 99.61% and 96.39% of differentially abundant genome and regions between two groups, respectively. For a large-scale metagenomic liver cirrhosis (LC)-associated dataset, we identified 37,647 group-specific 40-mer features. Any one of the features can predict disease status of the training samples with an average of sensitivity and specificity higher than 0.8. The random forests classification using the top 10 group-specific features yielded a higher AUC (from ∼0.8 to ∼0.9) than that of previous studies. All group-specific 40-mers were present in LC patients, but not healthy controls. All 11 assembled LC-specific sequences can be mapped to two strains of Veillonella parvula: UTDB1-3 and DSM2008. The experiments on the other two real datasets related to Inflammatory Bowel Disease and Type 2 Diabetes in Women consistently demonstrated that MetaGO achieved better prediction accuracy with fewer features compared to previous studies. The experiments showed that MetaGO is a powerful tool for identifying group-specific k-mers, which would be clinically applicable for disease prediction. MetaGO is available at https://github.com/VVsmileyx/MetaGO.
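The presence/absence notion of "group-specific" can be illustrated with a small sketch. This is a toy stand-in under stated assumptions, not the MetaGO pipeline: it treats each sample as a single string, uses a simple presence threshold instead of the statistical filtering a real pipeline would need, and runs in memory rather than on Apache Spark. The names `kmers`, `group_specific_kmers`, and `min_case_fraction` are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Set of distinct overlapping k-mers in one sample."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def group_specific_kmers(case_samples, control_samples, k, min_case_fraction=1.0):
    """k-mers present in at least min_case_fraction of the case samples
    and absent from every control sample."""
    case_hits = Counter()
    for seq in case_samples:
        case_hits.update(kmers(seq, k))  # one vote per sample
    control_kmers = set()
    for seq in control_samples:
        control_kmers |= kmers(seq, k)
    needed = min_case_fraction * len(case_samples)
    return {
        word
        for word, n in case_hits.items()
        if n >= needed and word not in control_kmers
    }
```

For example, with case samples "AAACGTTT" and "CCACGTGG", control sample "AAATTTCC", and k = 4, only "ACGT" occurs in every case and in no control. Lowering `min_case_fraction` trades specificity for coverage.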
