ABSTRACT
BACKGROUND: The virome obtained through virus-like particle enrichment contains a mixture of prokaryotic and eukaryotic virus-derived fragments. Accurate identification and classification of these elements are crucial to understanding their roles and functions in microbial communities. However, the rapid mutation rates of viral genomes pose challenges in developing high-performance tools for classification, potentially limiting downstream analyses. FINDINGS: We present IPEV, a novel method to distinguish prokaryotic and eukaryotic viruses in viromes, with a 2-dimensional convolutional neural network combining trinucleotide pair relative distance and frequency. Cross-validation assessments of IPEV demonstrate its state-of-the-art precision, significantly improving the F1-score by approximately 22% on an independent test set compared to existing methods when query viruses share less than 30% sequence similarity with known viruses. Furthermore, IPEV outperforms other methods in accuracy on marine and gut virome samples based on annotations by sequence alignments. IPEV reduces runtime by at most 1,225 times compared to existing methods under the same computing configuration. We also utilized IPEV to analyze longitudinal samples and found that the gut virome exhibits a higher degree of temporal stability than previously observed in persistent personal viromes, providing novel insights into the resilience of the gut virome in individuals. CONCLUSIONS: IPEV is a high-performance, user-friendly tool that assists biologists in identifying and classifying prokaryotic and eukaryotic viruses within viromes. The tool is available at https://github.com/basehc/IPEV.
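The frequency half of the feature described above can be illustrated with a short sketch. It covers only overlapping trinucleotide frequencies; the "relative distance" component is not defined in the abstract and is not attempted here. The function name and the example sequence are hypothetical and are not taken from the IPEV code base.

```python
from itertools import product

# All 64 possible trinucleotides over the DNA alphabet.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]

def trinucleotide_frequencies(seq):
    """Relative frequency of each overlapping trinucleotide in seq.

    Windows containing ambiguous bases (e.g. N) are skipped.
    Hypothetical helper for illustration; not IPEV's actual feature extractor.
    """
    seq = seq.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in TRINUCLEOTIDES}
    return {k: c / total for k, c in counts.items()}

# Made-up fragment: windows are ACG, CGA, GAC, ACG, CGT.
freqs = trinucleotide_frequencies("ACGACGT")
```

A 64-value vector of this kind is one plausible input channel for a convolutional classifier; how IPEV arranges it into a 2-dimensional input is not stated in the abstract.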
Subjects
Deep Learning , Virome , Viruses , Virome/genetics , Viruses/genetics , Viruses/classification , Prokaryotic Cells/virology , Viral Genome , Eukaryota/genetics , Eukaryota/virology , Computational Biology/methods , Software , Humans
ABSTRACT
Heterogeneity in host and gut microbiota hampers microbial precision intervention of type 2 diabetes mellitus (T2DM). Here, we investigated novel features for patient stratification and bacterial modulators for intervention, using cross-sectional patient cohorts and animal experiments. We collected stool, blood, and urine samples from 103 patients with recent-onset T2DM and 25 healthy control subjects (HCs), performed gut microbial composition and metabolite profiling, and combined it with host transcriptome, metabolome, cytokine, and clinical data. Stool type (dry or loose stool), a feature of the stool microenvironment recently explored in microbiome studies, was used for stratification of patients with T2DM as it explained most of the variation in the multiomics data set among all clinical parameters in our covariate analysis. T2DM with dry stool (DM-DS) and loose stool (DM-LS) were clearly differentiated from HC and each other by LightGBM models, optimal among multiple machine learning models. Compared with DM-DS, DM-LS exhibited discordant gut microbial taxonomic and functional profiles, severe host metabolic disorder, and excessive insulin secretion. Further cross-measurement association analysis linked the differential microbial profiles, in particular Blautia abundances, to T2DM phenotypes in our stratified multiomics data set. Notably, oral supplementation of Blautia to T2DM mice induced inhibitory effects on lipid accumulation, weight gain, and blood glucose elevation with simultaneous modulation of gut bacterial composition, revealing the therapeutic potential of Blautia. Our study highlights the clinical implications of stool microenvironment stratification and Blautia supplementation in T2DM, offering promising prospects for microbial precision treatment of metabolic diseases.
Subjects
Type 2 Diabetes Mellitus , Humans , Mice , Animals , Type 2 Diabetes Mellitus/metabolism , Cross-Sectional Studies , Multiomics , Feces/microbiology , Bacteria/genetics
ABSTRACT
BACKGROUND: Wide use of the breath test (BT), a non-invasive and effective diagnostic method for small intestinal bacterial overgrowth (SIBO), has demonstrated a high comorbidity rate of SIBO in patients with diarrhea-predominant irritable bowel syndrome (IBS-D). Patients with IBS-D overlapping with SIBO respond better to rifaximin therapy than those with IBS-D only. Gut microbiota plays a critical role in both diseases. We aimed to determine the microbial differences between IBS-D with and without SIBO, and to study the mechanism underlying its sensitivity to rifaximin. METHODS: Patients with IBS-D were categorized as BT-negative (IBSN) and BT-positive (IBSP). Healthy volunteers (BT-negative) were enrolled as healthy controls. The patients were clinically evaluated before and after rifaximin treatment (0.4 g bid, 4 weeks). Blood, intestine, and stool samples were collected for cytokine assessment and gut microbial analyses. RESULTS: Clinical complaints and microbial abundance were significantly higher in IBSP than in IBSN. In contrast, severe systemic inflammation and more active bacterial invasion function, both associated with enrichment of opportunistic pathogens, were seen in IBSN. The symptoms of IBSP patients were relieved to varying degrees after therapy, but the symptoms of IBSN patients rarely changed. We also found that the presence of IBSN-enriched genera (Enterobacter and Enterococcus) was unaffected by rifaximin therapy. CONCLUSIONS: IBS-D patients overlapping with SIBO showed noticeably different fecal microbial composition and function compared with patients with IBS-D only. The better response to rifaximin in those comorbid patients might be associated with their different gut microbiota, which suggests that BT is necessary before IBS-D diagnosis and use of rifaximin. REGISTRATION: Chinese Clinical Trial Registry, ChiCTR1800017911.
Subjects
Irritable Bowel Syndrome , Breath Tests/methods , Cytokines , Humans , Small Intestine , Irritable Bowel Syndrome/diagnosis , Irritable Bowel Syndrome/drug therapy , Rifaximin/therapeutic use
ABSTRACT
[This corrects the article DOI: 10.3389/fmicb.2022.782210.].
ABSTRACT
Airborne microbiome alterations, an emerging global health concern, have been linked to anthropogenic activities in numerous studies. However, these studies have not reached a consensus. To reveal general trends, we conducted a meta-analysis using 3226 air samples from 42 studies, including 29 samples of our own. We found that samples in anthropogenic activity-related categories showed increased microbial diversity, increased relative abundance of pathogens, increased co-occurrence network complexity, and decreased positive edge proportions in the network compared with the natural environment category. Most of the above conclusions were confirmed using the samples we collected in a particular period with restricted anthropogenic activities. Additionally, unlike most previous studies, we used 15 human-production process factors to quantitatively describe anthropogenic activities. We found that microbial richness was positively correlated with fine particulate matter concentration, NH3 emissions, and agricultural land proportion and negatively correlated with the gross domestic product per capita. Airborne pathogens showed preferences for different factors, indicating potential health implications. SourceTracker analysis showed that the human body surface was a more likely source of airborne pathogens than other environments. Our results advance the understanding of relationships between anthropogenic activities and airborne bacteria and highlight the role of airborne pathogens in public health.
Subjects
Air Pollutants , Microbiota , Air Microbiology , Air Pollutants/analysis , Anthropogenic Effects , Bacteria , Environmental Monitoring , Humans , Particulate Matter/analysis
ABSTRACT
Viruses are increasingly viewed as vital components of the human gut microbiota, while their roles in health and diseases remain incompletely understood. Here, we first sequenced and analyzed the 37 metagenomic and 18 host metabolomic samples related to irritable bowel syndrome (IBS) and found that some shifted viruses between IBS and controls covaried with shifted bacteria and metabolites. Especially, phages that infect beneficial lactic acid bacteria depleted in IBS covaried with their hosts. We also retrieved public whole-genome metagenomic datasets of another four diseases (type 2 diabetes, Crohn's disease, colorectal cancer, and liver cirrhosis), totaling 438 samples including IBS, and performed uniform analysis of the gut viruses in diseases. By constructing disease-specific co-occurrence networks, we found viruses actively interacting with bacteria, negatively correlated with possible dysbiosis-related and inflammation-mediating bacteria, increasing the connectivity between bacteria modules, and contributing to the robustness of the networks. Functional enrichment analysis showed that phages interact with bacteria through predation or expressing genes involved in the transporter and secretion system, metabolic enzymes, etc. We further built a viral database to facilitate systematic functional classification and explored the functions of viral genes on interacting with bacteria. Our analyses provided a systematic view of the gut virome in the disease-related microbial community and suggested possible positive roles of viruses concerning gut health.
Subjects
Bacteriophages , Type 2 Diabetes Mellitus , Gastrointestinal Microbiome , Irritable Bowel Syndrome , Microbiota , Viruses , Bacteria/genetics , Bacteriophages/genetics , Gastrointestinal Microbiome/genetics , Humans , Virome/genetics , Viruses/genetics
ABSTRACT
Background: Carbapenem-resistant Acinetobacter baumannii (CRAB) is a common cause of ventilator-associated pneumonia (VAP) in intensive care unit (ICU) patients, but its infection and colonization state are difficult to distinguish. If the judgment is wrong, it may aggravate the abuse of antibiotics and further accelerate the evolution of drug resistance. We sought to provide new clues for the diagnosis, pathogenesis and treatment of CRAB VAP based on lower respiratory tract (LRT) microbiota. Methods: A prospective study was conducted on patients with mechanical ventilation from July 2018 to December 2019 in a tertiary hospital. Multi-genomics studies (16S rRNA amplicon, metagenomics, and whole-genome sequencing [WGS]) of endotracheal deep aspirate (ETA) were performed. Results: Fifty-two ICU patients were enrolled, including 24 with CRAB VAP (CRAB-I), 22 with CRAB colonization (CRAB-C), and six CRAB-negative patients (infection-free) (CRAB-N). Diversity of pulmonary microbiota was significantly lower in CRAB-I than in CRAB-C or CRAB-N (mean Shannon index, 1.79 vs. 2.73 vs. 4.81, P < 0.05). Abundances of 11 key genera differed between the groups. Acinetobacter was most abundant in CRAB-I (76.19%), moderately abundant in CRAB-C (59.14%), and least abundant in CRAB-N (11.25%), but its interactions with other genera increased in turn. Metagenomics and WGS analysis showed that virulence genes were more abundant in CRAB-I than in CRAB-C. Multi-locus sequence typing (MLST) of 46 CRAB isolates revealed that the main types were ST208 (30.43%) and ST938 (15.22%), with no difference between CRAB-I and CRAB-C. Conclusion: Lower respiratory tract microbiota dysbiosis including elevated relative abundance of Acinetobacter and reduced bacterial interactions, and virulence enrichment may lead to CRAB VAP.
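For readers unfamiliar with the Shannon index used above to compare groups, the following is a minimal sketch of how it is computed from taxon counts. It assumes the natural logarithm, since the abstract does not state the base. The counts are invented for illustration and are not study data.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h

# Hypothetical genus-level read counts (illustration only, not study data).
dominated = [760, 100, 80, 60]   # one genus dominates the community
even = [250, 250, 250, 250]      # reads spread evenly across genera

h_dominated = shannon_index(dominated)
h_even = shannon_index(even)
```

A community dominated by a single genus yields a lower index than an evenly distributed one, which is the direction of the difference reported between CRAB-I and the other groups.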
ABSTRACT
As a life-threatening disease, stroke is the leading cause of death and also induces adult disability worldwide. To investigate the efficacy of the integrated traditional Chinese medicine (ITCM) on the therapeutic effects of acute ischemic stroke (AIS) patients, we enrolled 26 patients in the ITCM [Tanhuo decoction (THD) + Western medicine (WM)] group and 23 in the WM group. Thirty healthy people were also included in the healthy control (HC) group. ITCM achieved better functional outcomes than WM, including significant reduction of the phlegm-heat syndrome and neurological impairment, and improvement of ability. These facts were observed in different pretreatment gut enterotypes. In this paper, we collected the stool samples of all participants and analyzed the 16S rRNA sequence data of the gut microbiota. We identified two enterotypes (Type-A and Type-B) of the gut microbial community in AIS samples before treatment. Compared to Type-B, Type-A was characterized by a high proportion of Bacteroides, relatively high diversity, and severe functional damage. In the ITCM treatment group, we observed better clinical efficacy and positive alterations in microbial diversity and beneficial bacterial abundance, and the effect of approaching healthy people's gut microbiota, regardless of gut enterotypes identified in pretreatment. Furthermore, we detected several gut microbiota as potential therapeutic targets of ITCM treatment by analyzing the correlations between bacterial abundance alterations and functional outcomes, where Dorea with the strongest correlation was known to produce anti-inflammatory metabolite and negatively linked to trimethylamine-N-oxide (TMAO), a biomarker of AIS. This study analyzed clinical and gut microbial data and revealed the possibility of a broad application independent of the enterotypes, as well as the therapeutic targets of the ITCM in treating AIS patients with phlegm-heat syndrome.
Subjects
Gastrointestinal Microbiome , Ischemic Stroke , Microbiota , Adult , Humans , Ischemic Stroke/drug therapy , Traditional Chinese Medicine , 16S Ribosomal RNA/genetics
ABSTRACT
SUMMARY: We present HoPhage (Host of Phage) to identify the host of a given phage fragment from metavirome data at the genus level. HoPhage integrates two modules using a deep learning algorithm and a Markov chain model, respectively. HoPhage achieves 47.90% and 82.47% mean accuracy at the genus and phylum levels for ~1-kb long artificial phage fragments when predicting host among 50 genera, representing 7.54-20.22% and 13.55-24.31% improvement, respectively. By testing on three real virome samples, HoPhage yields 81.11% mean accuracy at the genus level within a much broader candidate host range. AVAILABILITY AND IMPLEMENTATION: HoPhage is available at http://cqb.pku.edu.cn/ZhuLab/HoPhage/data/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Bacteriophages , Algorithms , Markov Chains , Software
ABSTRACT
The SARS-CoV-2 pandemic has raised concerns in the identification of the hosts of the virus since the early stages of the outbreak. To address this problem, we proposed a deep learning method, DeepHoF, based on extracting viral genomic features automatically, to predict the host likelihood scores on five host types, including plant, germ, invertebrate, non-human vertebrate and human, for novel viruses. DeepHoF made up for the lack of an accurate tool, reaching a satisfactory AUC of 0.975 in the five-classification, and could make a reliable prediction for the novel viruses without close neighbors in phylogeny. Additionally, to fill the gap in the efficient inference of host species for SARS-CoV-2 using existing tools, we conducted a deep analysis on the host likelihood profile calculated by DeepHoF. Using the isolates sequenced in the earliest stage of the COVID-19 pandemic, we inferred that minks, bats, dogs and cats were potential hosts of SARS-CoV-2, while minks might be one of the most noteworthy hosts. Several genes of SARS-CoV-2 demonstrated their significance in determining the host range. Furthermore, a large-scale genome analysis, based on DeepHoF's computation for the later pandemic in 2020, disclosed the uniformity of host range among SARS-CoV-2 samples and the strong association of SARS-CoV-2 between humans and minks.
Subjects
COVID-19/virology , Cats/virology , Chiroptera/virology , Dogs/virology , Mink/virology , SARS-CoV-2/classification , Algorithms , Animals , COVID-19/transmission , Deep Learning , Host Specificity , Humans , Viral RNA/genetics , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification , RNA Sequence Analysis
ABSTRACT
BACKGROUND: Prokaryotic viruses referred to as phages can be divided into virulent and temperate phages. Distinguishing virulent and temperate phage-derived sequences in metavirome data is important for elucidating their different roles in interactions with bacterial hosts and regulation of microbial communities. However, there is no experimental or computational approach to effectively classify their sequences in culture-independent metavirome. We present a new computational method, DeePhage, which can directly and rapidly judge each read or contig as a virulent or temperate phage-derived fragment. FINDINGS: DeePhage uses a "one-hot" encoding form to represent DNA sequences in detail. Sequence signatures are detected via a convolutional neural network to obtain valuable local features. The accuracy of DeePhage on 5-fold cross-validation reaches as high as 89%, nearly 10% and 30% higher than that of 2 similar tools, PhagePred and PHACTS. On real metavirome, DeePhage correctly predicts the highest proportion of contigs when using BLAST as annotation, without apparent preferences. Besides, DeePhage reduces running time vs PhagePred and PHACTS by 245 and 810 times, respectively, under the same computational configuration. By direct detection of the temperate viral fragments from metagenome and metavirome, we furthermore propose a new strategy to explore phage transformations in the microbial community. The ability to detect such transformations provides us a new insight into the potential treatment for human disease. CONCLUSIONS: DeePhage is a novel tool developed to rapidly and efficiently identify 2 kinds of phage fragments especially for metagenomics analysis. DeePhage is freely available via http://cqb.pku.edu.cn/ZhuLab/DeePhage or https://github.com/shufangwu/DeePhage.
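The "one-hot" encoding mentioned above can be sketched as follows. This is an illustrative version only: the A, C, G, T column order and the all-zero row for ambiguous bases are assumptions, and the function name is hypothetical rather than taken from the DeePhage source.

```python
BASES = "ACGT"  # assumed column order; the abstract does not specify one

def one_hot_encode(seq):
    """Encode a DNA string as a list of 4-element rows, one row per base.

    Each row has a single 1 in the column of its base. Ambiguous bases
    (e.g. N) are encoded as all zeros, which is an assumption made here.
    """
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        col = index.get(base)
        if col is not None:
            row[col] = 1
        rows.append(row)
    return rows

encoded = one_hot_encode("ACGTN")
```

The resulting length-by-4 matrix is the kind of input over which a convolutional layer can slide filters to pick up local sequence signatures.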
Subjects
Bacteriophages , Deep Learning , Microbiota , Bacteriophages/genetics , Humans , Metagenome , Metagenomics/methods
ABSTRACT
Enterobacter cloacae complex (ECC) is composed of multiple species, and its taxonomic status is continually updated. In recent decades, ECC has frequently been associated with multidrug resistance and has become an important nosocomial pathogen. Currently, rapid and accurate identification of ECC to the species level remains a technical challenge, which impedes our understanding of the population at the species level. Here, we aimed to develop a simple, reliable, and economical method to distinguish four epidemiologically prevalent species of ECC with clinical significance, i.e., E. cloacae, E. hormaechei, E. roggenkampii, and E. kobei. A total of 977 ECC genomes were retrieved from GenBank, and a unique gene for each species was obtained by core-genome comparisons. Four pairs of species-specific primers were designed based on the unique genes. A total of 231 ECC clinical strains were typed both by hsp60 typing and by species-specific PCRs. The specificity and sensitivity of the four species-specific PCRs ranged between 96.56% and 100% and between 76.47% and 100%, respectively. The PCR for E. cloacae showed the highest specificity and sensitivity. A one-step multiplex PCR was subsequently established by combining the species-specific primers. An additional 53 hsp60-typed ECC isolates and 20 non-ECC isolates belonging to six species, obtained from samples of patients, sewage water, and feces of farmed animals, were tested by the multiplex PCR. The identification results of both techniques were concordant. The multiplex PCR established in this study provides an accurate, expeditious, and cost-effective way for routine diagnosis and molecular surveillance of ECC strains at the species level.
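The specificity and sensitivity figures above follow the standard definitions, which can be sketched as below with a reference method (here, hsp60 typing) treated as ground truth. The confusion-matrix counts are hypothetical and are not the study's data.

```python
def sensitivity(tp, fn):
    """Fraction of true positives among all reference-positive isolates."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of true negatives among all reference-negative isolates."""
    return tn / (tn + fp)

# Hypothetical counts for one species-specific PCR (illustration only).
tp, fn = 40, 10   # reference-positive isolates: detected vs. missed
tn, fp = 90, 10   # reference-negative isolates: correctly negative vs. false alarms

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```

Reporting both values per species, as the abstract does, shows whether a primer pair tends to miss its target species (low sensitivity) or to cross-react with other species (low specificity).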
Subjects
Enterobacter cloacae , Enterobacteriaceae Infections , Enterobacter , Enterobacter cloacae/genetics , Enterobacteriaceae Infections/diagnosis , Humans , Multiplex Polymerase Chain Reaction
ABSTRACT
Acute ischemic stroke (AIS) is a major cause of acquired adult disability and death. Our previous studies proved the efficacy and effectiveness of Tanhuo decoction (THD) on AIS. However, the therapeutic mechanism remains unclear. We recruited 49 AIS patients and 30 healthy people to explore the effects of THD+basic treatment on the poststroke gut microbiota of AIS patients using 16S rRNA sequencing, in which 23 patients received basic treatment (control group) and 26 patients received THD+basic treatment (THD group). By comparing the data before and after treatments, we found the THD group acquired better outcome than the control group on both clinical outcome indices and the characteristics of gut microbiota. In addition to the mediation on short-chain fatty acid- (SCFA-) producing bacteria in two groups, treatment in the THD group significantly decreased the lipopolysaccharide- (LPS-) producing bacteria to reduce LPS biosynthesis. Besides, the complexity of the cooccurrence of gut microbiota and the competition among LPS-producing bacteria and opportunistic pathogenetic bacteria were enhanced in the THD group. Treatment in the THD group also exhibited the potential in decreasing genes on the biosynthesis of trimethylamine (TMA), the precursor of Trimethylamine N-oxide (TMAO), and increasing genes on the degradation of TMA, especially increasing trimethylamine-corrinoid protein Co-methyltransferase (mttB) which catabolizes TMA to methane. These results hinted that THD+basic treatment might exert its efficacy by mediating the gut microbiota and microbial metabolites, including LPS and TMAO that aggravate the sterile inflammation and platelet aggregation. Moreover, the well-fitting regression model results in predicting the clinical outcome with the alteration of gut microbiota proved gut microbiota as a potential indicator of AIS and provided evidence of the communication between the gut and brain of AIS patients.
Subjects
Chinese Herbal Drugs/pharmacology , Gastrointestinal Microbiome/drug effects , Ischemic Stroke/drug therapy , Ischemic Stroke/microbiology , Acute Disease , Case-Control Studies , Humans , Prospective Studies , Treatment Outcome
ABSTRACT
BACKGROUND: The diagnosis of inflammatory bowel disease (IBD) and discrimination between the types of IBD are clinically important. IBD is associated with marked changes in the intestinal microbiota. Advances in next-generation sequencing (NGS) technology and the improved hospital bioinformatics analysis ability motivated us to develop a diagnostic method based on the gut microbiome. RESULTS: Using a set of whole-genome sequencing (WGS) data from 349 human gut microbiota samples with two types of IBD and healthy controls, we assembled and aligned WGS short reads to obtain feature profiles of strains and genera. The genus and strain profiles were used for the 16S-based and WGS-based diagnostic modules construction respectively. We designed a novel feature selection procedure to select those case-specific features. With these features, we built discrimination models using different machine learning algorithms. The machine learning algorithm LightGBM outperformed other algorithms in this study and thus was chosen as the core algorithm. Specially, we identified two small sets of biomarkers (strains) separately for the WGS-based health vs IBD module and ulcerative colitis vs Crohn's disease module, which contributed to the optimization of model performance during pre-training. We released LightCUD as an IBD diagnostic program built with LightGBM. The high performance has been validated through five-fold cross-validation and using an independent test data set. LightCUD was implemented in Python and packaged free for installation with customized databases. With WGS data or 16S rRNA sequencing data of gut microbiome samples as the input, LightCUD can discriminate IBD from healthy controls with high accuracy and further identify the specific type of IBD. The executable program LightCUD was released in open source with instructions at the webpage http://cqb.pku.edu.cn/ZhuLab/LightCUD/ . 
The identified strain biomarkers could be used to study the critical factors for disease development and recommend treatments regarding changes in the gut microbial community. CONCLUSIONS: As the first released human gut microbiome-based IBD diagnostic tool, LightCUD demonstrates a high-performance for both WGS and 16S sequencing data. The strains that either identify healthy controls from IBD patients or distinguish the specific type of IBD are expected to be clinically important to serve as biomarkers.
ABSTRACT
Background and Aims: Irritable bowel syndrome (IBS) and depression have high tendencies of comorbidity. In particular, diarrhea-predominant IBS (IBS-D) and depression exhibit similar fecal microbiota signatures, yet little is known about their pathogenic mechanism. Here, we propose that the differences in structure and composition of IBS-D and depression gut microbiota give rise to different downstream functions, which lead to distinct clinical phenotypes via host metabolism and further influence the interaction of brain-gut axis. Methods: We performed multiomics study, including fecal metagenome-wide sequencing and serum metabolomics profiling in 65 individuals with IBS-D (n=22), depression (n=15), comorbid patients (n=13), and healthy controls (n=15). We analyzed functional genes contributed by the primary genus and evaluated their correlations with clinical indices and host metabolites. Results: Metagenomic analysis revealed 26 clusters of orthologous groups of protein (COG) categories consisting of a total of 4,631 functional genes. Trehalose and maltose hydrolase (COG1554) and fucose permease (COG0738) were the most relevant and enriched functional genes in the IBS-D patients; urease accessory proteins UreE (COG2371) was that in the depression patients. Context based genome annotation suggest that an alteration of Escherichia coli and Enterobacter cloacae in IBS-D and depression respectively may be responsible for the enrichment described above. Correlation with host metabolites, such as maltotriose and isomaltose in carbohydrate metabolism and anandamide in neuroactive metabolism, drew further connections between these findings. Conclusions: These changes led us to propose a connection between genomic signatures and clinical differences observed in IBS-D and depression. Our findings provide further insights into the involvement of gut microbiota in diseases related to brain-gut disorder.
Subjects
Gastrointestinal Microbiome , Irritable Bowel Syndrome , Bacteria/genetics , Depression , Feces , Humans
ABSTRACT
BACKGROUND: Variations in the human genome have been studied extensively. However, little is known about the role of micro-inversions (MIs), generally defined as small (< 100 bp) inversions, in human evolution, diversity, and health. Depicting the pattern of MIs among diverse populations is critical for interpreting human evolutionary history and obtaining insight into genetic diseases. RESULTS: In this paper, we explored the distribution of MIs in genomes from 26 human populations and 7 nonhuman primate genomes and analyzed the phylogenetic structure of the 26 human populations based on the MIs. We further investigated the functions of the MIs located within genes associated with human health. With hg19 as the reference genome, we detected 6968 MIs among the 1937 human samples and 24,476 MIs among the 7 nonhuman primate genomes. The analyses of MIs in human genomes showed that the MIs were rarely located in exonic regions. Nonhuman primates and human populations shared only 82 inverted alleles, and Africans had the most inverted alleles in common with nonhuman primates, which was consistent with the "Out of Africa" hypothesis. The clustering of MIs among the human populations also coincided with human migration history and ancestral lineages. CONCLUSIONS: We propose that MIs are potential evolutionary markers for investigating population dynamics. Our results revealed the diversity of MIs in human populations and showed that they are essential to construct human population relationships and have a potential effect on human health.
Subjects
Molecular Evolution , Population Genetics , Animals , Genetic Variation , Humans , Macaca mulatta , Phylogeny
ABSTRACT
Nanopore sequencing is regarded as one of the most promising third-generation sequencing (TGS) technologies. Since 2014, Oxford Nanopore Technologies (ONT) has developed a series of devices based on nanopore sequencing to produce very long reads, with an expected impact on genomics. However, the nanopore sequencing reads are susceptible to a fairly high error rate owing to the difficulty in identifying the DNA bases from the complex electrical signals. Although several basecalling tools have been developed for nanopore sequencing over the past years, it is still challenging to correct the sequences after applying the basecalling procedure. In this study, we developed an open-source DNA basecalling reviser, NanoReviser, based on a deep learning algorithm to correct the basecalling errors introduced by current basecallers provided by default. In our module, we re-segmented the raw electrical signals based on the basecalled sequences provided by the default basecallers. By employing convolution neural networks (CNNs) and bidirectional long short-term memory (Bi-LSTM) networks, we took advantage of the information from the raw electrical signals and the basecalled sequences from the basecallers. Our results showed NanoReviser, as a post-basecalling reviser, significantly improving the basecalling quality. After being trained on standard ONT sequencing reads from public E. coli and human NA12878 datasets, NanoReviser reduced the sequencing error rate by over 5% for both the E. coli dataset and the human dataset. The performance of NanoReviser was found to be better than those of all current basecalling tools. Furthermore, we analyzed the modified bases of the E. coli dataset and added the methylation information to train our module. With the methylation annotation, NanoReviser reduced the error rate by 7% for the E. coli dataset and specifically reduced the error rate by over 10% for the regions of the sequence rich in methylated bases. 
To the best of our knowledge, NanoReviser is the first post-processing tool after basecalling to accurately correct the nanopore sequences without the time-consuming procedure of building the consensus sequence. The NanoReviser package is freely available at https://github.com/pkubioinformatics/NanoReviser.
ABSTRACT
SUMMARY: We present the first tool of gene prediction, PlasGUN, for plasmid metagenomic short-read data. The tool, developed based on deep learning algorithm of multiple input Convolutional Neural Network, demonstrates much better performance when tested on a benchmark dataset of artificial short reads and presents more reliable results for real plasmid metagenomic data than traditional gene prediction tools designed primarily for chromosome-derived short reads. AVAILABILITY AND IMPLEMENTATION: The PlasGUN software is available at http://cqb.pku.edu.cn/ZhuLab/PlasGUN/ or https://github.com/zhenchengfang/PlasGUN/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Deep Learning , Software , Algorithms , Metagenome , Computer Neural Networks , Plasmids
ABSTRACT
In the past ten years, the research and application of microbiome has continued to increase. The microbiome has gradually become the research focus in the fields of life science, environmental science, and medicine. Meanwhile, many countries and organizations around the world are launching their own microbiome projects and conducting a multi-faceted layout, striving to gain a strategic position in this promising field. In addition, whether it is scientific research or industrial applications, there has been a climax of research and a wave of investment and financing, accordingly, products and services related to the microbiome are constantly emerging. However, due to the rapid development of microbiome sequencing and analysis related technologies and methods, the research and application from various countries have not yet unified on the standards of technology, programs, and data. Domestic industry participants also have insufficient understanding of the microbiome. New methods, technologies, and theories have not yet been fully accepted and used. In addition, some of the existing standards and guidelines are too general with poor practicality. This not only causes obstacles in the integration of scientific research data and waste of resources, but also gives related companies unfair competition opportunity. More importantly, China still lacks national standards related to the microbiome, and the national microbiome project is still in the process of preparation. In this context, the experts and practitioners of the microbiome worked together and developed the consensus of experts. 
It can not only guide domestic scientific research and industrial institutions to regulate the production, learning and research of the microbiome, the application can also provide reference technical basis for the relevant national functional departments, protect the scale and standardized corporate company's interests, strengthen industry self-discipline, avoid unregulated enterprises from disrupting the market, and ultimately promote the benign development of microbiome-related industries.