1.
BMC Bioinformatics ; 25(1): 214, 2024 Jun 14.
Article in English | MEDLINE | ID: mdl-38877401

ABSTRACT

BACKGROUND: The exploration of gene-disease associations is crucial for understanding the mechanisms underlying disease onset and progression, with significant implications for prevention and treatment strategies. Advances in high-throughput biotechnology have generated a wealth of data linking diseases to specific genes. While graph representation learning has recently introduced groundbreaking approaches for predicting novel associations, existing studies have largely overlooked the cumulative impact of functional modules such as protein complexes and the incompleteness of important data such as protein interactions, both of which limit detection performance. RESULTS: To address these limitations, we introduce a deep learning framework called ModulePred for predicting disease-gene associations. ModulePred performs graph augmentation on the protein interaction network using L3 link prediction algorithms. It builds a heterogeneous module network by integrating disease-gene associations, protein complexes and augmented protein interactions, and develops a novel graph embedding for the heterogeneous module network. Subsequently, a graph neural network is constructed to learn node representations by collectively aggregating information from the topological structure, and gene prioritization is carried out using the disease and gene embeddings obtained from the graph neural network. Experimental results underscore the superiority of ModulePred, showcasing the effectiveness of incorporating functional modules and graph augmentation in predicting disease-gene associations. This research introduces innovative ideas and directions, enhancing the understanding and prediction of gene-disease relationships.


Subject(s)
Algorithms , Deep Learning , Humans , Computational Biology/methods , Protein Interaction Maps/genetics , Genetic Predisposition to Disease/genetics , Neural Networks, Computer , Genetic Association Studies/methods
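
The graph augmentation step in the abstract above relies on L3 link prediction. As a rough illustration only, the sketch below scores unconnected protein pairs by degree-normalized paths of length 3, which is the commonly cited L3 formulation. The function name `l3_scores` and the toy edge list are assumptions made here; the abstract does not describe how ModulePred implements or thresholds this step.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def l3_scores(edges):
    """Score non-adjacent node pairs by degree-normalized paths of length 3.

    Hypothetical helper: assumes an undirected graph without self-loops,
    given as an iterable of (node, node) pairs.
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    scores = {}
    for x, y in combinations(sorted(adj), 2):
        if y in adj[x]:
            continue  # already an observed interaction
        total = 0.0
        for u in adj[x]:
            for v in adj[y]:
                if v in adj[u]:  # a path x - u - v - y exists
                    total += 1.0 / sqrt(len(adj[u]) * len(adj[v]))
        if total > 0:
            scores[(x, y)] = total
    return scores


# Toy chain A - B - C - D: the only length-3 path links A and D.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D")]
print(l3_scores(toy_edges))
```

In an augmentation setting, the highest-scoring pairs would be candidates to add to the interaction network; how many to add is a design choice the abstract does not specify.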
2.
BMC Genomics ; 25(1): 515, 2024 May 25.
Article in English | MEDLINE | ID: mdl-38796435

ABSTRACT

BACKGROUND: The short-read whole-genome sequencing (WGS) approach has been widely applied to investigate genomic variation in natural populations of many plant species. With the rapid advancements in long-read sequencing and genome assembly technologies, high-quality genome sequences are now available for groups of varieties in many plant species. These genome sequences are expected to help researchers comprehensively investigate genomic variants of any type that are missed by the WGS technology. However, multiple genome alignment (MGA) tools designed by the human genome research community might be unsuitable for plant genomes. RESULTS: To fill this gap, we developed the AnchorWave-Cactus Multiple Genome Alignment (ACMGA) pipeline, which improved the alignment of repeat elements and could identify long (> 50 bp) deletions or insertions (INDELs). We conducted MGA using ACMGA and Cactus for 8 Arabidopsis (Arabidopsis thaliana) and 26 maize (Zea mays) de novo assembled genome sequences and compared the results with previously published short-read variant calling results. MGA identified more single nucleotide variants (SNVs) and long INDELs than did the previously published WGS variant calls. Additionally, ACMGA detected significantly more SNVs and long INDELs in repetitive regions and across the whole genome than did Cactus. Compared with the results of Cactus, the results of ACMGA were more similar to the previously published variants called using short reads. These two MGA pipelines identified numerous multi-allelic variants that were missed by the WGS variant calling pipeline. CONCLUSIONS: Aligning de novo assembled genome sequences could identify more SNVs and INDELs than mapping short reads. ACMGA combines the advantages of AnchorWave and Cactus and offers a practical solution for plant MGA by integrating global alignment, a 2-piece-affine-gap cost strategy, and the progressive MGA algorithm.


Subject(s)
Arabidopsis , Genome, Plant , Zea mays , Arabidopsis/genetics , Zea mays/genetics , Sequence Alignment , INDEL Mutation , Genomics/methods , Polymorphism, Single Nucleotide , Whole Genome Sequencing/methods , Software
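
The 2-piece affine gap cost mentioned in the abstract above can be illustrated with a small sketch. The idea is to take the cheaper of two affine penalties, so that short gaps are governed by a low opening cost and long gaps by a low extension cost. The parameter values below are illustrative placeholders chosen here, not settings reported for ACMGA or AnchorWave.

```python
def two_piece_gap_cost(length, open1=4, ext1=2, open2=24, ext2=1):
    """Return the 2-piece affine penalty for a gap of `length` bases.

    The defaults are hypothetical. Piece 1 (cheap to open, costly to
    extend) wins for short gaps; piece 2 (costly to open, cheap to
    extend) wins for long gaps, so long INDELs are not over-penalized.
    """
    if length <= 0:
        return 0
    return min(open1 + ext1 * length, open2 + ext2 * length)


for gap in (1, 20, 100):
    print(gap, two_piece_gap_cost(gap))
```

With these placeholder values the two pieces cross at a gap length of 20; beyond that point each extra base costs 1 rather than 2, which is what lets an aligner keep a long insertion or deletion as a single gap.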
3.
Bioinformatics ; 39(4)2023 04 03.
Article in English | MEDLINE | ID: mdl-36946295

ABSTRACT

MOTIVATION: Beta-diversity quantitatively measures the differences among microbial communities, thereby shedding light on the association between microbiome composition and environmental properties or host phenotypes. Beta-diversity analysis mainly relies on distances among microbiomes that are calculated from all microbial features. However, in some cases, only a small fraction of the members of a community play crucial roles. Such a small proportion is insufficient to alter the overall distance, so its signal is missed by end-to-end comparison. On the other hand, beta-diversity patterns can also be distorted by data sparsity when focusing only on nonabundant microbes. RESULTS: Here, we develop the Flex Meta-Storms (FMS) distance algorithm, which implements the "local alignment" of microbiomes for the first time. Using a flexible extraction that considers the weighted phylogenetic and functional relations of microbes, FMS produces a normalized phylogenetic distance among members of interest for microbiome pairs. Using artificial and real datasets, we demonstrated the advantage of FMS in detecting subtle variations of microbiomes among different states, variations that regular distance metrics neglected. Therefore, FMS effectively discriminates microbiomes with higher sensitivity and flexibility, thus contributing to in-depth comprehension of microbe-host interactions, as well as promoting applications of microbiome data such as disease screening and prediction. AVAILABILITY AND IMPLEMENTATION: FMS is implemented in C++, and the source code is released at https://github.com/qdu-bioinfo/flex-meta-storms.


Subject(s)
Microbiota , Phylogeny , Software , Algorithms
4.
BMC Genomics ; 22(1): 9, 2021 Jan 06.
Article in English | MEDLINE | ID: mdl-33407112

ABSTRACT

BACKGROUND: Due to their much lower costs in experiment and computation than metagenomic whole-genome sequencing (WGS), 16S rRNA gene amplicons have been widely used for predicting the functional profiles of microbiomes, via software tools such as PICRUSt 2. However, due to potential PCR bias and gene profile variation among phylogenetically related genomes, functional profiles predicted from 16S amplicons may deviate from WGS-derived ones, resulting in misleading results. RESULTS: Here we present Meta-Apo, which greatly reduces or even eliminates such deviation and thus yields much more consistent diversity patterns between the two approaches. Tests of Meta-Apo on > 5000 16S-rRNA amplicon human microbiome samples from 4 body sites showed that the deviation between the two strategies is significantly reduced by using only 15 WGS-amplicon training sample pairs. Moreover, Meta-Apo enables cross-platform functional comparison between WGS and amplicon samples, thus greatly improving 16S-based microbiome diagnosis, e.g. the accuracy of gingivitis diagnosis via 16S-derived functional profiles was elevated from 65 to 95% by WGS-based classification. Therefore, with the low cost of 16S-amplicon sequencing, Meta-Apo can produce a reliable, high-resolution view of microbiome function equivalent to that offered by shotgun WGS. CONCLUSIONS: This suggests that large-scale, function-oriented microbiome sequencing projects can probably benefit from the lower cost of the 16S-amplicon strategy, without sacrificing the precision in functional reconstruction that otherwise requires WGS. An optimized C++ implementation of Meta-Apo is available on GitHub (https://github.com/qibebt-bioinfo/meta-apo) under a GNU GPL license. It takes the functional profiles of a few paired WGS:16S-amplicon samples as training data, and outputs the calibrated functional profiles for the much larger number of 16S-amplicon samples.


Subject(s)
Bacteria , Microbiota , Bacteria/genetics , Metagenome , Metagenomics , Microbiota/genetics , RNA, Ribosomal, 16S/genetics
5.
Bioinformatics ; 36(7): 2308-2310, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31793979

ABSTRACT

MOTIVATION: An accurate and reliable distance (or dissimilarity) among shotgun metagenomes is fundamental to deducing the beta-diversity of microbiomes. To compute the distance at the species level, current methods either ignore the evolutionary relationship among species or fail to account for unclassified organisms that cannot be mapped to definite tip nodes in the phylogenetic tree, and can thus produce erroneous beta-diversity patterns. RESULTS: To solve these problems, we propose the Dynamic Meta-Storms (DMS) algorithm to enable the comprehensive comparison of metagenomes at the species level with both taxonomy and phylogeny profiles. It compares the identified species of metagenomes using the phylogeny, and then dynamically places the unclassified species at virtual nodes of the phylogeny tree via their higher-level taxonomy information. Its high speed and low memory consumption enable pairwise comparison of 100 000 metagenomes (synthesized from 3688 bacteria) within 6.4 h on a single computing node. AVAILABILITY AND IMPLEMENTATION: An optimized implementation of DMS is available on GitHub (https://github.com/qibebt-bioinfo/dynamic-meta-storms) under a GNU GPL license. It takes the species-level profiles of metagenomes as input, and generates their pairwise distance matrix. The bacterial species-level phylogeny tree and taxonomy information of MetaPhlAn2 have been integrated into this implementation, while customized trees and taxonomies are also supported. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Metagenome , Microbiota , Algorithms , Biological Evolution , Phylogeny
6.
BMC Genomics ; 19(1): 144, 2018 02 14.
Article in English | MEDLINE | ID: mdl-29444661

ABSTRACT

BACKGROUND: RNA-Seq has become one of the most widely used applications based on next-generation sequencing technology. However, raw RNA-Seq data may have quality issues, which can significantly distort analytical results and lead to erroneous conclusions. Therefore, the raw data must be subjected to rigorous quality control (QC) procedures before downstream analysis. Currently, an accurate and complete QC of RNA-Seq data requires a suite of different QC tools used consecutively, which is inefficient in terms of usability, running time, file usage, and interpretability of the results. RESULTS: We developed a comprehensive, fast and easy-to-use QC pipeline for RNA-Seq data, RNA-QC-Chain, which involves three steps: (1) sequencing-quality assessment and trimming; (2) internal (ribosomal RNAs) and external (reads from foreign species) contamination filtering; (3) alignment statistics reporting (such as read number, alignment coverage, sequencing depth and paired-end read mapping information). This package was developed based on our previously reported tool for general QC of next-generation sequencing (NGS) data called QC-Chain, with extensions specifically designed for RNA-Seq data. It has several features that are not yet available in other QC tools for RNA-Seq data, such as RNA sequence trimming, automatic rRNA detection and automatic contaminating species identification. The three QC steps can run either sequentially or independently, making RNA-QC-Chain a comprehensive package with high flexibility and usability. Moreover, parallel computing and optimizations are embedded in most of the QC procedures, providing superior efficiency. The performance of RNA-QC-Chain has been evaluated with different types of datasets, including an in-house sequencing dataset, a semi-simulated dataset, and two real datasets downloaded from public databases. Comparisons of RNA-QC-Chain with other QC tools have demonstrated its advantages in both functional versatility and processing speed. CONCLUSIONS: We present here a tool, RNA-QC-Chain, which can be used to carry out the quality control of RNA-Seq data comprehensively, effectively and efficiently.


Subject(s)
Computational Biology/methods , RNA/genetics , Software , Animals , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Internet , Quality Control , RNA/chemistry , Reproducibility of Results
7.
Bioinformatics ; 32(10): 1486-92, 2016 05 15.
Article in English | MEDLINE | ID: mdl-26787661

ABSTRACT

MOTIVATION: Single nucleotide variant (SNV) detection procedures are being utilized as never before to analyze the recent abundance of high-throughput DNA sequencing data, both on single and multiple sample datasets. Building on previously published work with the single sample SNV caller genotype model selection (GeMS), a multiple sample version of GeMS (MultiGeMS) is introduced. Unlike other popular multiple sample SNV callers, the MultiGeMS statistical model accounts for enzymatic substitution sequencing errors. It also addresses the multiple testing problem endemic to multiple sample SNV calling and utilizes high performance computing (HPC) techniques. RESULTS: A simulation study demonstrates that MultiGeMS ranks highest in precision among a selection of popular multiple sample SNV callers, while showing exceptional recall in calling common SNVs. Further, both simulation studies and real data analyses indicate that MultiGeMS is robust to low-quality data. We also demonstrate that accounting for enzymatic substitution sequencing errors not only improves SNV call precision at low mapping quality regions, but also improves recall at reference allele-dominated sites with high mapping quality. AVAILABILITY AND IMPLEMENTATION: The MultiGeMS package can be downloaded from https://github.com/cui-lab/multigems. CONTACT: xinping.cui@ucr.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing , Genotype , Polymorphism, Single Nucleotide
8.
PLoS Genet ; 10(1): e1004094, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24415958

ABSTRACT

Oleaginous microalgae are promising feedstock for biofuels, yet the genetic diversity, origin and evolution of oleaginous traits remain largely unknown. Here we present a detailed phylogenomic analysis of five oleaginous Nannochloropsis species (a total of six strains) and one time-series transcriptome dataset for triacylglycerol (TAG) synthesis on one representative strain. Despite small genome sizes, high coding potential and relative paucity of mobile elements, the genomes feature small cores of ca. 2,700 protein-coding genes and a large pan-genome of >38,000 genes. The six genomes share key oleaginous traits, such as the enrichment of selected lipid biosynthesis genes and certain glycoside hydrolase genes that potentially shift carbon flux from chrysolaminaran to TAG synthesis. The eleven type II diacylglycerol acyltransferase genes (DGAT-2) in every strain, each expressed during TAG synthesis, likely originated from three ancient genomes, including the secondary endosymbiosis host and the engulfed green and red algae. Horizontal gene transfers were inferred in most lipid synthesis nodes with expanded gene doses and many glycoside hydrolase genes. Thus multiple genome pooling and horizontal genetic exchange, together with selective inheritance of lipid synthesis genes and species-specific gene loss, have led to the enormous genetic apparatus for oleaginousness and the wide genomic divergence among present-day Nannochloropsis. These findings have important implications in the screening and genetic engineering of microalgae for biofuels.


Subject(s)
Genome , Microalgae/genetics , Phylogeny , Triglycerides/genetics , Evolution, Molecular , Gene Transfer, Horizontal , Genetic Variation , Molecular Sequence Annotation , Sequence Analysis, DNA , Species Specificity , Transcriptome , Triglycerides/biosynthesis
9.
Plant Physiol ; 169(4): 2444-61, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26486592

ABSTRACT

The ability to rapidly switch the intracellular energy storage form from starch to lipids is an advantageous trait for microalgae feedstock. To probe this mechanism, we sequenced the 56.8-Mbp genome of Chlorella pyrenoidosa FACHB-9, an industrial production strain for protein, starch, and lipids. The genome exhibits positive selection and gene family expansion in lipid and carbohydrate metabolism and in genes related to cell cycle and stress response. Moreover, 10 lipid metabolism genes may have originated from bacteria via horizontal gene transfer. Transcriptomic dynamics tracked via messenger RNA sequencing over six time points during the metabolic switch from starch-rich heterotrophy to lipid-rich photoautotrophy revealed that under heterotrophy, the genes most strongly expressed were from the tricarboxylic acid cycle, respiratory chain, oxidative phosphorylation, gluconeogenesis, glyoxylate cycle, and amino acid metabolism, whereas those most down-regulated were from fatty acid and oxidative pentose phosphate metabolism. The shift from heterotrophy into photoautotrophy highlights up-regulation of genes from carbon fixation, photosynthesis, fatty acid biosynthesis, the oxidative pentose phosphate pathway, and starch catabolism, which resulted in a marked redirection of metabolism, where the primary carbon source of glycine is no longer supplied to cell building blocks by the tricarboxylic acid cycle and gluconeogenesis, whereas carbon skeletons from photosynthesis and starch degradation may be directly channeled into fatty acid and protein biosynthesis. By establishing the first genetic transformation in industrial oleaginous C. pyrenoidosa, we further showed that overexpression of an NAD(H) kinase from Arabidopsis (Arabidopsis thaliana) increased cellular lipid content by 110.4% without reducing growth rate. These findings provide a foundation for exploiting the metabolic switch in microalgae for improved photosynthetic production of food and fuels.


Subject(s)
Chlorella/metabolism , Genomics , Lipid Metabolism , Starch/metabolism , Base Sequence , Carbohydrate Metabolism , Carbon/metabolism , Chlorella/genetics , Citric Acid Cycle , Electron Transport , Fatty Acids/metabolism , Heterotrophic Processes , Molecular Sequence Data , Oxidative Phosphorylation , Photosynthesis , Sequence Analysis, DNA
10.
J Insect Sci ; 16(1)2016.
Article in English | MEDLINE | ID: mdl-27638955

ABSTRACT

To investigate the association between the gut microbiota and termites with different dietary habits and phylogenetic positions, the gut bacteria of three populations for each of two higher termites (wood-feeding Mironasutitermes shangchengensis and fungus-feeding Odontotermes formosanus) and two wood-feeding lower termites (Tsaitermes ampliceps and Reticulitermes flaviceps) were analyzed by high-throughput 454 pyrosequencing of 16S V1-V3 amplicons. As a result, 132 bacterial genera and some unidentified operational taxonomic units within 29 phyla were detected in the gut bacteria, with Spirochaetes (11-55%), Firmicutes (7-18%), Bacteroidetes (7-31%), and Proteobacteria (8-14%) as the main phyla, and Treponema, TG5, Dysgonomonas, Tannerella, za29, Lactococcus, Pseudomonas, and SJA-88 as the common genera in all four termites. The diversity of gut bacterial communities in the higher termites was significantly greater than that in the lower termites, while the gut microbiota in M. shangchengensis (wood-feeding higher termite) was more similar to that of the wood-feeding lower termites than to that of O. formosanus (fungus-feeding higher termite), and phylum Spirochaetes and nitrogen-fixing bacteria were super-dominant in the wood-feeding termites, regardless of their phylogenetic relations. This study reported for the first time the gut bacterial communities of the termites M. shangchengensis and T. ampliceps, and the comparative analyses showed that the gut microbial communities varied according to the phylogeny and the dietary habits of termites.


Subject(s)
Bacteria/genetics , Gastrointestinal Microbiome/physiology , Isoptera/microbiology , Animals , Bacteria/cytology , Gastrointestinal Microbiome/genetics , Phylogeny , Sequence Analysis, DNA
11.
BMC Bioinformatics ; 16 Suppl 18: S15, 2015.
Article in English | MEDLINE | ID: mdl-26681607

ABSTRACT

BACKGROUND: In recent years, the high-throughput, non-invasive Raman spectrometry technique has matured as an effective approach to identifying individual cells by species, even in complex, mixed populations. Raman profiling is an appealing optical microscopic method to achieve this. To fully utilize Raman profiling for single-cell analysis, an extensive understanding of Raman spectra is necessary to answer questions such as which filtering methodologies are effective for pre-processing of Raman spectra, what strains can be distinguished by Raman spectra, and what features serve best as Raman-based biomarkers for single cells. RESULTS: In this work, we have proposed an approach called rDisc to discretize the original Raman spectrum into only a few (usually fewer than 20) representative peaks (Raman shifts). The approach has advantages in removing noise and condensing the original spectrum. In particular, effective signal processing procedures were designed to eliminate noise, utilising wavelet transform denoising, baseline correction, and signal normalization. In the discretizing process, representative peaks were selected to significantly decrease the Raman data size. More importantly, the selected peaks are chosen to be suitable as key biological markers for differentiating species and other cellular features. Additionally, the classification performance of discretized spectra was found to be comparable to that of the full spectrum, which has more than 1000 Raman shifts. Overall, the discretized spectrum needs about 5% of the storage space of a full spectrum and the processing speed is considerably faster. This makes rDisc clearly superior to other methods for single-cell classification.


Subject(s)
Bacteria/chemistry , Spectrum Analysis, Raman , Databases, Factual , Discriminant Analysis , Phenotype , Principal Component Analysis , Signal Processing, Computer-Assisted , Single-Cell Analysis
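
To make the idea of discretizing a spectrum into a few representative peaks more concrete, here is a deliberately simplified sketch. It is not the rDisc procedure: it skips wavelet denoising and baseline correction, applies only min-max normalization, and keeps the strongest local maxima. The function name `discretize_spectrum` and the toy values are assumptions made for illustration.

```python
def discretize_spectrum(shifts, intensities, max_peaks=20):
    """Reduce a spectrum to at most `max_peaks` (shift, intensity) pairs.

    Simplified stand-in, not rDisc: min-max normalize the intensities,
    collect local maxima, and keep the strongest ones, ordered by shift.
    """
    lo, hi = min(intensities), max(intensities)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat spectrum
    norm = [(v - lo) / span for v in intensities]

    # A point is a peak if it rises above its left neighbour and does
    # not fall below its right neighbour.
    peaks = [
        (shifts[i], norm[i])
        for i in range(1, len(norm) - 1)
        if norm[i] > norm[i - 1] and norm[i] >= norm[i + 1]
    ]
    peaks.sort(key=lambda p: p[1], reverse=True)
    return sorted(peaks[:max_peaks])


# Toy spectrum with three local maxima; keep the two strongest.
toy_shifts = [100, 200, 300, 400, 500, 600, 700]
toy_intensities = [0, 5, 1, 10, 2, 3, 0]
print(discretize_spectrum(toy_shifts, toy_intensities, max_peaks=2))
```

Keeping a fixed, small number of peaks is what drives the storage saving described in the abstract; which peaks make good biological markers is a separate selection question that this sketch does not address.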
12.
Bioinformatics ; 30(7): 1031-3, 2014 Apr 01.
Article in English | MEDLINE | ID: mdl-24363375

ABSTRACT

MOTIVATION: The number of microbial community samples is increasing exponentially. Data mining among microbial community samples could facilitate the discovery of valuable biological information that is still hidden in the massive data. However, current methods for comparing microbial communities are limited in their ability to process large numbers of samples, each with a complex community structure. SUMMARY: We have developed optimized GPU-based software, GPU-Meta-Storms, to efficiently measure the quantitative phylogenetic similarity among massive numbers of microbial community samples. Our results have shown that GPU-Meta-Storms can compute the pair-wise similarity scores for 10 240 samples within 20 min, a speed-up of >17 000 times compared with a single-core CPU, and >2600 times compared with a 16-core CPU. Therefore, the high performance of GPU-Meta-Storms could facilitate in-depth data mining among massive microbial community samples, and make possible the real-time analysis and monitoring of temporal or conditional changes in microbial communities. AVAILABILITY AND IMPLEMENTATION: GPU-Meta-Storms is implemented in CUDA (Compute Unified Device Architecture) and C++. Source code is available at http://www.computationalbioenergy.org/meta-storms.html.


Subject(s)
Bacteria/genetics , Phylogeny , Software , Algorithms , Data Mining , Programming Languages
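
To put the reported figure of 10 240 samples within 20 min in perspective, the arithmetic can be worked through as below. This is only a back-of-the-envelope sketch: it assumes each unordered pair of distinct samples is scored once, which the abstract does not state, and because the abstract says "within" 20 min the resulting rate is a lower bound.

```python
def pairwise_throughput(n_samples, seconds):
    """Return (number of unordered pairs, pairs scored per second).

    Assumes every pair of distinct samples is compared exactly once.
    """
    pairs = n_samples * (n_samples - 1) // 2
    return pairs, pairs / seconds


pairs, rate = pairwise_throughput(10_240, 20 * 60)
print(f"{pairs:,} pairs, at least {rate:,.0f} pairs per second")
```

Because the number of pairs grows quadratically with the number of samples, doubling the sample count roughly quadruples the work, which is why pairwise comparison at this scale benefits from massively parallel hardware.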
13.
Yi Chuan ; 37(7): 645-54, 2015 07.
Article in Chinese | MEDLINE | ID: mdl-26351164

ABSTRACT

Humans are born with microbiota, which accompany us throughout our lives. There is an important symbiotic relationship between us and these microbial communities, which are therefore of great importance to our health. All genomic information within this microbiota is referred to as the "metagenome" (also called the "second human genome"). The analysis of high-throughput metagenomic data generated from biomedical experiments would provide new approaches for translational research, and it has several clinical applications. With the help of next-generation sequencing technology and the emerging metagenomic approach (analysis of all genomic information in microbiota as a whole), we can overcome the pitfalls of the tedious traditional method of isolating and cultivating single microbial species. The metagenomic approach can also help us to analyze the whole microbial community efficiently and offer deep insights into human-microbe relationships as well as new ideas on many biomedical problems. In this review, we summarize frontiers in metagenomic research, including new concepts and methods. Then, we focus on the applications of metagenomic research in medical research and clinical practice in recent years, which clearly show the importance of metagenomic research in the field of translational medicine.


Subject(s)
Metagenomics , Translational Research, Biomedical , High-Throughput Nucleotide Sequencing , Humans
14.
Microbiol Spectr ; 12(8): e0069524, 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-38912828

ABSTRACT

Amplicon sequencing stands as a cornerstone in microbiome profiling, yet concerns persist regarding its resolution and accuracy. The enhancement of reference databases and annotations marks a new era for 16S rRNA-based profiling. Capitalizing on this potential, we introduce PM-profiler, a novel tool for profiling amplicon short reads. PM-profiler is implemented in C++ with advanced algorithms, such as pre-allocated hash for reference construction, hybrid and dynamic short-read matching, a big-data-guided dual-mode hierarchical taxonomy annotation strategy, and full-procedure parallel computing. This tool delivers species-level resolution and ultrafast speed for large-scale microbiomes, surpassing alignment-based approaches and the Naïve-Bayesian model. Furthermore, recognizing the global uneven distribution of microbes, we delineate optimal annotation strategies for each sampling habitat based on microbial patterns over 270,000 microbiomes. Integrated with the established workflow of Parallel-Meta Suite and the latest curated reference databases, this endeavor offers a swift and dependable solution for high-precision microbiome surveys. IMPORTANCE: Our study introduces PM-profiler, a new tool that deciphers the complexity of microbial communities. With advanced algorithms, flexible annotation strategies, and well-organized big data, PM-profiler provides a faster and more accurate way to study microbiomes, paving the way for discoveries that could improve our understanding of microbiomes and their impact on the world.


Subject(s)
Algorithms , Bacteria , Microbiota , RNA, Ribosomal, 16S , Software , Microbiota/genetics , RNA, Ribosomal, 16S/genetics , Bacteria/genetics , Bacteria/classification , Bacteria/isolation & purification , Humans , High-Throughput Nucleotide Sequencing/methods , Phylogeny , Computational Biology/methods , Sequence Analysis, DNA/methods , Molecular Sequence Annotation
15.
Bioinform Adv ; 4(1): vbae013, 2024.
Article in English | MEDLINE | ID: mdl-38371919

ABSTRACT

Motivation: The human microbiome, found throughout various body parts, plays a crucial role in health dynamics and disease development. Recent research has highlighted microbiome disparities between patients with different diseases and healthy individuals, suggesting the microbiome's potential in recognizing health states. Traditionally, microbiome-based status classification relies on pre-trained machine learning (ML) models. However, most ML methods overlook microbial relationships, limiting model performance. Results: To address this gap, we propose PM-CNN (Phylogenetic Multi-path Convolutional Neural Network), a novel phylogeny-based neural network model for multi-status classification and disease detection using microbiome data. PM-CNN organizes microbes based on their phylogenetic relationships and extracts features using a multi-path convolutional neural network. An ensemble learning method then fuses these features to make accurate classification decisions. We applied PM-CNN to human microbiome data for status and disease detection, demonstrating its significant superiority over existing ML models. These results provide a robust foundation for microbiome-based state recognition and disease prediction in future research and applications. Availability and implementation: PM-CNN software is available at https://github.com/qdu-bioinfo/PM_CNN.

16.
Spectrochim Acta A Mol Biomol Spectrosc ; 318: 124454, 2024 Oct 05.
Article in English | MEDLINE | ID: mdl-38788500

ABSTRACT

For species identification analysis, methods based on deep learning are becoming prevalent due to their data-driven and task-oriented nature. The most commonly used convolutional neural network (CNN) model has been well applied in Raman spectra recognition. However, when faced with similar molecules or functional groups, the features of overlapping peaks and weak peaks may not be fully extracted using the CNN model, which can potentially hinder accurate species identification. Given these practical challenges, the fusion of multi-modal data can support a more comprehensive and accurate analysis of real samples than single-modal data. In this study, we propose a double-branch CNN model, named SI-DBNet, that integrates Raman and image multi-modal data. In addition, we have developed a one-dimensional convolutional neural network combining dilated convolutions and efficient channel attention mechanisms for the spectral branch. The effectiveness of the model has been demonstrated using the Grad-CAM method to visualize the key regions the model attends to. When compared to single-modal and multi-modal classification methods, our SI-DBNet model achieved superior performance with a classification accuracy of 98.8%. The proposed method provides a new reference for species identification based on multi-modal data fusion.

17.
ISME J ; 18(1)2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38365232

ABSTRACT

Ammonia-oxidizing archaea (AOA) are among the most ubiquitous and abundant archaea on Earth, widely distributed in marine, terrestrial, and geothermal ecosystems. However, the genomic diversity, biogeography, and evolutionary process of AOA populations in subsurface environments are vastly understudied compared to those in marine and soil systems. Here, we report a novel AOA order Candidatus (Ca.) Nitrosomirales which forms a sister lineage to the thermophilic Ca. Nitrosocaldales. Metagenomic and 16S rRNA gene-read mapping demonstrates the abundant presence of Nitrosomirales AOA in various groundwater environments and their widespread distribution across a range of geothermal, terrestrial, and marine habitats. Terrestrial Nitrosomirales AOA show the genetic capacity of using formate as a source of reductant and using nitrate as an alternative electron acceptor. Nitrosomirales AOA appear to have acquired key metabolic genes and operons from other mesophilic populations via horizontal gene transfer, including genes encoding urease, nitrite reductase, and V-type ATPase. The additional metabolic versatility conferred by acquired functions may have facilitated their radiation into a variety of subsurface, marine, and soil environments. We also provide evidence that each of the four AOA orders spans both marine and terrestrial habitats, which suggests a more complex evolutionary history for major AOA lineages than previously proposed. Together, these findings establish a robust phylogenomic framework of AOA and provide new insights into the ecology and adaptation of this globally abundant functional guild.


Subject(s)
Ammonia , Archaea , Ammonia/metabolism , Ecosystem , RNA, Ribosomal, 16S/genetics , RNA, Ribosomal, 16S/metabolism , Oxidation-Reduction , Phylogeny , Soil , Soil Microbiology
18.
BMC Genomics ; 14: 534, 2013 Aug 05.
Article in English | MEDLINE | ID: mdl-23915326

ABSTRACT

BACKGROUND: Microalgae are promising feedstock for production of lipids, sugars, bioactive compounds and in particular biofuels, yet development of sensitive and reliable phylotyping strategies for microalgae has been hindered by the paucity of phylogenetically closely-related finished genomes. RESULTS: Using the oleaginous eustigmatophyte Nannochloropsis as a model, we assessed current intragenus phylotyping strategies by producing the complete plastid (pt) and mitochondrial (mt) genomes of seven strains from six Nannochloropsis species. Genes on the pt and mt genomes have been highly conserved in content, size and order, are under strong negative selection, and evolve at rates 33% and 66% of that of the nuclear genomes, respectively. Pt genome diversification was driven by asymmetric evolution of two inverted repeats (IRa and IRb): psbV and clpC in IRb are highly conserved whereas their counterparts in IRa exhibit three lineage-associated types of structural polymorphism via duplication or disruption of whole or partial genes. In the mt genomes, however, a single evolution hotspot varies in copy-number of a 3.5 Kb-long, cox1-harboring repeat. The organelle markers (e.g., cox1, cox2, psbA, rbcL and rrn16_mt) and nuclear markers (e.g., ITS2 and 18S) that are widely used for phylogenetic analysis yielded a divergent phylogeny for the seven strains, largely due to low SNP density. A new strategy for intragenus phylotyping of microalgae was thus proposed that includes (i) twelve sequence markers that are of higher sensitivity than ITS2 for interspecies phylogenetic analysis, (ii) multi-locus sequence typing based on rps11_mt-nad4, rps3_mt and cox2-rrn16_mt for intraspecies phylogenetic reconstruction and (iii) several SSR loci for identification of strains within a given species. CONCLUSION: This first comprehensive dataset of organelle genomes for a microalgal genus enabled exhaustive assessment and searches of all candidate phylogenetic markers on the organelle genomes. A new strategy for intragenus phylotyping of microalgae was proposed which might be generally applicable to other microalgal genera and should serve as a valuable tool in the expanding algal biotechnology industry.


Subject(s)
Genetic Variation/genetics , Genome, Mitochondrial/genetics , Genomics , Microalgae/cytology , Microalgae/genetics , Phylogeny , Plastids/genetics , Evolution, Molecular , Genetic Markers/genetics , Molecular Sequence Data , Polymorphism, Genetic/genetics
19.
Bioinformatics ; 28(19): 2493-501, 2012 Oct 01.
Article in English | MEDLINE | ID: mdl-22843983

ABSTRACT

BACKGROUND: Scientists have long been interested in effectively comparing different microbial communities (also referred to as 'metagenomic samples' here) on a large scale: given a set of unknown samples, find similar metagenomic samples from a large repository and examine how similar these samples are. With the metagenomic samples accumulated to date, it is possible to build a database of metagenomic samples of interest. Any metagenomic sample could then be searched against this database to find the most similar metagenomic sample(s). However, on the one hand, current databases with a large number of metagenomic samples mostly serve as data repositories that offer few functionalities for analysis; on the other hand, methods to measure the similarity of metagenomic data work well only for small sets of samples by pairwise comparison. It is not yet clear how to efficiently search for metagenomic samples against a large metagenomic database. RESULTS: In this study, we have proposed a novel method, Meta-Storms, that could systematically and efficiently organize and search metagenomic data. It includes the following components: (i) creating a database of metagenomic samples based on their taxonomical annotations, (ii) efficient indexing of samples in the database based on a hierarchical taxonomy indexing strategy, (iii) searching for a metagenomic sample against the database by a fast scoring function based on quantitative phylogeny and (iv) managing the database by index export, index import, data insertion, data deletion and database merging. We have collected more than 1300 metagenomic datasets from the public domain and in-house facilities, and tested the Meta-Storms method on these datasets. Our experimental results show that Meta-Storms is capable of database creation and effective searching for a large number of metagenomic samples, and that it could achieve accuracies similar to those of the current popular significance testing-based methods. CONCLUSION: The Meta-Storms method would serve as a suitable database management and search system to quickly identify similar metagenomic samples from a large pool of samples. CONTACT: ningkang@qibebt.ac.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
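The index-then-score workflow described in components (ii) and (iii) can be sketched in a few lines of Python. This is a toy illustration, not the Meta-Storms code: the class `SampleIndex`, the bucket key (the top taxonomic rank of the most abundant taxon), and the score (shared relative abundance) are all assumptions. In particular, the shared-abundance score is a simple stand-in for the phylogeny-based scoring function the abstract mentions.

```python
from collections import defaultdict

def normalize(sample):
    """Convert raw taxon counts into relative abundances."""
    total = sum(sample.values())
    return {taxon: count / total for taxon, count in sample.items()}

def index_key(sample, level=1):
    """Bucket key: the first `level` ranks of the most abundant taxon's path."""
    profile = normalize(sample)
    dominant = max(profile, key=profile.get)
    return tuple(dominant.split(";")[:level])

def shared_abundance(a, b):
    """Stand-in similarity: sum of the minimum relative abundance per shared taxon."""
    pa, pb = normalize(a), normalize(b)
    return sum(min(pa[t], pb[t]) for t in pa.keys() & pb.keys())

class SampleIndex:
    """Hypothetical index that groups samples by a coarse taxonomy key."""

    def __init__(self, level=1):
        self.level = level
        self.buckets = defaultdict(dict)

    def insert(self, name, sample):
        self.buckets[index_key(sample, self.level)][name] = sample

    def search(self, query, top=3):
        # Only samples in the query's bucket are scored, which is what
        # keeps the search from comparing against the whole database.
        candidates = self.buckets.get(index_key(query, self.level), {})
        scored = [(shared_abundance(query, s), n) for n, s in candidates.items()]
        return sorted(scored, reverse=True)[:top]

# Hypothetical usage with made-up samples (taxon path -> read count).
index = SampleIndex(level=1)
index.insert("gut_a", {"Bacteria;Firmicutes": 80, "Bacteria;Bacteroidetes": 20})
index.insert("gut_b", {"Bacteria;Firmicutes": 50, "Bacteria;Proteobacteria": 50})
index.insert("spring", {"Archaea;Thaumarchaeota": 90, "Bacteria;Firmicutes": 10})
hits = index.search({"Bacteria;Firmicutes": 70, "Bacteria;Bacteroidetes": 30})
```

In this example the query is scored only against the two samples in the bacteria-dominated bucket, and `gut_a` ranks first. The trade-off of any such coarse index is that a relevant sample filed under a different key is never scored.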


Subject(s)
Computational Biology/methods , Databases, Genetic , Metagenomics/methods , Algorithms , Humans , Metagenome , Phylogeny , Software
20.
Bioinformatics ; 28(5): 643-50, 2012 Mar 01.
Article in English | MEDLINE | ID: mdl-22253293

ABSTRACT

MOTIVATION: A review of the available single nucleotide polymorphism (SNP) calling procedures for Illumina high-throughput sequencing (HTS) platform data reveals that most rely mainly on base-calling and mapping qualities as sources of error when calling SNPs. Thus, errors not involved in base-calling or alignment, such as those in genomic sample preparation, are not accounted for. RESULTS: A novel method of consensus and SNP calling, Genotype Model Selection (GeMS), is given which accounts for the errors that occur during the preparation of the genomic sample. Simulations and real data analyses indicate that GeMS has the best performance balance of sensitivity and positive predictive value among the tested SNP callers. AVAILABILITY: The GeMS package can be downloaded from https://sites.google.com/a/bioinformatics.ucr.edu/xinping-cui/home/software or http://computationalbioenergy.org/software.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
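The two metrics used to compare SNP callers, sensitivity and positive predictive value (PPV), follow the standard definitions TP / (TP + FN) and TP / (TP + FP). The short Python sketch below computes them from a set of called variants and a truth set. The function name and the variant tuples are made up for illustration and are not taken from the GeMS package.

```python
def sensitivity_and_ppv(called, truth):
    """Return (sensitivity, PPV) for a set of called SNPs against a truth set.

    sensitivity = TP / (TP + FN): fraction of true SNPs that were called.
    PPV         = TP / (TP + FP): fraction of called SNPs that are true.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)    # called and true
    fp = len(called - truth)    # called but not true
    fn = len(truth - called)    # true but missed
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv

# Hypothetical variants encoded as (chromosome, position, alternate base).
called = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 40, "G")}
truth = {("chr1", 100, "A"), ("chr2", 40, "G"), ("chr2", 90, "C"), ("chr3", 7, "T")}
sens, ppv = sensitivity_and_ppv(called, truth)
```

Here two of the four true SNPs are recovered (sensitivity 0.5) and two of the three calls are correct (PPV about 0.67). A caller can raise one metric at the expense of the other, which is why the abstract speaks of a balance between them.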


Subject(s)
High-Throughput Nucleotide Sequencing , Models, Genetic , Polymorphism, Single Nucleotide , Algorithms , Genomics/methods , Predictive Value of Tests , Software , Thermoanaerobacter/genetics