Results 1 - 20 of 162
1.
NAR Genom Bioinform ; 6(2): lqae044, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38711860

ABSTRACT

Sequence classification facilitates a fundamental understanding of the structure of microbial communities. Binary metagenomic sequence classifiers are insufficient because environmental metagenomes are typically derived from multiple sequence sources. Here we introduce a deep-learning-based sequence classifier, DeepMicroClass, that classifies metagenomic contigs into five sequence classes, i.e. viruses infecting prokaryotic or eukaryotic hosts, eukaryotic or prokaryotic chromosomes, and prokaryotic plasmids. DeepMicroClass achieved high performance for all sequence classes at various tested sequence lengths ranging from 500 bp to 100 kbp. By benchmarking on a synthetic dataset with variable sequence class composition, we showed that DeepMicroClass obtained better performance for eukaryotic, plasmid and viral contig classification than other state-of-the-art predictors. DeepMicroClass achieved performance comparable to that of geNomad and VirSorter2 on viral sequence classification when benchmarked on the CAMI II marine dataset. Using a coastal daily time-series metagenomic dataset as a case study, we showed that microbial eukaryotes and prokaryotic viruses are integral to microbial communities. By analyzing monthly metagenomes collected at HOT and BATS, we found relatively higher viral read proportions in the subsurface layer in late summer, consistent with the seasonal viral infection patterns prevalent in these areas. We expect DeepMicroClass will promote metagenomic studies of under-appreciated sequence types.

3.
Sci Rep ; 14(1): 7024, 2024 03 25.
Article in English | MEDLINE | ID: mdl-38528097

ABSTRACT

The human microbiome, comprising microorganisms residing within and on the human body, plays a crucial role in various physiological processes and has been linked to numerous diseases. To analyze microbiome data, it is essential to account for inherent heterogeneity and variability across samples. Normalization methods have been proposed to mitigate these variations and enhance comparability. However, the performance of these methods in predicting binary phenotypes remains understudied. This study systematically evaluates different normalization methods in microbiome data analysis and their impact on disease prediction. Our findings highlight the strengths and limitations of scaling, compositional data analysis, transformation, and batch correction methods. Scaling methods like TMM show consistent performance, while compositional data analysis methods exhibit mixed results. Transformation methods, such as Blom and NPN, demonstrate promise in capturing complex associations. Batch correction methods, including BMC and Limma, consistently outperform other approaches. However, the influence of normalization methods is constrained by population effects, disease effects, and batch effects. These results provide insights for selecting appropriate normalization approaches in microbiome research, improving predictive models, and advancing personalized medicine. Future research should explore larger and more diverse datasets and develop tailored normalization strategies for microbiome data analysis.


Subject(s)
Microbiota , Humans , Microbiota/genetics , Metagenome , Metagenomics , Research Design , Phenotype
4.
iScience ; 27(3): 109041, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38361626

ABSTRACT

Patients with neurodegenerative diseases exhibit diminished basal forebrain (BF) volume compared to healthy individuals. However, it's uncertain whether this difference is consistent between sexes. It has been reported that BF volume moderately atrophies during aging, but the effect of sex on BF volume changes during the normal aging process remains unclear. In the cross-sectional study, we observed a significant reduction in BF volume in patients with mild cognitive impairment (MCI) and Alzheimer's disease (AD) compared to Healthy Controls (HCs), especially in the Ch4 subregion. Notably, significant differences in BF volume between MCI and HCs were observed solely in the female group. Additionally, we identified asymmetrical atrophy in the left and right Ch4 subregions in female patients with AD. In the longitudinal analysis, we found that aging seemed to have a minimal impact on BF volume in males. Our study highlights the importance of considering sex as a research variable in brain science.

5.
Nat Commun ; 15(1): 585, 2024 Jan 17.
Article in English | MEDLINE | ID: mdl-38233391

ABSTRACT

Contig binning plays a crucial role in metagenomic data analysis by grouping contigs from the same or closely related genomes. However, existing binning methods face challenges in practical applications due to the diversity of data types and the difficulties in efficiently integrating heterogeneous information. Here, we introduce COMEBin, a binning method based on contrastive multi-view representation learning. COMEBin utilizes data augmentation to generate multiple fragments (views) of each contig and obtains high-quality embeddings of heterogeneous features (sequence coverage and k-mer distribution) through contrastive learning. Experimental results on multiple simulated and real datasets demonstrate that COMEBin outperforms state-of-the-art binning methods, particularly in recovering near-complete genomes from real environmental samples. COMEBin outperforms other binning methods remarkably when integrated into metagenomic analysis pipelines, including the recovery of potentially pathogenic antibiotic-resistant bacteria (PARB) and moderate or higher quality bins containing potential biosynthetic gene clusters (BGCs).


Subject(s)
Metagenome , Metagenomics , Metagenome/genetics , Metagenomics/methods , Algorithms , Sequence Analysis, DNA/methods
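
The abstract above names k-mer distribution as one of the sequence features used for binning. As a rough illustration only, the sketch below computes a normalized k-mer frequency profile for a single contig. It is not COMEBin's implementation: the function name `kmer_profile`, the default `k=4`, and the choice to skip k-mers containing non-ACGT characters are all assumptions made for this example, and reverse-complement merging (common in binning tools) is left out.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Return the relative frequency of every ACGT k-mer in seq.

    Hypothetical helper for illustration; windows containing
    characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    # Enumerate all 4**k possible k-mers so every profile has the same keys.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every window of length k in the contig.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return {km: 0.0 for km in kmers}
    return {km: counts[km] / total for km in kmers}


if __name__ == "__main__":
    profile = kmer_profile("ACGTACGT", k=2)
    print(profile["AC"], profile["TA"])
```

Profiles built this way have a fixed length (4^k entries), which is what makes them usable as feature vectors when contigs differ in length.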
6.
Cereb Cortex ; 34(1), 2024 01 14.
Article in English | MEDLINE | ID: mdl-38037843

ABSTRACT

Human brain structure shows heterogeneous patterns of change across adult aging and is associated with cognition. However, the relationship between cortical structural changes during aging and gene transcription signatures remains unclear. Here, using structural magnetic resonance imaging data from two separate cohorts of healthy participants, the Cambridge Centre for Aging and Neuroscience (n = 454, 18-87 years) and the Dallas Lifespan Brain Study (n = 304, 20-89 years), together with a transcriptome dataset, we investigated the link between the cortical morphometric similarity network and brain-wide gene transcription. In both cohorts, we found reproducible patterns of morphometric similarity network change: morphological similarity decreased with age in cognition-related areas (mainly located in superior frontal and temporal cortices) and increased in sensorimotor-related areas (postcentral and lateral occipital cortices). Changes in the morphometric similarity network showed significant spatial correlation with the expression of age-related genes enriched in synapse-related biological processes, with synaptic abnormalities likely accounting for cognitive decline. Transcription changes in astrocytes, microglia, and neuronal cells explained most of the age-related morphometric similarity network changes, which suggests potential intervention and therapeutic targets for cognitive decline. Taken together, by linking gene transcription signatures to the cortical morphometric similarity network, our findings might provide molecular and cellular substrates for cortical structural changes related to cognitive decline across adult aging.


Subject(s)
Aging , Brain , Adult , Humans , Brain/physiology , Aging/physiology , Cognition/physiology , Temporal Lobe , Magnetic Resonance Imaging/methods
7.
Cereb Cortex ; 34(1), 2024 01 14.
Article in English | MEDLINE | ID: mdl-38044469

ABSTRACT

Changes in brain function affect cognition in older adults, yet the relationship between cognition and the dynamic changes of brain networks during naturalistic stimulation is not clear. Here, we recruited young, middle-aged and older groups from the Cambridge Center for Aging and Neuroscience to investigate the relationship between dynamic metrics of brain networks and cognition using functional magnetic resonance imaging data acquired during movie-watching. We found six reliable co-activation pattern (CAP) states of brain networks, grouped into three pairs with opposite activation patterns, in all three age groups. Compared with young and middle-aged adults, older adults dwelled for a shorter time in CAP state 4, with deactivated default mode network (DMN) and activated salience, frontoparietal and dorsal-attention networks (DAN), and for a longer time in state 6, with deactivated DMN and activated DAN and visual network, suggesting that altered dynamic interaction between the DMN and other brain networks might contribute to cognitive decline in older adults. Meanwhile, older adults transitioned more easily from state 6 to state 3 (activated DMN and deactivated sensorimotor network), suggesting that the fragile antagonism between the DMN and other cognitive networks might contribute to cognitive decline in older adults. Our findings provide novel insights into aberrant brain network dynamics associated with cognitive decline.


Subject(s)
Brain , Magnetic Resonance Imaging , Magnetic Resonance Imaging/methods , Brain/diagnostic imaging , Brain/physiology , Cognition/physiology , Brain Mapping , Nerve Net/diagnostic imaging , Nerve Net/physiology
8.
Brief Bioinform ; 24(6), 2023 09 22.
Article in English | MEDLINE | ID: mdl-37930023

ABSTRACT

Local associations refer to spatial-temporal correlations that emerge from the biological realm, such as time-dependent gene co-expression or seasonal interactions between microbes. One can reveal the intricate dynamics and inherent interactions of biological systems by examining the biological time series data for these associations. To accomplish this goal, local similarity analysis algorithms and statistical methods that facilitate the local alignment of time series and assess the significance of the resulting alignments have been developed. Although these algorithms were initially devised for gene expression analysis from microarrays, they have been adapted and accelerated for multi-omics next generation sequencing datasets, achieving high scientific impact. In this review, we present an overview of the historical developments and recent advances for local similarity analysis algorithms, their statistical properties, and real applications in analyzing biological time series data. The benchmark data and analysis scripts used in this review are freely available at http://github.com/labxscut/lsareview.


Subject(s)
Algorithms , Gene Expression Profiling , Time Factors , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing , Benchmarking
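
The review above describes local similarity analysis as a local alignment of two time series. A minimal sketch of that idea, under stated assumptions, is given below: it uses a dynamic program that accumulates products of paired values along diagonals, resets to zero when the running sum goes negative, and allows a bounded time delay. The function name `local_similarity`, the `max_delay` parameter, and the division by series length are choices made for this example and are not taken from the reviewed software; the inputs are assumed to be already normalized, and no significance test is included.

```python
def local_similarity(x, y, max_delay=0):
    """Return a signed local similarity score for two equal-length series.

    Illustrative sketch only: x and y are assumed to be normalized
    beforehand, and |i - j| <= max_delay bounds the allowed time shift.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("series must have the same length")
    if n == 0:
        return 0.0
    # pos tracks positively correlated runs, neg tracks negatively correlated runs.
    pos = [[0.0] * (n + 1) for _ in range(n + 1)]
    neg = [[0.0] * (n + 1) for _ in range(n + 1)]
    best_pos = best_neg = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if abs(i - j) > max_delay:
                continue
            prod = x[i - 1] * y[j - 1]
            # Extend the run on the same diagonal, or restart at zero.
            pos[i][j] = max(0.0, pos[i - 1][j - 1] + prod)
            neg[i][j] = max(0.0, neg[i - 1][j - 1] - prod)
            best_pos = max(best_pos, pos[i][j])
            best_neg = max(best_neg, neg[i][j])
    sign = 1.0 if best_pos >= best_neg else -1.0
    return sign * max(best_pos, best_neg) / n


if __name__ == "__main__":
    series = [1.0, -1.0, 1.0, -1.0]
    print(local_similarity(series, series))
    print(local_similarity(series, [-v for v in series]))
```

Raising `max_delay` lets the score pick up associations in which one series lags the other, at the cost of scanning more diagonals.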
9.
Nat Commun ; 14(1): 6231, 2023 10 06.
Article in English | MEDLINE | ID: mdl-37802989

ABSTRACT

Metagenomic Hi-C (metaHi-C) can identify contig-to-contig relationships with respect to their proximity within the same physical cell. Shotgun libraries in metaHi-C experiments can be constructed by next-generation sequencing (short-read metaHi-C) or more recent third-generation sequencing (long-read metaHi-C). However, all existing metaHi-C analysis methods were developed and benchmarked on short-read metaHi-C datasets, and there is much room for improvement in terms of more scalable and stable analyses, especially for long-read metaHi-C data. Here we report MetaCC, an efficient and integrative framework for analyzing both short-read and long-read metaHi-C datasets. MetaCC outperforms existing methods on normalization and binning. In particular, the MetaCC normalization module, named NormCC, is more than 3000 times faster than the current state-of-the-art method HiCzin on a complex wastewater dataset. When applied to one sheep gut long-read metaHi-C dataset, the MetaCC binning module can retrieve 709 high-quality genomes with the largest species diversity using a single sample, including an expansion of five uncultured members from the order Erysipelotrichales, and is the only binner that can recover the genome of the important species Bacteroides vulgatus. Further plasmid analyses reveal that MetaCC binning is able to capture multi-copy plasmids.


Subject(s)
Algorithms , Metagenome , Animals , Sheep , Sequence Analysis, DNA/methods , Metagenome/genetics , Metagenomics/methods , High-Throughput Nucleotide Sequencing
10.
PLoS Comput Biol ; 19(10): e1010608, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37844077

ABSTRACT

Heterogeneity across genomic studies compromises the performance of machine learning models in cross-study phenotype prediction. Overcoming this heterogeneity when incorporating different studies is a challenging and critical step for developing machine learning algorithms with reproducible prediction performance on independent datasets. We investigated the best approaches to integrate different studies of the same type of omics data under a variety of heterogeneities. We developed a comprehensive workflow to simulate different types of heterogeneity and to evaluate the performance of different integration methods together with batch normalization using ComBat. We also demonstrated the results through realistic applications on six colorectal cancer (CRC) metagenomic studies and six tuberculosis (TB) gene expression studies. We showed that heterogeneity across genomic studies can markedly impair the machine learning classifier's reproducibility. ComBat normalization improved the prediction performance of the machine learning classifier when heterogeneous populations were present, and could successfully remove batch effects within the same population. We also showed that the classifier's prediction accuracy decreased markedly as the underlying disease models of the training and test populations diverged. Comparing merging and integration methods, we found that each can outperform the other in different scenarios. In the realistic applications, prediction accuracy improved when ComBat normalization was applied with merging or integration methods in both the CRC and TB studies. We illustrated that batch normalization is essential for mitigating both population differences between studies and batch effects. We also showed that both the merging strategy and integration methods can achieve good performance when combined with batch normalization. In addition, we explored the potential of boosting phenotype prediction performance with rank aggregation methods and showed that they performed similarly to other ensemble learning approaches.


Subject(s)
Algorithms , Machine Learning , Reproducibility of Results , Genomics , Phenotype
11.
Cereb Cortex ; 33(13): 8645-8653, 2023 06 20.
Article in English | MEDLINE | ID: mdl-37143182

ABSTRACT

Sex differences in episodic memory (EM), the remembering of past events based on when and where they occurred, have been reported, but the neural mechanisms are unclear. T1-weighted images of 111 females and 61 males were acquired from the Dallas Lifespan Brain Study. Using surface-based morphometry and structural covariance (SC) analysis, we constructed structural covariance networks (SCN) based on cortical volume, and the global efficiency (Eglob) was computed to characterize network integration. The relationship between SCN and EM was examined by SC analysis among the top-n brain regions that were most relevant to EM performance. The number of SC connections (females: 3306; males: 437, P = 0.0212) and Eglob (females: 0.1845; males: 0.0417, P = 0.0408) of the SCN were higher in females than in males. The top-n brain regions with the strongest SC in females were located in the auditory network, cingulo-opercular network (CON), and default mode network (DMN); in males, they were located in the frontoparietal network, CON, and DMN. These results confirmed that the Eglob of the SCN was higher in females than in males, and suggest that sex differences in EM performance might be related to differences in network-level integration. Our study highlights the importance of sex as a research variable in brain science.


Subject(s)
Memory, Episodic , Humans , Male , Female , Sex Characteristics , Brain , Magnetic Resonance Imaging , Brain Mapping
12.
Genome Biol ; 24(1): 1, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36609515

ABSTRACT

Binning aims to recover microbial genomes from metagenomic data. For complex metagenomic communities, the available binning methods are far from satisfactory, as they usually do not fully use different types of features and important biological knowledge. We developed a novel ensemble binner, MetaBinner, which generates component results with multiple types of features by k-means and uses single-copy gene information for initialization. It then employs a two-stage ensemble strategy based on single-copy genes to integrate the component results efficiently and effectively. Extensive experimental results on three large-scale simulated datasets and one real-world dataset demonstrate that MetaBinner significantly outperforms state-of-the-art binners.


Subject(s)
Algorithms , Microbiota , Microbiota/genetics , Metagenome , Genome, Microbial , Metagenomics/methods , Sequence Analysis, DNA
13.
Nat Commun ; 14(1): 502, 2023 01 31.
Article in English | MEDLINE | ID: mdl-36720887

ABSTRACT

The introduction of high-throughput chromosome conformation capture (Hi-C) into metagenomics enables the reconstruction of high-quality metagenome-assembled genomes (MAGs) from microbial communities. Despite recent advances in recovering eukaryotic, bacterial, and archaeal genomes using Hi-C contact maps, few Hi-C-based methods are designed to retrieve viral genomes. Here we introduce ViralCC, a publicly available tool to recover complete viral genomes and detect virus-host pairs using Hi-C data. Compared to other Hi-C-based methods, ViralCC leverages the virus-host proximity structure as a complementary information source for the Hi-C interactions. Using mock and real metagenomic Hi-C datasets from several different microbial ecosystems, including the human gut, cow fecal, and wastewater, we demonstrate that ViralCC outperforms existing Hi-C-based binning methods as well as state-of-the-art tools specifically dedicated to metagenomic viral binning. ViralCC can also reveal the taxonomic structure of viruses and virus-host pairs in microbial communities. When applied to a real wastewater metagenomic Hi-C dataset, ViralCC constructs a phage-host network, which is further validated using CRISPR spacer analyses. ViralCC is an open-source pipeline available at https://github.com/dyxstat/ViralCC.


Subject(s)
Bacteriophages , Microbiota , Animals , Cattle , Female , Humans , Metagenome/genetics , Metagenomics , Wastewater , Genome, Viral/genetics , Bacteriophages/genetics
14.
Nat Commun ; 13(1): 5566, 2022 09 29.
Article in English | MEDLINE | ID: mdl-36175411

ABSTRACT

Early cancer detection by cell-free DNA faces multiple challenges: low fraction of tumor cell-free DNA, molecular heterogeneity of cancer, and sample sizes that are not sufficient to reflect diverse patient populations. Here, we develop a cancer detection approach to address these challenges. It consists of an assay, cfMethyl-Seq, for cost-effective sequencing of the cell-free DNA methylome (with > 12-fold enrichment over whole genome bisulfite sequencing in CpG islands), and a computational method to extract methylation information and diagnose patients. Applying our approach to 408 colon, liver, lung, and stomach cancer patients and controls, at 97.9% specificity we achieve 80.7% and 74.5% sensitivity in detecting all-stage and early-stage cancer, and 89.1% and 85.0% accuracy for locating tissue-of-origin of all-stage and early-stage cancer, respectively. Our approach cost-effectively retains methylome profiles of cancer abnormalities, allowing us to learn new features and expand to other cancer types as training cohorts grow.


Subject(s)
Cell-Free Nucleic Acids , Stomach Neoplasms , Cell-Free Nucleic Acids/genetics , Cost-Benefit Analysis , Early Detection of Cancer , Epigenome , Humans , Stomach Neoplasms/diagnosis , Stomach Neoplasms/genetics
15.
Front Cell Infect Microbiol ; 12: 918010, 2022.
Article in English | MEDLINE | ID: mdl-35782128

ABSTRACT

The association of colorectal cancer (CRC) and the human gut microbiome dysbiosis has been the focus of several studies in the past. Many bacterial taxa have been shown to have differential abundance among CRC patients compared to healthy controls. However, the relationship between CRC and non-bacterial gut microbiome such as the gut virome is under-studied and not well understood. In this study we conducted a comprehensive analysis of the association of viral abundances with CRC using metagenomic shotgun sequencing data of 462 CRC subjects and 449 healthy controls from 7 studies performed in 8 different countries. Despite the high heterogeneity, our results showed that the virome alpha diversity was consistently higher in CRC patients than in healthy controls (p-value <0.001). This finding is in sharp contrast to previous reports of low alpha diversity of prokaryotes in CRC compared to healthy controls. In addition to the previously known association of Podoviridae, Siphoviridae and Myoviridae with CRC, we further demonstrate that Herelleviridae, a newly constructed viral family, is significantly depleted in CRC subjects. Our interkingdom association analysis reveals a less intertwined correlation between the gut virome and bacteriome in CRC compared to healthy controls. Furthermore, we show that the viral abundance profiles can be used to accurately predict CRC disease status (AUROC >0.8) in both within-study and cross-study settings. The combination of training sets resulted in rather generalized and accurate prediction models. Our study clearly shows that subjects with colorectal cancer harbor a distinct human gut virome profile which may have an important role in this disease.


Subject(s)
Bacteriophages , Colorectal Neoplasms , Siphoviridae , Bacteriophages/genetics , Humans , Metagenome , Metagenomics
16.
PLoS Comput Biol ; 18(7): e1010184, 2022 07.
Article in English | MEDLINE | ID: mdl-35830390

ABSTRACT

Confounding factors exist widely in various biological data owing to technical variations, population structures and experimental conditions. Such factors may mask the true signals and lead to spurious associations in the respective biological data, making it necessary to adjust confounding factors accordingly. However, existing confounder correction methods were mainly developed based on the original data or the pairwise Euclidean distance, either one of which is inadequate for analyzing different types of data, such as sequencing data. In this work, we proposed a method called Adjustment for Confounding factors using Principal Coordinate Analysis, or AC-PCoA, which reduces data dimension and extracts the information from different distance measures using principal coordinate analysis, and adjusts confounding factors across multiple datasets by minimizing the associations between lower-dimensional representations and confounding variables. Application of the proposed method was further extended to classification and prediction. We demonstrated the efficacy of AC-PCoA on three simulated datasets and five real datasets. Compared to the existing methods, AC-PCoA shows better results in visualization, statistical testing, clustering, and classification.


Subject(s)
Research Design , Confounding Factors, Epidemiologic
17.
Cancers (Basel) ; 14(12), 2022 Jun 10.
Article in English | MEDLINE | ID: mdl-35740540

ABSTRACT

Currently, most neuroblastoma patients are treated according to the Children's Oncology Group (COG) risk group assignment; however, neuroblastoma's heterogeneity leaves only a few predictors of treatment response, resulting in excessive treatment. Here, we sought to couple COG risk classification with the tumor intracellular microbiome, which is part of the molecular signature of a tumor. We determine that an intra-tumor microbial gene abundance score, namely the M-score, separates high COG-risk patients into two subpopulations (Mhigh and Mlow) with higher accuracy in risk stratification than the current COG risk assessment, thus sparing a subset of high COG-risk patients from traditional high-risk therapies. Mechanistically, the classification power of M-scores implies the effect of CREB over-activation, which may influence the critical genes involved in cellular proliferation, anti-apoptosis, and angiogenesis, affecting tumor cell proliferation, survival, and metastasis. Thus, intracellular microbiota abundance in neuroblastoma regulates intracellular signals to affect patients' survival.

18.
Bioinformatics ; 38(Suppl 1): i45-i52, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758806

ABSTRACT

MOTIVATION: Phage-host associations play important roles in microbial communities. But in natural communities, as opposed to culture-based lab studies, where phages are discovered and characterized metagenomically, their hosts are generally not known. Several programs have been developed for predicting which phage infects which host based on various sequence similarity measures or machine learning approaches. These are often based on whole viral and host genomes, but in metagenomics-based studies, we rarely have whole genomes and must instead rely on contigs that are sometimes only hundreds of bp long. Therefore, we need programs that predict hosts of phage contigs on the basis of these short contigs. Although most existing programs can be applied to metagenomic datasets for these predictions, their accuracies are generally low. Here, we develop ContigNet, a convolutional neural network-based model capable of predicting phage-host matches based on relatively short contigs, and compare it to the previously published VirHostMatcher (VHM) and WIsH. RESULTS: On the validation set, ContigNet achieves 72-85% area under the receiver operating characteristic curve (AUROC) scores, compared to a maximum of 68% for VHM or WIsH, for contigs of lengths between 200 bp and 50 kbp. We also apply the model to the Metagenomic Gut Virus (MGV) catalogue, a dataset containing a wide range of draft genomes from metagenomic samples, and achieve 60-70% AUROC scores compared to 52% for VHM and WIsH. Surprisingly, ContigNet can also be used to predict plasmid-host contig associations with high accuracy, indicating a similar genetic exchange between mobile genetic elements and their hosts. AVAILABILITY AND IMPLEMENTATION: The source code of ContigNet and related datasets can be downloaded from https://github.com/tianqitang1/ContigNet.


Subject(s)
Bacteriophages , Bacteria/genetics , Bacteriophages/genetics , Metagenome , Metagenomics , Neural Networks, Computer
19.
J Comput Biol ; 29(7): 601-615, 2022 07.
Article in English | MEDLINE | ID: mdl-35727100

ABSTRACT

On the occasion of Dr. Michael Waterman's 80th birthday, we review his major contributions to the field of computational biology and bioinformatics including the famous Smith-Waterman algorithm for sequence alignment, the probability and statistics theory related to sequence alignment, algorithms for sequence assembly, the Lander-Waterman model for genome physical mapping, combinatorics and predictions of ribonucleic acid structures, word counting statistics in molecular sequences, alignment-free sequence comparison, and algorithms for haplotype block partition and tagSNP selection related to the International HapMap Project. His books Introduction to Computational Biology: Maps, Sequences and Genomes for graduate students and Computational Genome Analysis: An Introduction geared toward undergraduate students played key roles in computational biology and bioinformatics education. We also highlight his efforts of building the computational biology and bioinformatics community as the founding editor of the Journal of Computational Biology and a founding member of the International Conference on Research in Computational Molecular Biology (RECOMB).


Subject(s)
Algorithms , Computational Biology , Genome , Humans , Sequence Alignment
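
For readers unfamiliar with the Smith-Waterman algorithm mentioned in the abstract above, the following is a minimal score-only sketch of local sequence alignment by dynamic programming. The scoring values (`match=3`, `mismatch=-3`, `gap=-2`) and the linear gap penalty are assumptions chosen for this example; practical aligners typically use substitution matrices and affine gap costs, and also trace back the alignment itself.

```python
def smith_waterman_score(a, b, match=3, mismatch=-3, gap=-2):
    """Return the best local alignment score between strings a and b.

    Illustrative sketch with a linear gap penalty; only the score is
    computed, so two rows of the matrix are enough.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero, so an
            # alignment can start anywhere in either sequence.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TGTTACGG", "GGTTGACTA"))
```

The floor at zero is what distinguishes local from global alignment: poorly matching prefixes are discarded rather than penalized.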
20.
Virol J ; 19(1): 114, 2022 06 28.
Article in English | MEDLINE | ID: mdl-35765099

ABSTRACT

BACKGROUND: Chronic infection with hepatitis B virus (HBV) has been shown to be strongly associated with the development of hepatocellular carcinoma (HCC). AIMS: The purpose of the study is to investigate the association between HBV preS region quasispecies and HCC development, as well as to develop an HCC diagnosis model using HBV preS region quasispecies. METHODS: A total of 104 chronic hepatitis B (CHB) patients and 117 HBV-related HCC patients were enrolled. The HBV preS region was sequenced using next generation sequencing (NGS) and the nucleotide entropy was calculated for quasispecies evaluation. Sparse logistic regression (SLR) was used to predict HCC development, and prediction performance was evaluated using receiver operating characteristic curves. RESULTS: Entropy of the HBV preS1 and preS2 regions and of several nucleotide positions differed significantly between CHB and HCC patients. Using SLR, the classification of HCC/CHB groups achieved a mean area under the receiver operating characteristic curve (AUC) of 0.883 in the training data and 0.795 in the test data. The prediction model was also validated on a completely independent dataset from Hong Kong. The 10 selected nucleotide positions showed significantly different entropy between CHB and HCC patients. The HBV quasispecies also classified three clinical parameters, including HBeAg, HBV DNA, and alkaline phosphatase (ALP), with AUC values greater than 0.6 in the test data. CONCLUSIONS: Using NGS and SLR, the association between HBV preS region nucleotide entropy and HCC development was validated in our study, and this could promote the understanding of the mechanism of HCC progression.


Subject(s)
Carcinoma, Hepatocellular , Liver Neoplasms , Hepatitis B Surface Antigens/genetics , Hepatitis B virus/genetics , Humans , Logistic Models , Nucleotides , Quasispecies
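
The abstract above uses per-position nucleotide entropy as its quasispecies measure but does not give the formula. The sketch below assumes Shannon entropy in bits over the bases observed at each aligned position. The function names `position_entropy` and `alignment_entropy`, the log base, and the decision to ignore gaps and ambiguous bases are all assumptions for illustration, not details from the study.

```python
import math
from collections import Counter


def position_entropy(column):
    """Shannon entropy (bits) of the A/C/G/T bases in one alignment column.

    Hypothetical helper: characters other than A, C, G, T are ignored.
    """
    counts = Counter(base for base in column.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over bases of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def alignment_entropy(reads):
    """Entropy at each position of equal-length aligned reads."""
    return [position_entropy("".join(col)) for col in zip(*reads)]


if __name__ == "__main__":
    reads = ["ACGT", "ACGA", "ACCA", "ATCA"]
    print(alignment_entropy(reads))
```

A fully conserved position scores 0 bits and a position with all four bases equally frequent scores 2 bits, so higher values indicate a more diverse viral population at that site.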