Results 1 - 20 of 91
1.
Nucleic Acids Res ; 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38884260

ABSTRACT

Horizontal gene transfer (HGT) phenomena pervade the gut microbiome and significantly impact human health. Yet, no current method can accurately identify complete HGT events, including the transferred sequence and the associated deletion and insertion breakpoints, from shotgun metagenomic data. Here, we develop LocalHGT, which enables reliable and fast detection of complete HGT events from shotgun metagenomic data, delivering an accuracy of 99.4% (verified by Nanopore data) across 200 gut microbiome samples and an average F1 score of 0.99 on 100 simulated datasets. LocalHGT enables a systematic characterization of HGT events within the human gut microbiome across 2098 samples, revealing that multiple recipient genome sites can become targets of a transferred sequence, that microhomology is enriched in HGT breakpoint junctions (P-value = 3.3e-58), and that HGTs can function as host-specific fingerprints, as indicated by the significantly higher HGT similarity of intra-personal temporal samples than of inter-personal samples (P-value = 4.3e-303). Crucially, HGTs showed potential contributions to colorectal cancer (CRC) and acute diarrhoea, as evidenced by the enrichment of the butyrate metabolism pathway (P-value = 3.8e-17) and the shigellosis pathway (P-value = 5.9e-13) in the respective associated HGTs. Furthermore, differential HGTs showed promise as biomarkers for predicting various diseases. Integrating HGTs into a CRC prediction model achieved an AUC of 0.87.

2.
Front Immunol ; 15: 1438587, 2024.
Article in English | MEDLINE | ID: mdl-38895125

ABSTRACT

[This corrects the article DOI: 10.3389/fimmu.2024.1368749.].

3.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38603603

ABSTRACT

MOTIVATION: Genome sequencing technologies reveal a huge amount of genomic sequences. Neural network-based methods can be prime candidates for retrieving insights from these sequences because of their applicability to large and diverse datasets. However, the highly variable lengths of genome sequences severely impair the presentation of sequences as input to the neural network. Genetic variations further complicate tasks that involve sequence comparison or alignment. RESULTS: Inspired by the theory and applications of "spaced seeds," we propose a graph representation of genome sequences called "gapped pattern graph." These graphs can be transformed through a Graph Convolutional Network to form lower-dimensional embeddings for downstream tasks. On the basis of the gapped pattern graphs, we implemented a neural network model and demonstrated its performance on diverse tasks involving microbe and mammalian genome data. Our method consistently outperformed all the other state-of-the-art methods across various metrics on all tasks, especially for the sequences with limited homology to the training data. In addition, our model was able to identify distinct gapped pattern signatures from the sequences. AVAILABILITY AND IMPLEMENTATION: The framework is available at https://github.com/deepomicslab/GCNFrame.

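The "spaced seed" idea behind the gapped pattern graph can be illustrated with a short sketch. This is not the GCNFrame construction: the mask format ('1' = compared position, '0' = wildcard), the function names `gapped_patterns` and `pattern_graph`, and the choice to link patterns from adjacent windows are all assumptions made for illustration. What it does show is why spaced seeds tolerate point variation: a substitution at a wildcard position leaves the extracted pattern unchanged.

```python
from collections import Counter
from typing import List


def gapped_patterns(seq: str, mask: str) -> List[str]:
    """Slide a spaced-seed mask along seq.

    '1' marks a position that is kept; '0' marks a wildcard that is skipped,
    so a variant at a wildcard position leaves the pattern unchanged.
    """
    care = [i for i, m in enumerate(mask) if m == "1"]
    span = len(mask)
    return [
        "".join(seq[start + i] for i in care)
        for start in range(len(seq) - span + 1)
    ]


def pattern_graph(seq: str, mask: str) -> Counter:
    """Hypothetical graph: weighted edges between patterns of adjacent windows."""
    patterns = gapped_patterns(seq, mask)
    return Counter(zip(patterns, patterns[1:]))


if __name__ == "__main__":
    print(gapped_patterns("ACGTAC", "101"))  # ['AG', 'CT', 'GA', 'TC']
    # A substitution at the wildcard position does not change the pattern.
    print(gapped_patterns("ATG", "101") == gapped_patterns("ACG", "101"))  # True
    print(pattern_graph("ACGTAC", "101"))
```

The resulting nodes and edge weights are the kind of input a graph convolutional network could embed; the abstract does not say how GCNFrame defines its edges, so treat the adjacency rule here purely as an example.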
4.
Acta Pharm Sin B ; 14(4): 1814-1826, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38572113

ABSTRACT

Efficient translation mediated by the 5' untranslated region (5' UTR) is essential for the robust efficacy of mRNA vaccines. However, the N1-methyl-pseudouridine (m1Ψ) modification of mRNA can impact the translation efficiency of the 5' UTR. We discovered that the optimal 5' UTR for m1Ψ-modified mRNA (m1Ψ-5' UTR) differs significantly from its unmodified counterpart, highlighting the need for a specialized tool for designing m1Ψ-5' UTRs rather than directly utilizing high-expression endogenous gene 5' UTRs. In response, we developed a novel machine learning-based tool, Smart5UTR, which employs a deep generative model to identify superior m1Ψ-5' UTRs in silico. The tailored loss function and network architecture enable Smart5UTR to overcome limitations inherent in existing models. As a result, Smart5UTR can successfully design superior 5' UTRs, greatly benefiting mRNA vaccine development. Notably, Smart5UTR-designed superior 5' UTRs significantly enhanced antibody titers induced by COVID-19 mRNA vaccines against the Delta and Omicron variants of SARS-CoV-2, surpassing the performance of vaccines using high-expression endogenous gene 5' UTRs.

5.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38426321

ABSTRACT

The common loci are a distinct set of human genome sites that harbor genetic variants found in at least 1% of the population. Small somatic mutations at common loci and non-common loci (csmVariants and ncsmVariants, respectively) are presumed to occur with similar probabilities. However, our work revealed that within the coding region, common loci constituted only 1.03% of all loci, yet they accounted for 5.14% of TCGA somatic mutations. Furthermore, the small somatic mutation incidence rate at these common loci was 2.7 times that observed at non-common loci. Notably, the csmVariants exhibited a recurrence rate of 36.14%, which was 2.59 times that of the ncsmVariants. The C-to-T transition at CpG sites accounted for 32.41% of the csmVariants, 2.93 times the proportion among the ncsmVariants. Interestingly, the aging-related mutational signature contributed to 13.87% of the csmVariants, 5.5 times that of the ncsmVariants. Moreover, 35.93% of csmVariant contexts exhibited palindromic features, 1.84 times the proportion among ncsmVariant contexts. Notably, cancer patients with higher csmVariant rates had better progression-free survival. Furthermore, cancer patients with high-frequency csmVariants, a group enriched for mismatch repair deficiency, were also associated with better progression-free survival. The accumulation of csmVariants during carcinogenesis is a complex process influenced by various factors, including the substantial percentage of palindromic sequences at csmVariant sites, aging, and DNA mismatch repair deficiency. Together, these factors contribute to the higher somatic mutation incidence rates of common loci and the overall accumulation of csmVariants in cancer development.


Subject(s)
Brain Neoplasms , Colorectal Neoplasms , Neoplastic Syndromes, Hereditary , Humans , Incidence , Brain Neoplasms/genetics , Mutation
6.
Front Immunol ; 15: 1368749, 2024.
Article in English | MEDLINE | ID: mdl-38524135

ABSTRACT

Numerous studies have shown that immune checkpoint inhibitor (ICI) immunotherapy has great potential as a cancer treatment, leading to significant clinical improvements in numerous cases. However, it benefits only a minority of patients, underscoring the importance of discovering reliable biomarkers that can be used to screen for potential beneficiaries and ultimately reduce the risk of overtreatment. Our comprehensive review focuses on the latest advances in predictive biomarkers for ICI therapy, particularly emphasizing those that enhance the efficacy of programmed cell death protein 1 (PD-1)/programmed cell death-ligand 1 (PD-L1) inhibitor and cytotoxic T-lymphocyte antigen-4 (CTLA-4) inhibitor immunotherapies. We explore biomarkers derived from various sources, including tumor cells, the tumor immune microenvironment (TIME), body fluids, gut microbes, and metabolites. Among them, tumor cell-derived biomarkers include tumor mutational burden (TMB), tumor neoantigen burden (TNB), microsatellite instability (MSI), PD-L1 expression, mutated genes in pathways, and epigenetic biomarkers. TIME-derived biomarkers include immune landscape of TIME biomarkers, inhibitory checkpoint biomarkers, and immune repertoire biomarkers. We also discuss various techniques used to detect and assess these biomarkers, detailing their respective datasets, strengths, weaknesses, and evaluation metrics. Furthermore, we present a comprehensive review of computer models for predicting the response to ICI therapy. The computer models include knowledge-based mechanistic models and data-based machine learning (ML) models. The knowledge-based mechanistic models include pharmacokinetic/pharmacodynamic (PK/PD) models, partial differential equation (PDE) models, signal network-based models, quantitative systems pharmacology (QSP) models, and agent-based models (ABMs). ML models include linear regression models, logistic regression models, support vector machine (SVM)/random forest/extra trees/k-nearest neighbors (KNN) models, artificial neural network (ANN) and deep learning models. Additionally, there are hybrid models of systems biology and ML. We summarize the details of these models, outlining the datasets they use, their evaluation methods/metrics, and their respective strengths and limitations. By summarizing the major advances in research on predictive biomarkers and computer models for the therapeutic effect and clinical utility of tumor ICI, we aim to assist researchers in choosing appropriate biomarkers or computer models for research exploration and to help clinicians practice precision medicine by selecting the best biomarkers.


Subject(s)
B7-H1 Antigen , Neoplasms , Humans , B7-H1 Antigen/metabolism , Neoplasms/drug therapy , Neoplasms/genetics , Biomarkers, Tumor/genetics , Immunotherapy/methods , Immune Checkpoint Inhibitors/pharmacology , Immune Checkpoint Inhibitors/therapeutic use , Tumor Microenvironment
7.
NPJ Biofilms Microbiomes ; 10(1): 26, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38509123

ABSTRACT

There is a shortage of population-based studies investigating the impact of HPV infection on the vaginal microenvironment, which influences the risk of persistent HPV infection. This prospective study aimed to unravel the dynamics of the vaginal microbiota (VM) and vaginal metabolome in response to changes in HPV infection status. Our results suggest that the vaginal metabolome may be a better indicator than the VM for assessing the impact of altered HPV status on the vaginal microenvironment.


Subject(s)
Microbiota , Papillomavirus Infections , Female , Humans , Prospective Studies , RNA, Ribosomal, 16S , Metabolome , Microbiota/physiology
8.
Nucleic Acids Res ; 52(D1): D756-D761, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37904614

ABSTRACT

Bacteriophages are viruses that infect bacteria or archaea. Understanding the diverse and intricate genomic architectures of phages is essential for studying microbial ecosystems and developing phage therapy strategies. However, existing phage databases lack detailed annotations. To address this, we propose PhageScope (https://phagescope.deepomics.org), an online phage database with comprehensive annotations. PhageScope harbors a collection of 873 718 phage sequences from various sources. Applying fifteen state-of-the-art tools to perform systematic annotations and analyses, PhageScope provides annotations on genome completeness, host range, lifestyle, taxonomic classification, nine types of structural and functional genetic elements, and three types of comparative genomic studies for curated phages. Additionally, PhageScope incorporates automatic analyses and visualizations for curated and customized phages, serving as an efficient platform for phage research.


Subject(s)
Bacteriophages , Databases, Genetic , Bacteria/virology , Bacteriophages/genetics , Genome, Viral/genetics , Genomics , Phage Therapy
9.
Front Oncol ; 13: 1290112, 2023.
Article in English | MEDLINE | ID: mdl-38074680

ABSTRACT

Given the shortage of cytologists, women in low-resource regions have inequitable access to cervical cytology, which plays a pivotal role in cervical cancer screening. Emerging studies indicate the potential of AI-assisted systems to promote the implementation of cytology in resource-limited settings. However, the extent to which AI improves cytologists' work efficiency has not been adequately evaluated. This study aimed to evaluate the feasibility of using AI to exclude cytology-negative slides and improve the efficiency of slide interpretation. Well-annotated slides were used to develop a classification model, which was then applied to classify slides in the validation group. Nearly 70% of validation slides were reported as negative by the AI system, and none of these slides were diagnosed as high-grade lesions by expert cytologists. With the aid of the AI system, the average interpretation time per slide decreased from 3 minutes to 30 seconds. These findings suggest the potential of AI-assisted systems to accelerate slide interpretation in large-scale cervical cancer screening.

10.
Cell Rep Methods ; 3(9): 100589, 2023 09 25.
Article in English | MEDLINE | ID: mdl-37714157

ABSTRACT

Reconstructing diploid sequences of human leukocyte antigen (HLA) genes, i.e., full-resolution HLA typing, from sequencing data is challenging. The high homogeneity across HLA genes and the high heterogeneity within HLA alleles complicate the identification of the genomic source loci of sequencing reads. Here, we present SpecHLA, which uses fine-tuned read binning and local assembly to achieve accurate full-resolution HLA typing. SpecHLA accepts sequencing data from paired-end, 10× linked-reads, high-throughput chromosome conformation capture (Hi-C), Pacific Biosciences (PacBio), and Oxford Nanopore Technology (ONT) platforms. It can also incorporate pedigree data and genotype frequency to refine typing. In 32 Human Genome Structural Variation Consortium, Phase 2 (HGSVC2) samples, SpecHLA achieved 98.6% accuracy for G-group-resolution HLA typing, inferring entire HLA alleles with, on average per allele, three fewer mismatches, ten fewer gaps, and a 590 bp smaller edit distance than HISAT-genotype. Additionally, SpecHLA exhibited a 2-field typing accuracy of 98.6% in 875 real samples. Finally, SpecHLA detected HLA loss of heterozygosity with 99.7% specificity and 96.8% sensitivity in simulated samples of cancer cell lines.


Subject(s)
Diploidy , Humans , Alleles , Genotype , Cell Line , Histocompatibility Testing
11.
Nat Commun ; 14(1): 5528, 2023 09 08.
Article in English | MEDLINE | ID: mdl-37684230

ABSTRACT

Breakage-fusion-bridge (BFB) is a complex rearrangement that leads to tumor malignancy. Existing models for detecting BFBs rely on the ideal BFB hypothesis, ruling out the possibility of BFBs entangled with other structural variations, that is, complex BFBs. We propose an algorithm, Ambigram, to identify complex BFBs and reconstruct the rearranged structure of the local genome during cancer subclone evolution. Ambigram handles data from short-read, linked-read, long-read, and single-cell sequencing, as well as optical mapping technologies. Compared with state-of-the-art methods, Ambigram successfully deciphers gold- or silver-standard complex BFBs in multiple cancers. Ambigram dissects the intratumor heterogeneity of complex BFB events with single-cell reads from melanoma and gastric cancer. Furthermore, applying Ambigram to liver and cervical cancer data suggests that the BFB mechanism may mediate oncovirus integrations. BFB also exists in non-cancer genomes. Investigating the complete human genome reference with Ambigram suggests that the BFB mechanism may have been involved in two genome reorganizations of Homo sapiens during evolution. Moreover, Ambigram discovers signals of recurrent foldback inversions and complex BFBs in whole-genome data from the 1000 Genomes Project and congenital heart disease data, respectively.


Subject(s)
Melanoma , Uterine Cervical Neoplasms , Humans , Female , Genomics , Liver , Genome, Human/genetics
13.
Nucleic Acids Res ; 51(15): e81, 2023 08 25.
Article in English | MEDLINE | ID: mdl-37403780

ABSTRACT

Single-cell sequencing technology enables the simultaneous capture of multiomic data from multiple cells. The captured data can be represented by tensors, i.e., higher-order generalizations of matrices. However, existing analysis tools often treat the data as a collection of second-order tensors (ordinary matrices), discarding the correspondences among features. We therefore propose a probabilistic tensor decomposition framework, SCOIT, to extract embeddings from single-cell multiomic data. SCOIT incorporates various distributions, including Gaussian, Poisson, and negative binomial distributions, to deal with sparse, noisy, and heterogeneous single-cell data. Our framework can decompose a multiomic tensor into a cell embedding matrix, a gene embedding matrix, and an omic embedding matrix, allowing for various downstream analyses. We applied SCOIT to eight single-cell multiomic datasets from different sequencing protocols. With cell embeddings, SCOIT achieves superior cell clustering performance compared with nine state-of-the-art tools under various metrics, demonstrating its ability to dissect cellular heterogeneity. With gene embeddings, SCOIT enables cross-omics gene expression analysis and integrative gene regulatory network studies. Furthermore, the embeddings allow simultaneous cross-omics imputation, outperforming current imputation methods with the Pearson correlation coefficient increased by 3.38-39.26%; moreover, SCOIT accommodates scenarios in which only one omic profile is available for a subset of cells.


Subject(s)
Benchmarking , Multiomics , Cluster Analysis , Correlation of Data , Cytosol , Single-Cell Analysis
14.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37150761

ABSTRACT

The specificity of a T-cell receptor (TCR) repertoire determines personalized immune capacity. Existing methods have modeled the qualitative aspects of TCR specificity, while the quantitative aspects have remained unaddressed. We developed a package, TCRanno, to quantify the specificity of TCR repertoires. We created deep-learning-based, epitope-aware vector embeddings to infer individual TCR specificity. We then aggregated clonotype frequencies of TCRs to obtain a quantitative profile of repertoire specificity at the epitope, antigen and organism levels. Applying TCRanno to 4195 TCR repertoires revealed quantitative changes in repertoire specificity upon infections, autoimmunity and cancers. Specifically, TCRanno found cytomegalovirus-specific TCRs in seronegative healthy individuals, supporting the possibility of abortive infections. TCRanno discovered an age-associated accumulation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-specific TCRs in pre-pandemic samples, which may explain the aggressive symptoms and age-related severity of coronavirus disease 2019. TCRanno also identified encounters with hepatitis B antigens as a potential trigger of systemic lupus erythematosus. TCRanno annotations could distinguish the TCR repertoires of healthy individuals from those of patients with cancers, including melanoma, lung and breast cancers. TCRanno also proved useful in analyses of single-cell TCR-seq plus gene expression data by isolating T cells with the specificity of interest.

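The aggregation step described in the TCRanno abstract (summing clonotype frequencies by inferred specificity) can be sketched as follows. The per-clonotype epitope assignments are taken as given input here; in TCRanno they are inferred from deep-learning embeddings, which this sketch does not reproduce. The function name, the dictionary inputs, and the clonotype IDs are hypothetical; NLVPMVATV (a CMV pp65 epitope) and GILGFVFTL (an influenza A M1 epitope) are used only as illustrative labels.

```python
from collections import defaultdict
from typing import Dict, Optional


def specificity_profile(
    counts: Dict[str, int],
    clone_to_epitope: Dict[str, str],
    epitope_to_group: Optional[Dict[str, str]] = None,
) -> Dict[str, float]:
    """Fraction of the repertoire attributed to each epitope (or higher-level group).

    counts: clonotype -> read or cell count.
    clone_to_epitope: clonotype -> inferred epitope. Unannotated clonotypes are
        skipped but still count toward the repertoire total.
    epitope_to_group: optional map epitope -> antigen or organism, used to roll
        the epitope-level profile up to a coarser level.
    """
    total = sum(counts.values())
    profile = defaultdict(float)
    for clone, n in counts.items():
        epitope = clone_to_epitope.get(clone)
        if epitope is None:
            continue
        key = epitope_to_group.get(epitope, epitope) if epitope_to_group else epitope
        profile[key] += n / total
    return dict(profile)


if __name__ == "__main__":
    counts = {"clone1": 6, "clone2": 3, "clone3": 1}
    epitopes = {"clone1": "NLVPMVATV", "clone2": "GILGFVFTL"}
    organisms = {"NLVPMVATV": "CMV", "GILGFVFTL": "Influenza A"}
    print(specificity_profile(counts, epitopes))             # epitope level
    print(specificity_profile(counts, epitopes, organisms))  # organism level
```

Keeping unannotated clonotypes in the denominator means the profile reports each specificity as a share of the whole repertoire rather than of the annotated subset; whether TCRanno normalizes the same way is not stated in the abstract.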

Subject(s)
CD8-Positive T-Lymphocytes , COVID-19 , Humans , CD8-Positive T-Lymphocytes/metabolism , COVID-19/genetics , Receptors, Antigen, T-Cell/genetics , Epitopes , Cytomegalovirus
16.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36892171

ABSTRACT

The adaptive immune receptor repertoire (AIRR), consisting of T- and B-cell receptors, is the core component of the immune system. AIRR sequencing is commonly used in cancer immunotherapy and in minimal residual disease (MRD) detection for leukemia and lymphoma. The AIRR is captured by primers and sequenced to yield paired-end (PE) reads. The PE reads can be merged into one sequence using the overlapping region between them. However, the wide diversity of AIRR data makes merging difficult, so a dedicated tool is required. We developed IMperm, an IMmune PE reads merger for sequencing data. We used a k-mer-and-vote strategy to pin down the overlapping region rapidly. IMperm can handle all types of PE reads, eliminate adapter contamination and successfully merge low-quality and minor/non-overlapping reads. Compared with existing tools, IMperm performed better on both simulated and sequencing data. Notably, IMperm was well suited to processing MRD detection data in leukemia and lymphoma and detected 19 novel MRD clones in 14 patients with leukemia from previously published data. Additionally, IMperm can handle PE reads from other sources, and we demonstrated its effectiveness on two genomic datasets and one cell-free deoxyribonucleic acid dataset. IMperm is implemented in the C programming language and consumes little runtime and memory. It is freely available at https://github.com/zhangwei2015/IMperm.

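A minimal sketch of a k-mer-and-vote overlap search for one read pair, assuming the common convention that read 2 is reverse-complemented before merging. It is not IMperm, which is written in C and also handles adapter contamination, low-quality bases and non-overlapping reads; `merge_pair`, its `k` and `min_votes` parameters, and the rule of keeping read 1 bases in the overlap are illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, k: int = 5, min_votes: int = 3) -> Optional[str]:
    """Merge a paired-end read pair by voting on the offset of rc(r2) within r1."""
    r2rc = revcomp(r2)
    # Index every k-mer position in read 1.
    index = defaultdict(list)
    for i in range(len(r1) - k + 1):
        index[r1[i:i + k]].append(i)
    # Each shared k-mer votes for the offset at which rc(r2) would start in r1.
    votes = Counter()
    for j in range(len(r2rc) - k + 1):
        for i in index.get(r2rc[j:j + k], ()):
            votes[i - j] += 1
    if not votes:
        return None  # no shared k-mer: treat the pair as non-overlapping
    offset, n = votes.most_common(1)[0]
    if n < min_votes or offset < 0:
        return None  # weak support, or rc(r2) starts before r1 (e.g. adapter read-through)
    # Keep read 1 in the overlap and append the part of rc(r2) that extends past it.
    tail_start = len(r1) - offset
    return r1 + r2rc[tail_start:]


if __name__ == "__main__":
    fragment = "ACGTTGCATGCCATTAGGCATCGATCGGAT"
    r1 = fragment[:20]
    r2 = revcomp(fragment[10:])
    print(merge_pair(r1, r2) == fragment)  # True
```

Voting makes the offset estimate robust to a few sequencing errors, since an error only removes the votes of the k-mers that cover it; a real merger would also use base qualities to resolve disagreements inside the overlap rather than always trusting read 1.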

Subject(s)
Genomics , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA , Software , Genome , Algorithms
17.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36752378

ABSTRACT

T-cell receptors (TCRs) play an essential role in the adaptive immune system. Probabilistic models for TCR repertoires can help decipher the underlying complex sequence patterns and provide novel insights into the adaptive immune system. In this work, we develop TCRpeg, a deep autoregressive generative model to unravel the sequence patterns of TCR repertoires. TCRpeg largely outperforms state-of-the-art methods in estimating the probability distribution of a TCR repertoire, boosting the average accuracy, measured by the Pearson correlation coefficient, from 0.672 to 0.906. Furthermore, building on its performance in probability inference, TCRpeg improves a range of TCR-related tasks: profiling TCR repertoires probabilistically, classifying antigen-specific TCRs, validating previously discovered TCR motifs, generating novel TCRs and augmenting TCR data. Our results and analysis highlight the flexibility and capacity of TCRpeg to extract TCR sequence information, providing a novel approach for deciphering complex immunogenomic repertoires.

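Autoregressive models such as TCRpeg score a sequence with the chain rule, log P(s) = sum over i of log P(s_i | s_1 ... s_(i-1)). As a deliberately simple stand-in for the deep network, the sketch below conditions only on the previous residue (a first-order Markov, or bigram, model) with add-alpha smoothing over the 20 amino acids plus an end token. The model choice, the `^`/`$` boundary tokens and the toy CDR3-like training strings are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
START, END = "^", "$"


def train_bigram(seqs):
    """Count transitions prev -> next, including start and end tokens."""
    counts = defaultdict(Counter)
    for s in seqs:
        tokens = START + s + END
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts


def log_prob(seq, counts, alpha=1.0):
    """Chain rule: sum of log P(next | previous), smoothed so each row sums to 1."""
    vocab = len(AMINO_ACIDS) + 1  # 20 residues plus the end token
    tokens = START + seq + END
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        row = counts.get(prev, Counter())
        total += math.log((row[cur] + alpha) / (sum(row.values()) + alpha * vocab))
    return total


if __name__ == "__main__":
    model = train_bigram(["CASSLGQETQYF", "CASSPGTEAFF", "CASSLAGGYEQYF"])
    print(log_prob("CASSLGTEAFF", model) > log_prob("WWWWWWWWWWW", model))  # True
```

A neural autoregressive model replaces the count table with a network that conditions on the whole prefix, but the scoring rule (summing per-position conditional log-probabilities, including the end token) is the same.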

Subject(s)
Models, Statistical , Receptors, Antigen, T-Cell , Receptors, Antigen, T-Cell/genetics
18.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36715274

ABSTRACT

Advances in single-cell RNA sequencing (scRNA-seq) shed light on cell-specific transcriptomic studies of cell development, complex diseases and cancers. Nevertheless, scRNA-seq techniques suffer from 'dropout' events, and imputation tools have been proposed to address the resulting sparsity. Here, rather than imputation, we propose a tool, SMURF, to extract low-dimensional embeddings of cells and genes using matrix factorization with a mixture of Poisson-Gamma divergence as the objective while preserving self-consistency. Using the obtained cell embeddings, SMURF effectively discovers cell subpopulations in replicated in silico datasets and in eight wet-lab scRNA-seq datasets with ground-truth cell types. Furthermore, SMURF can reduce the cell embedding to a 1D-oval space to recover the time course of the cell cycle. SMURF can also serve as an imputation tool; the in silico data assessment shows that SMURF has the most robust gene expression recovery power, with low root mean square error and high Pearson correlation. Moreover, SMURF recovers the gene distribution for the WM989 Drop-seq data. SMURF is available at https://github.com/deepomicslab/SMURF.

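SMURF's exact objective (a mixture of Poisson-Gamma divergence with a self-consistency constraint) is not spelled out in the abstract, so the sketch below swaps in plain Poisson (generalized Kullback-Leibler) non-negative matrix factorization with the Lee-Seung multiplicative updates. It only conveys the general idea of factorizing a genes-by-cells count matrix V into low-dimensional gene (W) and cell (H) embeddings; the function names, rank and iteration count are illustrative assumptions, and pure Python is used to keep it dependency-free.

```python
import math
import random


def reconstruct(W, H):
    """Return the matrix product W @ H for nested lists."""
    rank = len(H)
    return [[sum(W[i][k] * H[k][j] for k in range(rank)) for j in range(len(H[0]))]
            for i in range(len(W))]


def poisson_divergence(V, WH):
    """Generalized KL divergence D(V || WH): Poisson negative log-likelihood up to a constant."""
    total = 0.0
    for v_row, m_row in zip(V, WH):
        for v, mu in zip(v_row, m_row):
            total += (v * math.log(v / mu) if v > 0 else 0.0) - v + mu
    return total


def kl_nmf(V, rank, iters=200, seed=0, eps=1e-10):
    """Factor non-negative V (genes x cells) as W (genes x rank) @ H (rank x cells)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        WH = reconstruct(W, H)
        for k in range(rank):  # H_kj *= sum_i W_ik V_ij / (WH)_ij / sum_i W_ik
            w_sum = sum(W[i][k] for i in range(n)) + eps
            for j in range(m):
                num = sum(W[i][k] * V[i][j] / (WH[i][j] + eps) for i in range(n))
                H[k][j] *= num / w_sum
        WH = reconstruct(W, H)
        for k in range(rank):  # W_ik *= sum_j H_kj V_ij / (WH)_ij / sum_j H_kj
            h_sum = sum(H[k]) + eps
            for i in range(n):
                num = sum(H[k][j] * V[i][j] / (WH[i][j] + eps) for j in range(m))
                W[i][k] *= num / h_sum
    return W, H


if __name__ == "__main__":
    V = [[2.0, 1.0, 0.0], [0.0, 1.0, 3.0], [2.0, 2.0, 3.0]]
    W, H = kl_nmf(V, rank=2)
    print(poisson_divergence(V, reconstruct(W, H)))
```

Here each column of H is a cell embedding and each row of W a gene embedding; clustering the columns of H is one way to look for cell subpopulations, and W @ H itself can serve as a smoothed (imputed) expression matrix.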

Subject(s)
Single-Cell Gene Expression Analysis , Software , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Gene Expression Profiling , Cluster Analysis
19.
Nucleic Acids Res ; 51(2): e9, 2023 01 25.
Article in English | MEDLINE | ID: mdl-36373664

ABSTRACT

Cells possess hierarchical functional diversity. However, most single-cell analyses neglect the nested structures when detecting and visualizing functional diversity. Here, we incorporate cell hierarchy to study functional diversity at the subpopulation, club (i.e., sub-subpopulation), and cell layers. Accordingly, we implement a package, SEAT, that constructs cell hierarchies using structure entropy by minimizing the global uncertainty in cell-cell graphs. With cell hierarchies, SEAT deciphers functional diversity in 36 datasets covering scRNA, scDNA, scATAC, and scRNA-scATAC multiome data. First, SEAT finds optimal cell subpopulations with high clustering accuracy. It identifies cell types or fates from omics profiles and boosts accuracy from 0.34 to 1. Second, SEAT detects insightful functional diversity among cell clubs. The hierarchy of breast cancer cells reveals that a specific tumor cell club drives AREG-EGFR signaling. We identify a dense co-accessibility network of cis-regulatory elements specified by one cell club in GM12878. Third, the cell order from the hierarchy infers the periodic pseudo-time of cells, improving accuracy from 0.79 to 0.89. Moreover, we incorporate cell hierarchy layers as prior knowledge to refine nonlinear dimension reduction, enabling us to visualize hierarchical cell layouts in low-dimensional space.

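SEAT builds hierarchies by minimizing structure entropy on a cell-cell graph. As a reference point, the sketch below computes the standard two-level structural entropy of a graph partition, H = -sum over modules X_j of [ sum over nodes i in X_j of (d_i / vol(G)) log2(d_i / vol(X_j)) + (g_j / vol(G)) log2(vol(X_j) / vol(G)) ], where d_i is a node's weighted degree, vol is a sum of degrees, and g_j is the weight of edges leaving X_j. SEAT's own graph construction, multi-level objective and optimizer may differ; the function name and the (u, v, weight) edge-list format are assumptions.

```python
import math
from collections import defaultdict


def two_level_structure_entropy(edges, partition):
    """Two-level structural entropy of a partition of an undirected weighted graph.

    edges: list of (u, v, weight) tuples without self-loops.
    partition: node -> module label.
    Structure-entropy methods look for the partition that minimizes this value.
    """
    degree = defaultdict(float)
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
    vol_graph = sum(degree.values())
    vol = defaultdict(float)  # module volume: sum of member degrees
    cut = defaultdict(float)  # g_j: weight of edges leaving each module
    for node, d in degree.items():
        vol[partition[node]] += d
    for u, v, w in edges:
        if partition[u] != partition[v]:
            cut[partition[u]] += w
            cut[partition[v]] += w
    h = 0.0
    for node, d in degree.items():  # uncertainty of locating a node inside its module
        if d > 0:
            h -= d / vol_graph * math.log2(d / vol[partition[node]])
    for module, v_mod in vol.items():  # uncertainty of entering each module from outside
        if cut[module] > 0:
            h -= cut[module] / vol_graph * math.log2(v_mod / vol_graph)
    return h


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2-3).
    edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
    natural = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    mixed = {0: "a", 1: "b", 2: "a", 3: "b", 4: "a", 5: "b"}
    print(two_level_structure_entropy(edges, natural) < two_level_structure_entropy(edges, mixed))  # True
```

In this toy graph both partitions have modules of equal volume, so the difference comes entirely from the cut term: splitting along the bridge leaves one crossing edge, whereas the mixed partition cuts five.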

Subject(s)
Cluster Analysis , Single-Cell Analysis , RNA, Small Cytoplasmic , Single-Cell Analysis/methods , Uncertainty
20.
STAR Protoc ; 4(1): 101928, 2023 03 17.
Article in English | MEDLINE | ID: mdl-36520631

ABSTRACT

We describe a protocol to integrate genome variation data from different datasets and explore the population structure and migration history of human populations. This protocol provides semi-automated scripts to apply variant filtering strategies and visualize their effect on eliminating batch effects, and to perform principal component analysis, ancestry component analysis, historical effective population size inference, and migration and isolation analysis based on independent biallelic SNPs, genotype likelihoods, and haplotypes. The protocol can be adapted to variation data from other sources. For complete details on the use and execution of this protocol, please refer to Zhang et al. (2022).


Subject(s)
Genome, Human , Polymorphism, Single Nucleotide , Humans , Genome, Human/genetics , Genotype , Polymorphism, Single Nucleotide/genetics , Probability