1.
Patterns (N Y) ; 5(8): 101022, 2024 Aug 09.
Article in English | MEDLINE | ID: mdl-39233694

ABSTRACT

A vast amount of single-cell RNA sequencing (SC) data have been accumulated via various studies and consortiums, but the lack of spatial information limits its analysis of complex biological activities. To bridge this gap, we introduce CellContrast, a computational method for reconstructing spatial relationships among SC cells from a spatial transcriptomics (ST) reference. By adopting a contrastive learning framework and training with ST data, CellContrast projects gene expression into a hidden space where proximate cells share similar representation values. We performed extensive benchmarking on diverse platforms, including SeqFISH, Stereo-seq, 10X Visium, and MERSCOPE, on mouse embryo and human breast cells. The results reveal that CellContrast substantially outperforms other related methods, facilitating accurate spatial reconstruction of SC. We further demonstrate CellContrast's utility by applying it to cell-type co-localization and cell-cell communication analysis with real-world SC samples, showing that the recovered cell locations enable more discoveries and mitigate potential false positives.
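
Illustrative note: the contrastive idea described in this abstract (spatially proximate cells should receive similar representations) can be sketched with an InfoNCE-style loss. The abstract does not state which loss CellContrast uses, so the loss form, the temperature value, and the toy two-dimensional embeddings below are all assumptions made for illustration, not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss for one anchor cell (assumed loss form).

    The loss is small when the anchor embedding is more similar to the
    positive (a spatially nearby cell) than to the negatives (distant cells).
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Hypothetical embeddings: `near` stands for a neighbouring cell,
# `far` for two spatially distant cells.
anchor = [1.0, 0.0]
near = [0.9, 0.1]
far = [[-1.0, 0.0], [0.0, -1.0]]

good = info_nce(anchor, near, far)              # neighbour used as positive
bad = info_nce(anchor, far[0], [near, far[1]])  # distant cell used as positive
print(good < bad)
```

Minimising such a loss over many anchor cells pulls neighbouring cells together in the embedding space, which is the property the abstract relies on for spatial reconstruction.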

2.
bioRxiv ; 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39131276

ABSTRACT

Transcriptional regulation, critical for cellular differentiation and adaptation to environmental changes, involves coordinated interactions among DNA sequences, regulatory proteins, and chromatin architecture. Despite extensive data from consortia like ENCODE, understanding the dynamics of cis-regulatory elements (CREs) in gene expression remains challenging. Deep learning is a powerful tool for learning gene expression and epigenomic signals from DNA sequences, exhibiting superior performance compared to conventional machine learning approaches. However, even the most advanced deep learning-based methods may fall short in capturing the regulatory effects of distal elements such as enhancers, limiting their predictive accuracy. In addition, these methods may require significant resources to train or to adapt to newly generated data. To address these challenges, we present EPInformer, a scalable deep-learning framework for predicting gene expression by integrating promoter-enhancer interactions with their sequences, epigenomic signals, and chromatin contacts. Our model outperforms existing gene expression prediction models in rigorous cross-chromosome validation, accurately recapitulates enhancer-gene interactions validated by CRISPR perturbation experiments, and identifies crucial transcription factor motifs within regulatory sequences. EPInformer is available as open-source software at https://github.com/pinellolab/EPInformer.

3.
Int J Surg ; 2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39166955

ABSTRACT

BACKGROUND: Irritable bowel syndrome (IBS) significantly impacts individuals due to its prevalence and negative effect on quality of life. Current genome-wide association studies (GWAS) have only identified a small number of crucial single nucleotide polymorphisms (SNPs), not fully elucidating IBS's pathogenesis. OBJECTIVE: To identify genomic loci at which common genetic variation influences IBS susceptibility. METHODS: Combining independent cohorts that in total comprise 65,840 cases of IBS and 788,652 controls, we performed a meta-analysis of genome-wide association studies (GWAS) of IBS. We also carried out gene mapping and pathway enrichment to gain insights into the underlying genes and pathways through which the associated loci contribute to disease susceptibility. Furthermore, we performed transcriptome analysis to deepen our understanding. IBS risk models were developed by combining clinical/lifestyle risk factors with polygenic risk scores (PRS) derived from the GWAS meta-analysis. We detected phenotype associations for IBS using PRS-based phenome-wide association (PheWAS) analyses, linkage disequilibrium score regression, and Mendelian randomization. RESULTS: The GWAS meta-analysis identified 10 IBS risk loci, seven of which were novel (rs12755507, rs34209273, rs34365748, rs67427799, rs2587363, rs13321176, rs1546559). Multiple methods identified nine promising IBS candidate genes (PRRC2A, COP1, CADM2, LRP1B, SUGT1, MED12L, P2RY14, PHF2, SHISA6) at 10 GWAS loci. Transcriptome validation also revealed differential expression of these genes. Phenome-wide associations between PRS-IBS and nine traits (neuroticism, diaphragmatic hernia, asthma, diverticulosis, cholelithiasis, depression, insomnia, COPD, and BMI) were identified. Six of these conditions (asthma, diaphragmatic hernia, diverticulosis, insomnia, major depressive disorder, and neuroticism) showed genetic association with IBS, and only major depressive disorder and neuroticism showed causality with IBS. CONCLUSION: We identified seven novel risk loci for IBS and highlight the substantial genetic risk they harbour. Our findings offer novel insights into the aetiology and phenotypic associations of IBS and lay the foundation for therapeutic targets and interventional strategies.

4.
Cell Biosci ; 14(1): 101, 2024 Aug 02.
Article in English | MEDLINE | ID: mdl-39095802

ABSTRACT

BACKGROUND: COVID-19 can cause cardiac complications and the latter are associated with poor prognosis and increased mortality. SARS-CoV-2 variants differ in their infectivity and pathogenicity, but how they affect cardiomyocytes (CMs) is unclear. METHODS: The effects of SARS-CoV-2 variants were investigated using human induced pluripotent stem cell-derived (hiPSC-) CMs in vitro and Golden Syrian hamsters in vivo. RESULTS: Different variants exhibited distinct tropism, mechanism of viral entry and pathology in the heart. Omicron BA.2 most efficiently infected and injured CMs in vitro and in vivo, and induced expression changes consistent with increased cardiac dysfunction, compared to other variants tested. Bioinformatics and upstream regulator analyses identified transcription factors and network predicted to control the unique transcriptome of Omicron BA.2 infected CMs. Increased infectivity of Omicron BA.2 is attributed to its ability to infect via endocytosis, independently of TMPRSS2, which is absent in CMs. CONCLUSIONS: In this study, we reveal previously unknown differences in how different SARS-CoV-2 variants affect CMs. Omicron BA.2, which is generally thought to cause mild disease, can damage CMs in vitro and in vivo. Our study highlights the need for further investigations to define the pathogenesis of cardiac complications arising from different SARS-CoV-2 variants.

5.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-39038939

ABSTRACT

Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder for which current treatments are limited and drug development costs are prohibitive. Identifying drug targets for ASD is crucial for the development of targeted therapies. Summary-level data of expression quantitative trait loci obtained from GTEx, protein quantitative trait loci data from the ROSMAP project, and two ASD genome-wide association study datasets were utilized for discovery and replication. We conducted a combined analysis using Mendelian randomization (MR), transcriptome-wide association studies, Bayesian colocalization, and summary-data-based MR to identify potential therapeutic targets associated with ASD and examine whether there are shared causal variants among them. Furthermore, pathway and drug enrichment analyses were performed to further explore the underlying mechanisms and summarize the current status of pharmacological targets for developing drugs to treat ASD. A protein-protein interaction (PPI) network and mouse knockout models were used to estimate the effect of therapeutic targets. A total of 17 genes revealed causal associations with ASD and were identified as potential targets for ASD patients. Cathepsin B (CTSB) [odds ratio (OR) = 2.66, 95% confidence interval (CI): 1.28-5.52, P = 8.84 × 10⁻³], gamma-aminobutyric acid type B receptor subunit 1 (GABBR1) (OR = 1.99, 95% CI: 1.06-3.75, P = 3.24 × 10⁻²), and formin like 1 (FMNL1) (OR = 0.15, 95% CI: 0.04-0.58, P = 5.59 × 10⁻³) were replicated in the proteome-wide MR analyses. In Drugbank, two potential therapeutic drugs, Acamprosate (GABBR1 inhibitor) and Bryostatin 1 (CASP8 inhibitor), were inferred as potential influencers of autism. Knockout mouse models suggested the involvement of the CASP8, GABBR1, and PLEKHM1 genes in neurological processes. Our findings suggest 17 candidate therapeutic targets for ASD and provide novel drug targets for therapy development and critical drug repurposing opportunities.


Subject(s)
Autism Spectrum Disorder; Genome-Wide Association Study; Proteomics; Humans; Autism Spectrum Disorder/drug therapy; Autism Spectrum Disorder/genetics; Autism Spectrum Disorder/metabolism; Animals; Mice; Transcriptome; Quantitative Trait Loci; Protein Interaction Maps/drug effects; Mice, Knockout; Molecular Targeted Therapy

6.
Insights Imaging ; 15(1): 161, 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38913225

ABSTRACT

OBJECTIVES: The clinical decision-making regarding choosing surgery alone (SA) or surgery followed by postoperative adjuvant chemotherapy (SPOCT) in esophageal squamous cell carcinoma (ESCC) remains controversial. We aim to propose a pre-therapy PET/CT image-based deep learning approach to improve the survival benefit and clinical management of ESCC patients. METHODS: This retrospective multicenter study included 837 ESCC patients from three institutions. Prognostic biomarkers integrating six networks were developed to build an ESCC prognosis (ESCCPro) model and predict the survival probability of ESCC patients treated with SA and SPOCT. Patients who did not undergo surgical resection were in a control group. Overall survival (OS) was the primary end-point event. The expected improvement in survival prognosis with the application of ESCCPro to assign treatment protocols was estimated by comparing the survival of patients in each subgroup. Seven clinicians with varying experience evaluated how ESCCPro performed in assisting clinical decision-making. RESULTS: In this retrospective multicenter study, patients receiving SA had a median OS 9.2 months longer than controls. No significant differences in survival were found between SA patients with predicted poor outcomes and the controls (p > 0.05). It was estimated that if ESCCPro was used to determine SA and SPOCT eligibility, the median OS in the ESCCPro-recommended SA group and SPOCT group would have been 15.3 months and 24.9 months longer, respectively. In addition, ESCCPro also significantly improved prognosis accuracy, certainty, and the efficiency of clinical experts. CONCLUSION: ESCCPro assistance improved the survival benefit of ESCC patients and the clinical decision-making among the two treatment approaches. 
CRITICAL RELEVANCE STATEMENT: The ESCCPro model for treatment decision-making is promising to improve overall survival in ESCC patients undergoing surgical resection and patients undergoing surgery followed by postoperative adjuvant chemotherapy. KEY POINTS: ESCC is associated with a poor prognosis and unclear ideal treatments. ESCCPro predicts the survival of patients with ESCC and the expected benefit from SA. ESCCPro improves clinicians' stratification of patients' prognoses.

7.
Int J Surg ; 2024 May 23.
Article in English | MEDLINE | ID: mdl-38781035

ABSTRACT

BACKGROUND: Sleep problems are prevalent. However, the impact of sleep patterns on digestive diseases remains uncertain. Moreover, the interaction between sleep patterns and genetic predisposition with digestive diseases has not been comprehensively explored. METHODS: 410,586 participants from UK Biobank with complete sleep information were included in the analysis. Sleep patterns were measured by sleep scores as the primary exposure, based on five healthy sleep behaviors. Individual sleep behaviors were secondary exposures. Genetic risk of the digestive diseases was characterized by polygenic risk score. Primary outcome was incidence of 16 digestive diseases. RESULTS: Healthy sleep scores showed dose-response associations with reduced risks of digestive diseases. Compared to participants scoring 0-1, those scoring 5 showed a 28% reduced risk of any digestive disease, including a 50% decrease in irritable bowel syndrome, 37% in non-alcoholic fatty liver disease, 35% in peptic ulcer, 34% in dyspepsia, 32% in gastroesophageal reflux disease, 28% in constipation, 25% in diverticulosis, 24% in severe liver disease, and 18% in gallbladder disease, whereas no correlation was observed with inflammatory bowel disease and pancreatic disease. Participants with poor sleep and high genetic risk exhibited approximately a 60% increase in the risk of digestive diseases. A healthy sleep pattern is linked to lower digestive disease risk in participants of all genetic risk levels. CONCLUSIONS: In this large population-based cohort, a healthy sleep pattern was associated with reduced risk of digestive diseases, regardless of the genetic susceptibility. Our findings underscore the potential impact of healthy sleep traits in mitigating the risk of digestive diseases.

8.
Eur Heart J Digit Health ; 5(3): 363-370, 2024 May.
Article in English | MEDLINE | ID: mdl-38774379

ABSTRACT

Aims: Cardiovascular disease (CVD) is a leading cause of mortality, especially in developing countries. This study aimed to develop and validate a CVD risk prediction model, Personalized CARdiovascular DIsease risk Assessment for Chinese (P-CARDIAC), for recurrent cardiovascular events using machine learning techniques. Methods and results: Three cohorts of Chinese patients with established CVD were included if they had used any of the public healthcare services provided by the Hong Kong Hospital Authority (HA) since 2004 and categorized by their geographical locations. The 10-year CVD outcome was a composite of diagnostic or procedure codes with specific International Classification of Diseases, Ninth Revision, Clinical Modification. Multivariate imputation with chained equations and XGBoost were applied for the model development. The comparison with Thrombolysis in Myocardial Infarction Risk Score for Secondary Prevention (TRS-2°P) and Secondary Manifestations of ARTerial disease (SMART2) used the validation cohorts with 1000 bootstrap replicates. A total of 48 799, 119 672 and 140 533 patients were included in the derivation and validation cohorts, respectively. A list of 125 risk variables was used to make predictions on CVD risk, of which 8 classes of CVD-related drugs were considered interactive covariates. Model performance in the derivation cohort showed satisfying discrimination and calibration with a C statistic of 0.69. Internal validation showed good discrimination and calibration performance with a C statistic over 0.6. The P-CARDIAC also showed better performance than TRS-2°P and SMART2. Conclusion: Compared with other risk scores, the P-CARDIAC enables the identification of unique patterns of Chinese patients with established CVD. We anticipate that the P-CARDIAC can be applied in various settings to prevent recurrent CVD events, thus reducing the related healthcare burden.
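
Illustrative note: the abstract above reports discrimination as a C statistic and compares models using 1000 bootstrap replicates. A minimal sketch of both steps follows, assuming a binary outcome (event versus no event) and a percentile bootstrap interval. The study may have used a time-to-event version of the C statistic, and the scores and outcomes below are made-up toy values, not study data.

```python
import random

def c_statistic(scores, outcomes):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted score; tied scores count as half a concordant pair."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

def bootstrap_interval(scores, outcomes, replicates=1000, seed=0):
    """Percentile 95% interval for the C statistic from resampled patients."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < replicates:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [outcomes[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the classes
            stats.append(c_statistic([scores[i] for i in idx], ys))
    stats.sort()
    return stats[int(0.025 * replicates)], stats[int(0.975 * replicates) - 1]

# Toy predicted risks and observed outcomes (hypothetical values).
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.20]
outcomes = [0, 0, 1, 1, 1, 0]

c = c_statistic(scores, outcomes)
lo, hi = bootstrap_interval(scores, outcomes)
print(round(c, 3), lo, hi)
```

A C statistic of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which gives context for the reported value of 0.69.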

9.
Microbiome ; 12(1): 84, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38725076

ABSTRACT

BACKGROUND: Emergence of antibiotic resistance in bacteria is an important threat to global health. Antibiotic resistance genes (ARGs) are some of the key components to define bacterial resistance and their spread in different environments. Identification of ARGs, particularly from high-throughput sequencing data of the specimens, is the state-of-the-art method for comprehensively monitoring their spread and evolution. Current computational methods to identify ARGs mainly rely on alignment-based sequence similarities with known ARGs. Such approaches are limited by the choice of reference databases and may miss novel ARGs. The similarity thresholds are usually simple and cannot accommodate variations across different gene families and regions. It is also difficult to scale up as sequence data increase. RESULTS: In this study, we developed ARGNet, a deep neural network that incorporates an unsupervised learning autoencoder model to identify ARGs and a multiclass classification convolutional neural network to classify ARGs that do not depend on sequence alignment. This approach enables a more efficient discovery of both known and novel ARGs. ARGNet accepts both amino acid and nucleotide sequences of variable lengths, from partial (30-50 aa; 100-150 nt) sequences to full-length proteins or genes, allowing its application in both target sequencing and metagenomic sequencing. Our performance evaluation showed that ARGNet outperformed other deep learning models, including DeepARG and HMD-ARG, in most of the application scenarios, especially the quasi-negative test and the analysis of prediction consistency with the phylogenetic tree. ARGNet reduced inference runtime by up to 57% relative to DeepARG. CONCLUSIONS: ARGNet is flexible, efficient, and accurate at predicting a broad range of ARGs from the sequencing data. ARGNet is freely available at https://github.com/id-bioinfo/ARGNet , with an online service provided at https://ARGNet.hku.hk .


Subject(s)
Bacteria; Neural Networks, Computer; Bacteria/genetics; Bacteria/drug effects; Bacteria/classification; Drug Resistance, Bacterial/genetics; Anti-Bacterial Agents/pharmacology; High-Throughput Nucleotide Sequencing/methods; Computational Biology/methods; Genes, Bacterial/genetics; Drug Resistance, Microbial/genetics; Humans; Deep Learning

10.
JHEP Rep ; 6(6): 101037, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38721342

ABSTRACT

Background & Aims: Inflammatory bowel disease (IBD) is commonly associated with extraintestinal complications, including autoimmune liver disease. The co-occurrence of IBD and primary biliary cholangitis (PBC) has been increasingly observed, but the underlying relationship between these conditions remains unclear. Methods: Using summary statistics from genome-wide association studies (GWAS), we investigated the causal effects between PBC and IBD, including Crohn's disease (CD) and ulcerative colitis (UC). We also analyzed the shared genetic architecture between IBD and PBC using data from GWAS, bulk-tissue RNA sequencing, and single cell RNA sequencing, and explored potential functional genes. Results: There was a strong positive genetic correlation between PBC and IBD (linkage disequilibrium score regression: rg = 0.2249, p = 3.38 × 10⁻⁵). Cross-trait analysis yielded 10 shared-risk single nucleotide polymorphisms (SNPs), as well as nine novel SNPs, which were associated with both traits. Using Mendelian randomization, a stable causal effect of PBC on IBD was established. Genetically predicted PBC was found to have a risk effect on IBD (1.105; 95% CI: 1.058-1.15; p = 1.16 × 10⁻¹⁰), but not vice versa. Shared tissue-specific heritability enrichment was identified for PBC and IBD (including CD and UC) in lung, spleen, and whole-blood samples. Furthermore, shared enrichment was observed in specific cell types (T cells, B cells, and natural killer cells) and their subtypes. Nine functional genes were identified based on summary statistics-based Mendelian randomization. Conclusions: This study detected shared genetic architecture between IBD and PBC and demonstrated a stable causal relationship of genetically predicted PBC on the risk of IBD. These findings shed light on the biological basis of comorbidity between IBD and PBC, and have important implications for intervention and treatment targets of these two diseases simultaneously. 
Impact and Implications: The discovery of novel shared single nucleotide polymorphisms (SNPs) and functional genes provides insights into the common targets between inflammatory bowel disease (IBD) and primary biliary cholangitis (PBC), serving as a basis for new drug development and contributing to the study of disease pathogenesis. Additionally, the established significant causality and genetic correlation underscore the importance of clinical intervention in preventing the comorbidity of IBD and PBC. The enrichment of SNP heritability in specific tissues and cell types reveals the role of immune factors in the potential disease mechanisms shared between IBD and PBC. This stimulates further research on potential interventions and could lead to the development of new targets for immune-based therapies.
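
Illustrative note: the abstract above reports a Mendelian randomization effect as an odds ratio with a 95% confidence interval. One common estimator for summary-statistics MR is the inverse-variance weighted (IVW) estimate. The abstract does not say which estimator was used, so the choice of IVW, the fixed-effect standard error, and the per-SNP effect sizes below are assumptions for illustration only.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal effect (log-odds scale).

    Each SNP contributes the ratio beta_outcome / beta_exposure, weighted by
    beta_exposure**2 / se_outcome**2.
    """
    weights = [bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)]
    numerator = sum(
        bx * by / (se * se)
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(weights)
    return numerator / denominator, math.sqrt(1.0 / denominator)

# Hypothetical per-SNP effects on the exposure and on the outcome.
bx = [0.10, 0.20, 0.15]
by = [0.02, 0.04, 0.03]
se = [0.005, 0.005, 0.005]

beta, se_beta = ivw_estimate(bx, by, se)
odds_ratio = math.exp(beta)
ci = (math.exp(beta - 1.96 * se_beta), math.exp(beta + 1.96 * se_beta))
print(odds_ratio, ci)
```

Exponentiating the log-odds estimate and its interval bounds gives an odds ratio and confidence interval in the same form as those quoted in the abstract.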

11.
J Transl Med ; 22(1): 122, 2024 01 31.
Article in English | MEDLINE | ID: mdl-38297333

ABSTRACT

BACKGROUND: Emerging evidence suggests that Rho GTPases play a crucial role in tumorigenesis and metastasis, but their involvement in the tumor microenvironment (TME) and prognosis of hepatocellular carcinoma (HCC) is not well understood. METHODS: We aim to develop a tumor prognosis prediction system called the Rho GTPases-related gene score (RGPRG score) using Rho GTPase signaling genes and further bioinformatic analyses. RESULTS: Our work found that HCC patients with a high RGPRG score had significantly worse survival and increased immunosuppressive cell fractions compared to those with a low RGPRG score. Single-cell cohort analysis revealed an immune-active TME in patients with a low RGPRG score, with strengthened communication from T/NK cells to other cells through MIF signaling networks. Targeting these alterations in TME, the patients with high RGPRG score have worse immunotherapeutic outcomes and decreased survival time in the immunotherapy cohort. Moreover, the RGPRG score was found to be correlated with survival in 27 other cancers. In vitro experiments confirmed that knockdown of the key Rho GTPase-signaling biomarker SFN significantly inhibited HCC cell proliferation, invasion, and migration. CONCLUSIONS: This study provides new insight into the TME features and clinical use of Rho GTPase gene pattern at the bulk-seq and single-cell level, which may contribute to guiding personalized treatment and improving clinical outcome in HCC.


Subject(s)
Carcinoma, Hepatocellular; Liver Neoplasms; Humans; Carcinoma, Hepatocellular/genetics; Liver Neoplasms/genetics; Carcinogenesis; Cell Line; Immunosuppressive Agents; rho GTP-Binding Proteins; Tumor Microenvironment

12.
Bioinform Adv ; 4(1): vbae006, 2024.
Article in English | MEDLINE | ID: mdl-38282975

ABSTRACT

Summary: Third-generation long-read sequencing is an increasingly utilized technique for profiling human immunodeficiency virus (HIV) quasispecies and detecting drug resistance mutations due to its ability to cover the entire viral genome in individual reads. Recently, the ClusterV tool has demonstrated accurate detection of HIV quasispecies from Nanopore long-read sequencing data. However, the need for scripting skills and a computational environment may act as a barrier for many potential users. To address this issue, we have introduced ClusterV-Web, a user-friendly web-based application that enables easy configuration and execution of ClusterV, both remotely and locally. Our tool provides interactive tables and data visualizations to aid in the interpretation of results. This development is expected to democratize access to long-read sequencing data analysis, enabling a wider range of researchers and clinicians to efficiently profile HIV quasispecies and detect drug resistance mutations. Availability and implementation: ClusterV-Web is freely available and open source, with detailed documentation accessible at http://www.bio8.cs.hku.hk/ClusterVW/. The standalone Docker image and source code are also available at https://github.com/HKU-BAL/ClusterV-Web.

14.
Stem Cell Res Ther ; 14(1): 247, 2023 09 13.
Article in English | MEDLINE | ID: mdl-37705079

ABSTRACT

AIMS: Dissecting complex interactions among transcription factors (TFs), microRNAs (miRNAs) and long noncoding RNAs (lncRNAs) is central to understanding heart development and function. Although computational approaches and platforms have been described to infer relationships among regulatory factors and genes, current approaches do not adequately account for how highly diverse, interacting regulators that include noncoding RNAs (ncRNAs) control cardiac gene expression dynamics over time. METHODS: To overcome this limitation, we devised an integrated framework, cardiac gene regulatory modeling (CGRM), that integrates LogicTRN and regulatory component analysis bioinformatics modeling platforms to infer complex regulatory mechanisms. We then used CGRM to identify and compare the TF-ncRNA gene regulatory networks that govern early- and late-stage cardiomyocytes (CMs) generated by in vitro differentiation of human pluripotent stem cells (hPSC) and ventricular and atrial CMs isolated during in vivo human cardiac development. RESULTS: Comparisons of in vitro versus in vivo derived CMs revealed conserved regulatory networks among TFs and ncRNAs in early cells that significantly diverged in late-stage cells. We report that cardiac genes ("heart targets") expressed in early-stage hPSC-CMs are primarily regulated by MESP1, miR-1, miR-23, lncRNAs NEAT1 and MALAT1, while GATA6, HAND2, miR-200c, NEAT1 and MALAT1 are critical for late hPSC-CMs. The inferred TF-miRNA-lncRNA networks regulating heart development and contraction were similar among early-stage CMs, among individual hPSC-CM datasets and between in vitro and in vivo samples. However, genes related to apoptosis, cell cycle and proliferation, and transmembrane transport showed a high degree of divergence between in vitro and in vivo derived late-stage CMs. Overall, late-, but not early-stage CMs diverged greatly in the expression of "heart target" transcripts and their regulatory mechanisms. 
CONCLUSIONS: In conclusion, we find that hPSC-CMs are regulated in a cell autonomous manner during early development that diverges significantly as a function of time when compared to in vivo derived CMs. These findings demonstrate the feasibility of using CGRM to reveal dynamic and complex transcriptional and posttranscriptional regulatory interactions that underlie cell directed versus environment-dependent CM development. These results with in vitro versus in vivo derived CMs thus establish this approach for detailed analyses of heart disease and for the analysis of cell regulatory systems in other biomedical fields.


Subject(s)
MicroRNAs; RNA, Long Noncoding; Humans; RNA, Long Noncoding/genetics; Transcription Factors/genetics; MicroRNAs/genetics; Myocytes, Cardiac; Heart Ventricles

15.
BMC Bioinformatics ; 24(1): 308, 2023 Aug 03.
Article in English | MEDLINE | ID: mdl-37537536

ABSTRACT

BACKGROUND: With the continuous advances in third-generation sequencing technology and the increasing affordability of next-generation sequencing technology, sequencing data from different sequencing technology platforms is becoming more common. While numerous benchmarking studies have been conducted to compare variant-calling performance across different platforms and approaches, little attention has been paid to the potential of leveraging the strengths of different platforms to optimize overall performance, especially integrating Oxford Nanopore and Illumina sequencing data. RESULTS: We investigated the impact of multi-platform data on the performance of variant calling through carefully designed experiments with a deep learning-based variant caller named Clair3-MP (Multi-Platform). Through our research, we not only demonstrated the capability of ONT-Illumina data for improved variant calling, but also identified the optimal scenarios for utilizing ONT-Illumina data. In addition, we revealed that the improvement in variant calling using ONT-Illumina data comes from an improvement in difficult genomic regions, such as the large low-complexity regions and segmental and collapse duplication regions. Moreover, Clair3-MP can incorporate reference genome stratification information to achieve a small but measurable improvement in variant calling. Clair3-MP is accessible as an open-source project at: https://github.com/HKU-BAL/Clair3-MP . CONCLUSIONS: These insights have important implications for researchers and practitioners alike, providing valuable guidance for improving the reliability and efficiency of genomic analysis in diverse applications.


Subject(s)
Genome; Genomics; Reproducibility of Results; High-Throughput Nucleotide Sequencing

16.
Clin Chem ; 69(10): 1174-1185, 2023 10 03.
Article in English | MEDLINE | ID: mdl-37537871

ABSTRACT

BACKGROUND: HIV infections often develop drug resistance mutations (DRMs), which can increase the risk of virological failure. However, it has been difficult to determine if minor mutations occur in the same genome or in different virions using Sanger sequencing and short-read sequencing methods. Oxford Nanopore Technologies (ONT) sequencing may improve antiretroviral resistance profiling by allowing for long-read clustering. METHODS: A new ONT sequencing-based method for profiling DRMs in HIV quasispecies was developed and validated. The method used hierarchical clustering of long amplicons that cover regions associated with different types of antiretroviral drugs. A gradient series of an HIV plasmid and 2 plasma samples was prepared to validate the clustering performance. The ONT results were compared to those obtained with Sanger sequencing and Illumina sequencing in 77 HIV-positive plasma samples to evaluate the diagnostic performance. RESULTS: In the validation study, the abundance of detected quasispecies was concordant with the predicted result, with an R² > 0.99. During the diagnostic evaluation, 59/77 samples were successfully sequenced for DRMs. Among 18 failed samples, 17 were below the limit of detection of 303.9 copies/µL. Based on the receiver operating characteristic analysis, the ONT workflow achieved an F1 score of 0.96 with a cutoff of 0.4 variant allele frequency. Four cases were found to have quasispecies with DRMs, in which 2 harbored quasispecies with more than one class of DRMs. Treatment modifications were recommended for these cases. CONCLUSIONS: Long-read sequencing coupled with hierarchical clustering could differentiate the quasispecies resistance profiles in HIV-infected samples, providing a clearer picture for medical care.


Subject(s)
HIV Infections; HIV-1; Humans; HIV Infections/drug therapy; Quasispecies/genetics; HIV-1/genetics; Anti-Retroviral Agents/pharmacology; Anti-Retroviral Agents/therapeutic use; Mutation; High-Throughput Nucleotide Sequencing/methods; Cluster Analysis

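
Illustrative note: the HIV drug resistance abstract in this record reports an F1 score of 0.96 at a variant allele frequency (VAF) cutoff of 0.4. The sketch below shows how an F1 score at a given cutoff can be computed. The comparison rule (calling a mutation when its VAF is at or above the cutoff), the mutation labels, and the VAF values are assumptions for illustration and are not data from the study.

```python
def f1_at_cutoff(calls, truth, cutoff=0.4):
    """F1 score of mutation calls after applying a VAF cutoff.

    calls: dict mapping mutation name -> observed variant allele frequency.
    truth: set of mutations confirmed by a reference method.
    """
    predicted = {m for m, vaf in calls.items() if vaf >= cutoff}
    true_pos = len(predicted & truth)
    false_pos = len(predicted - truth)
    false_neg = len(truth - predicted)
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)

# Toy example: four candidate calls, three of which are truly present.
calls = {"K103N": 0.85, "M184V": 0.45, "Y181C": 0.10, "G190A": 0.42}
truth = {"K103N", "M184V", "Y181C"}

f1 = f1_at_cutoff(calls, truth, cutoff=0.4)
print(round(f1, 3))
```

Sweeping the cutoff over a grid and keeping the value with the best F1 is one simple way to choose a threshold of the kind quoted in the abstract.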
17.
Sci Rep ; 13(1): 5237, 2023 03 31.
Article in English | MEDLINE | ID: mdl-37002338

ABSTRACT

Sensitive detection of Mycobacterium tuberculosis (TB) in small percentages in metagenomic samples is essential for microbial classification and drug resistance prediction. However, traditional methods, such as bacterial culture and microscopy, are time-consuming and sometimes have limited TB detection sensitivity. Oxford Nanopore Technologies (ONT) MinION sequencing allows rapid and simple sample preparation for sequencing. Its recently developed adaptive sequencing selects reads from targets while allowing real-time base-calling to achieve sequence enrichment or depletion during sequencing. Another common enrichment method is PCR amplification of the target TB genes. In this study, we compared both methods using ONT MinION sequencing for TB detection and variant calling in metagenomic samples using both simulation runs and those with synthetic and patient samples. We found that both methods effectively enrich TB reads from a high percentage of human (95%) and other microbial DNA. Adaptive sequencing with readfish and UNCALLED achieved a 3.9-fold and 2.2-fold enrichment, respectively, compared to the control run. We provide a simple automatic analysis framework to support the detection of TB for clinical use, openly available at https://github.com/HKU-BAL/ONT-TB-NF . Depending on the patient's medical condition and sample type, we recommend users evaluate and optimize their workflow for different clinical specimens to improve the detection limit.


Subject(s)
Mycobacterium tuberculosis , Nanopores , Humans , Mycobacterium tuberculosis/genetics , High-Throughput Nucleotide Sequencing/methods , Metagenomics/methods , Metagenome , Computer Simulation , Sequence Analysis, DNA
18.
Genome Med ; 15(1): 10, 2023 Feb 14.
Article in English | MEDLINE | ID: mdl-36788602

ABSTRACT

BACKGROUND: Very low-coverage (0.1 to 1×) whole genome sequencing (WGS) has become a promising and affordable approach to discover genomic variants of human populations for genome-wide association studies (GWAS). To support genetic screening using preimplantation genetic testing (PGT) in a large population, the sequencing coverage goes below 0.1× to an ultra-low level. However, the feasibility and effectiveness of ultra-low-coverage WGS (ulcWGS) for GWAS remain undetermined. METHODS: We built a pipeline to carry out analysis of ulcWGS data for GWAS. To examine its effectiveness, we benchmarked the accuracy of genotype imputation at combinations of different coverages below 0.1× and sample sizes from 2000 to 16,000, using 17,844 embryo PGT samples with approximately 0.04× average coverage and the standard Chinese sample HG005 with known genotypes. We then applied the imputed genotypes of 1744 transferred embryos that have gestational ages and complete follow-up records to GWAS. RESULTS: The accuracy of genotype imputation under ultra-low coverage can be improved by increasing the sample size and applying a set of filters. From 1744 born embryos, we identified 11 genomic risk loci associated with gestational ages and 166 genes mapped to these loci according to positional, expression quantitative trait locus, and chromatin interaction strategies. Among these mapped genes, CRHBP, ICAM1, and OXTR were more frequently reported as preterm birth related. By joint analysis of gene expression data from previous studies, we constructed interrelationships of mainly CRHBP, ICAM1, PLAGL1, DNMT1, CNTLN, DKK1, and EGR2 with preterm birth, infant disease, and breast cancer. CONCLUSIONS: This study not only demonstrates that ulcWGS could achieve relatively high accuracy of genotype imputation and is capable of supporting GWAS, but also provides insights into the associations between gestational age and genetic variations of the fetal embryos from a Chinese population.


Subject(s)
Genome-Wide Association Study , Premature Birth , Infant, Newborn , Female , Humans , Gestational Age , Polymorphism, Single Nucleotide , Genetic Testing , Genotype , Quantitative Trait Loci
19.
BMC Bioinformatics ; 23(1): 465, 2022 Nov 07.
Article in English | MEDLINE | ID: mdl-36344913

ABSTRACT

BACKGROUND: Whole genome sequencing using the long-read Oxford Nanopore Technologies (ONT) MinION sequencer provides a cost-effective option for structural variant (SV) detection in clinical applications. Despite the advantage of using long reads, however, accurate SV calling and phasing are still challenging. RESULTS: We introduce Duet, an SV detection tool optimized for SV calling and phasing using ONT data. The tool uses novel features integrated from both SV signatures and single-nucleotide polymorphism signatures, which can accurately distinguish an SV haplotype from a false signal. Duet was benchmarked against state-of-the-art tools on multiple ONT sequencing datasets with sequencing coverage ranging from 8× to 40×. At low sequencing coverage of 8×, Duet performs better than all other tools in SV calling, SV genotyping and SV phasing. When the sequencing coverage is higher (20× to 40×), the F1-score for SV phasing is further improved in comparison to the performance of other tools, while its performance in SV genotyping and SV calling remains higher than that of other tools. CONCLUSION: Duet can perform accurate SV calling, SV genotyping and SV phasing using low-coverage ONT data, making it very useful for low-coverage genomes. It also performs well when scaled to high-coverage genomes, making it adaptable to various clinical applications. Duet is open source and is available at https://github.com/yekaizhou/duet .


Subject(s)
Nanopore Sequencing , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , High-Throughput Nucleotide Sequencing , Whole Genome Sequencing
20.
DNA Res ; 29(6), 2022 Dec 01.
Article in English | MEDLINE | ID: mdl-36308393

ABSTRACT

DNA sequences that are absent in the human reference genome are classified as novel sequences. The discovery of these missed sequences is crucial for exploring the genomic diversity of populations and understanding the genetic basis of human diseases. However, the varying read lengths generated by different sequencing technologies can significantly affect the results of novel sequence discovery. In this work, we designed an assembly-free novel sequence (AF-NS) approach to identify novel sequences from Oxford Nanopore Technologies long reads. Among the newly detected sequences using AF-NS, more than 95% were omitted from those using long-read assemblers and 85% were not present in Illumina short reads. We identified the common novel sequences among all the samples and revealed their association with the binding motifs of transcription factors. Regarding the placements of the novel sequences, we found about 70% enriched in repeat regions and generated 430 for one specific subpopulation that might be related to their evolution. Our study demonstrates the advantage of the assembly-free approach in capturing more novel sequences than assembler-based methods. Combining long-read data with powerful analytical methods can be a robust way to improve the completeness of novel sequences.


Subject(s)
High-Throughput Nucleotide Sequencing , Nanopores , Humans , Sequence Analysis, DNA/methods , Base Sequence , High-Throughput Nucleotide Sequencing/methods , Genomics