Results 1-14 of 14
1.
BMC Bioinformatics ; 25(1): 193, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38755527

ABSTRACT

We have developed AMRViz, a toolkit for analyzing, visualizing, and managing bacterial genomics samples. The toolkit is bundled with a current best-practice analysis pipeline that allows researchers to perform a comprehensive analysis of a collection of samples directly from raw sequencing data with a single command. For each sample, the analysis produces a report showing the genome structure, genome annotations, and antibiotic resistance and virulence profiles. The pan-genome of all samples in the collection is analyzed to identify core and accessory genes. Phylogenies of the whole genomes and of all gene clusters are also generated. The toolkit provides a web-based visualization dashboard that lets researchers interactively examine various aspects of the analysis results. Availability: AMRViz is implemented in Python and NodeJS, and is publicly available under the open-source MIT license at https://github.com/amromics/amrviz.


Subject(s)
Genome, Bacterial , Genomics , Software , Genomics/methods , Drug Resistance, Bacterial/genetics , Phylogeny , Bacteria/genetics , Bacteria/drug effects , Anti-Bacterial Agents/pharmacology
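The core/accessory split mentioned in the AMRViz abstract can be illustrated with a small sketch. It assumes gene clusters have already been built and are supplied as a hypothetical mapping from cluster ID to the samples carrying that cluster. The 99% core threshold is an assumed convention used by some pan-genome tools, not a parameter documented for AMRViz.

```python
from typing import Dict, List, Set, Tuple

def classify_pangenome(
    presence: Dict[str, Set[str]],
    samples: List[str],
    core_fraction: float = 0.99,  # assumed threshold, not taken from AMRViz
) -> Tuple[List[str], List[str]]:
    """Split gene clusters into core and accessory lists.

    presence: hypothetical input mapping gene-cluster ID -> set of sample IDs
    in which that cluster was found.
    A cluster is 'core' if it occurs in at least core_fraction of the samples;
    every other cluster is 'accessory'.
    """
    if not samples:
        raise ValueError("need at least one sample")
    sample_set = set(samples)
    core: List[str] = []
    accessory: List[str] = []
    for cluster, carriers in sorted(presence.items()):
        fraction = len(carriers & sample_set) / len(sample_set)
        (core if fraction >= core_fraction else accessory).append(cluster)
    return core, accessory

core, accessory = classify_pangenome(
    {"geneA": {"s1", "s2", "s3"}, "geneB": {"s1"}, "geneC": {"s1", "s2", "s3"}},
    ["s1", "s2", "s3"],
)
print(core, accessory)  # ['geneA', 'geneC'] ['geneB']
```

Lowering `core_fraction` (for example to 0.95) makes the core definition tolerant of clusters missed in a few samples because of assembly or annotation gaps.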
2.
Heliyon ; 10(6): e27043, 2024 Mar 30.
Article in English | MEDLINE | ID: mdl-38509882

ABSTRACT

Despite growing awareness of the role of pharmacogenomics (PGx) in personalized medicine for COVID-19, PGx data for COVID-19 drugs are extremely scarce, and to date not a single publication has addressed this topic for post-COVID-19 medications. In the current study, we investigated the genetic variations associated with COVID-19 and post-COVID-19 therapies using whole-genome sequencing data from the 1000 Vietnamese Genomes Project (1KVG), compared with other populations retrieved from the 1000 Genomes Project Phase 3 (1KGP3) and the Genome Aggregation Database (gnomAD). We also used a computational approach to evaluate the risk of drug interactions in comorbid COVID-19 and post-COVID-19 patients based on the pharmacogenomic profiles of drugs. For COVID-19 therapies, variants related to the response to two causal treatment agents (tocilizumab and ritonavir) and to antithrombotic drugs are common in the Vietnamese cohort. For post-COVID-19 therapies, drugs for mental health conditions have the highest number of clinically annotated variants carried by Vietnamese individuals. Among the superpopulations, East Asian populations shared the most similar genetic structure with the Vietnamese population, whereas the African population differed the most. Comorbid patients are at increased risk of drug-drug interactions (DDIs) both while suffering from COVID-19 and after recovering, as a large number of potential DDIs were identified. Our results provide a population-specific understanding of the pharmacogenomic aspects of COVID-19 and post-COVID-19 therapy that can help optimize therapeutic outcomes and promote personalized medicine strategies. By assessing potential drug interactions, we also partly explain the higher risk in COVID-19 patients with underlying conditions.

3.
BMC Genomics ; 25(1): 52, 2024 Jan 11.
Article in English | MEDLINE | ID: mdl-38212682

ABSTRACT

BACKGROUND: Most skin-related traits have been studied in individuals of Caucasian genetic background. A comprehensive study of skin-associated genetic effects in underrepresented populations such as the Vietnamese is needed to fill the gaps in the field. OBJECTIVES: We aimed to develop a computational pipeline to predict the effect of genetic factors on skin traits using public data (GWAS catalogs and whole-genome sequencing (WGS) data from the 1000 Genomes Project, 1KGP) and in-house Vietnamese data (WGS and SNP-array genotyping). We also compared the genetic predispositions of the Vietnamese population for 25 skin-related traits with those of other populations to gain population-specific insights into skin health. METHODS: Two Vietnamese cohorts (WGS data from 1008 healthy individuals as the reference, and 96 samples without any skin conditions genotyped with the Infinium Asian Screening Array-24 v1.0 BeadChip) were used to predict skin-associated genetic variants for 25 skin-related and micronutrient-requirement traits in population and correlation analyses. In parallel, we compared the landscape of cutaneous issues in Vietnamese people with that of other populations by assessing their genetic profiles. RESULTS: In the population study, the skin-related genetic profile of the Vietnamese cohorts was most similar to those of East Asian cohorts (JPT: Fst = 0.036, CHB: Fst = 0.031, CHS: Fst = 0.027, CDX: Fst = 0.025). In addition, we identified pairs of skin traits at high risk of frequent co-occurrence (such as skin aging and wrinkles (r = 0.45, p = 1.50e-5) or collagen degradation and moisturizing (r = 0.35, p = 1.1e-3)). CONCLUSION: This is the first investigation in Vietnam to explore genetic variants of facial skin. These findings could help address the limited genetic diversity of skin-related data in currently published databases.


Subject(s)
Genetic Predisposition to Disease , Polymorphism, Single Nucleotide , Skin , Southeast Asian People , Humans , Genome-Wide Association Study , Phenotype , Vietnam
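The abstract reports pairwise Fst values but does not say which estimator was used. As a hedged illustration of what such a number measures, the sketch below computes Nei's G_ST from alternate-allele frequencies at biallelic sites, summing heterozygosities across sites before taking the ratio. It omits the sample-size corrections of estimators such as Weir and Cockerham's, so its values are not directly comparable to the ones reported above.

```python
from typing import Sequence

def nei_fst(freqs_pop1: Sequence[float], freqs_pop2: Sequence[float]) -> float:
    """Multi-locus Fst (Nei's G_ST) for two populations at biallelic sites.

    Each sequence holds the alternate-allele frequency at the same set of sites.
    Heterozygosities are summed over sites before the ratio is taken
    ("ratio of sums"), which is more stable than averaging per-site Fst.
    """
    if len(freqs_pop1) != len(freqs_pop2):
        raise ValueError("populations must be measured at the same sites")
    sum_total = sum_within = 0.0
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        p_bar = (p1 + p2) / 2
        # Expected heterozygosity if both populations were pooled.
        sum_total += 2 * p_bar * (1 - p_bar)
        # Mean expected heterozygosity within each population.
        sum_within += (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if sum_total == 0:
        return 0.0  # all sites monomorphic in the pooled sample
    return (sum_total - sum_within) / sum_total

print(round(nei_fst([0.1], [0.3]), 4))  # 0.0625
```

Identical frequencies give 0, and fixed opposite alleles give 1; lower values, like those listed for the East Asian cohorts, indicate more similar allele-frequency profiles.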
4.
Nucleic Acids Res ; 52(3): e15, 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38084888

ABSTRACT

Whole-genome sequencing has increasingly become the essential method for studying the genetic mechanisms of antimicrobial resistance and for surveillance of drug-resistant bacterial pathogens. Most bacterial genomes sequenced to date have been sequenced with Illumina technology, owing to its high throughput, excellent sequence accuracy, and low cost. However, because the reads are short, these assemblies are fragmented into large numbers of contigs, which prevents recovery of complete genome information. We developed Pasa, a graph-based algorithm that uses the pangenome graph and the assembly graph to improve scaffolding quality. By leveraging population information for the bacterial species, Pasa uses the linkage information of the species' gene families to resolve the contig graph of the assembly. We show that our method outperforms current state-of-the-art methods in accuracy while being computationally efficient enough to be applied to a large number of existing draft assemblies.


Subject(s)
Algorithms , Bacteria , Genome, Bacterial , Bacteria/classification , Bacteria/genetics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
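The idea of using gene-family linkage to decide which contigs belong next to each other can be loosely illustrated with a toy example. The sketch assumes each reference genome is already reduced to a hypothetical ordered list of gene-family IDs and simply counts how often two families are adjacent; a contig ending in one family is then paired with the candidate family most often seen next to it. This is a simplification for intuition only: Pasa itself works on the pangenome graph and the assembly graph, which this code does not model, and gene orientation is ignored here.

```python
from collections import Counter
from typing import Iterable, List, Optional, Sequence, Tuple

def _pair(a: str, b: str) -> Tuple[str, str]:
    # Unordered key: the orientation of the two genes is ignored in this sketch.
    return (a, b) if a <= b else (b, a)

def family_adjacency(genomes: Iterable[Sequence[str]]) -> Counter:
    """Count how often two gene families are neighbours across reference genomes.

    Each genome is a hypothetical ordered list of gene-family IDs along one chromosome.
    """
    counts: Counter = Counter()
    for order in genomes:
        for a, b in zip(order, order[1:]):
            counts[_pair(a, b)] += 1
    return counts

def best_partner(counts: Counter, end_family: str, candidates: List[str]) -> Optional[str]:
    """Return the candidate family most often adjacent to end_family, or None if unsupported."""
    best, best_score = None, 0
    for cand in candidates:
        score = counts[_pair(end_family, cand)]
        if score > best_score:
            best, best_score = cand, score
    return best

counts = family_adjacency([
    ["dnaA", "dnaN", "recF", "gyrB"],
    ["dnaA", "dnaN", "recF", "gyrB"],
    ["dnaA", "dnaN", "gyrB"],
])
print(best_partner(counts, "dnaN", ["gyrB", "recF"]))  # recF
```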
5.
Brief Bioinform ; 23(6), 2022 Nov 19.
Article in English | MEDLINE | ID: mdl-36326078

ABSTRACT

Most polygenic risk score (PRS) models have been based on data from populations of European origin (which account for the majority of large genomics datasets, e.g. >78% in the UK Biobank and >85% in the GTEx project). Although several large-scale Asian biobanks have been initiated (e.g. Japanese, Korean, and Han Chinese biobanks), most other Asian countries have little or near-zero genomics data. To implement PRS models for under-represented populations, we explored transfer learning approaches, assuming that information from existing large datasets can compensate for the small sample sizes that can feasibly be obtained in developing countries such as Vietnam. Here, we benchmark 13 common PRS methods under a meta-population strategy (combining individual genotype data from multiple populations) and a multi-population strategy (combining summary statistics from multiple populations). Our results highlight the complementarity of different populations and show that the choice of method should depend on the target population. Based on these results, we discuss a set of guidelines to help users select the best method for their datasets. We developed robust and comprehensive software for benchmarking comparisons between methods and propose a computational framework for improving PRS performance in datasets with small sample sizes. This work is expected to inform the development of genomics applications in under-represented populations. The PRSUP framework is available at: https://github.com/BiomedicalMachineLearning/VGP.


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Humans , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide , Vietnam , Genomics/methods , Risk Factors
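The benchmarked PRS methods largely differ in how they estimate per-variant weights; once the weights are fixed, a PRS is typically an additive sum of effect sizes multiplied by effect-allele dosages. A minimal sketch of that final scoring step follows, assuming weights and dosages are already aligned to the same effect allele and keyed by hypothetical variant IDs. It is not code from the PRSUP framework.

```python
from typing import Dict

def polygenic_risk_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Additive PRS: sum over variants of (effect size x effect-allele dosage).

    dosages: variant ID -> number of effect alleles carried (0, 1, 2, or an
             imputed dosage in [0, 2]).
    weights: variant ID -> per-allele effect size (e.g. a log odds ratio).
    Variants absent from the individual's genotypes are skipped in this sketch;
    real pipelines usually handle missingness more carefully.
    """
    return sum(beta * dosages[v] for v, beta in weights.items() if v in dosages)

weights = {"rs1": 0.2, "rs2": -0.1, "rs3": 0.5}  # made-up effect sizes
person = {"rs1": 2, "rs2": 1}                    # rs3 not genotyped
print(round(polygenic_risk_score(person, weights), 3))  # 0.3
```

Raw scores are usually standardized within a cohort before they are compared across individuals or combined with other predictors.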
6.
Sci Rep ; 12(1): 17556, 2022 Oct 20.
Article in English | MEDLINE | ID: mdl-36266455

ABSTRACT

Despite the widespread use of next-generation sequencing technologies, microarray-based genotyping combined with imputation of untyped variants remains a cost-effective means of interrogating genetic variation across the human genome. This technology is widely used in biobank-scale genome-wide association studies (GWAS) and, more recently, in polygenic score (PGS) analysis to predict and stratify disease risk. Over the last decade, human genotyping arrays have grown tremendously in both number and content, making a comprehensive evaluation of their performance increasingly important. Here, we performed a comprehensive performance assessment of 23 available human genotyping arrays in 6 ancestry groups using diverse public and in-house datasets. The analyses focus on estimating the performance of the derived imputation (in terms of accuracy and coverage) and of PGS (in terms of concordance with PGS estimated from whole-genome sequencing data) for three different traits and diseases. We found that arrays with more SNPs do not necessarily have higher imputation performance, whereas arrays that are well optimized for the target population can provide very good imputation performance. In addition, PGS estimated from imputed SNP array data is highly correlated with PGS estimated from whole-genome sequencing data in most cases. When optimal arrays are used, the correlations of PGS between the two types of data are higher than 0.97; interestingly, however, high-density arrays can result in lower PGS performance. Our results underscore the importance of selecting a suitable genotyping array for PGS applications. Finally, we developed a web tool that provides interactive analyses of tag SNP content and imputation performance based on population and genomic regions of interest. This study can serve as a practical guide for researchers designing genotyping array-based studies. The tool is available at: https://genome.vinbigdata.org/tools/saa/.


Subject(s)
Genome, Human , Genome-Wide Association Study , Humans , Genotype , Polymorphism, Single Nucleotide , High-Throughput Nucleotide Sequencing/methods
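PGS concordance between imputed array data and WGS data is expressed above as a correlation computed over the same individuals. The abstract does not name the coefficient, so the sketch below assumes Pearson's r; the score values in the usage line are made up.

```python
import math
from typing import Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long score vectors.

    In this setting, x[i] and y[i] would be the PGS of individual i computed
    from imputed array genotypes and from WGS genotypes, respectively.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)

array_pgs = [1.0, 2.0, 3.0, 4.0]  # made-up scores
wgs_pgs = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(array_pgs, wgs_pgs))  # 1.0
```

Pearson's r is insensitive to a constant shift or scaling between the two score sets, so it measures ranking agreement rather than whether the absolute PGS values match.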
7.
Brief Bioinform ; 23(4), 2022 Jul 18.
Article in English | MEDLINE | ID: mdl-35780383

ABSTRACT

Despite the rapid development of sequencing technology, single-nucleotide polymorphism (SNP) arrays are still the most cost-effective genotyping solutions for large-scale genomic research and applications. Recent years have seen the rapid development of numerous genotyping platforms of different sizes and designs, but population-specific platforms are still lacking, especially for populations in developing countries. SNP arrays designed for these countries should be cost-effective (small in size) yet incorporate the key information needed to associate genotypes with traits. A key design principle for most current platforms is to improve genome-wide imputation so that more SNPs not included in the array (imputed SNPs) can be predicted. However, current tag SNP selection methods mostly focus on imputation accuracy and coverage rather than on the functional content of the array, and it is the functional SNPs that are most likely to be associated with traits. Here, we propose LmTag, a novel tag SNP selection method that not only improves imputation performance but also prioritizes highly functional SNP markers. We applied LmTag to a wide range of populations using both public and in-house whole-genome sequencing databases. Our results show that LmTag improved both functional marker prioritization and genome-wide imputation accuracy compared with existing methods. This approach could contribute to next-generation genotyping arrays that provide excellent imputation capability and facilitate array-based functional genetic studies. Such arrays are particularly suitable for under-represented populations in developing countries or for non-model species, where little genomics data are available and investment in genome sequencing or high-density SNP arrays is limited. LmTag is available at: https://github.com/datngu/LmTag.


Subject(s)
Genomics , Polymorphism, Single Nucleotide , Chromosome Mapping , Genotype , Phenotype
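To make the trade-off between imputation coverage and functional content concrete, the sketch below uses a generic greedy set-cover heuristic: at each step it picks the SNP that tags the most still-untagged SNPs (LD r² at or above a threshold), breaking ties by a functional score. This is not LmTag's algorithm; how LmTag weighs imputation against function is not described in this abstract. The LD dictionary, functional scores, and the 0.8 threshold are illustrative assumptions.

```python
from typing import Dict, List, Set

def _covered(snp: str, untagged: Set[str],
             ld_r2: Dict[str, Dict[str, float]], r2_threshold: float) -> Set[str]:
    """Untagged SNPs that `snp` would tag: itself plus any SNP in LD above the threshold."""
    return {t for t in untagged if t == snp or ld_r2[snp].get(t, 0.0) >= r2_threshold}

def greedy_tag_snps(ld_r2: Dict[str, Dict[str, float]],
                    functional_score: Dict[str, float],
                    r2_threshold: float = 0.8,  # assumed LD cutoff
                    max_tags: int = 10) -> List[str]:
    """Greedy tag SNP selection; ld_r2 is assumed symmetric."""
    untagged: Set[str] = set(ld_r2)
    tags: List[str] = []
    while untagged and len(tags) < max_tags:
        best = max(
            (s for s in ld_r2 if s not in tags),
            key=lambda s: (
                len(_covered(s, untagged, ld_r2, r2_threshold)),  # coverage gain first
                functional_score.get(s, 0.0),                      # then functional priority
                s,                                                 # deterministic final tie-break
            ),
        )
        tags.append(best)
        untagged -= _covered(best, untagged, ld_r2, r2_threshold)
    return tags

ld = {
    "rs1": {"rs2": 0.9, "rs3": 0.1},
    "rs2": {"rs1": 0.9, "rs3": 0.85},
    "rs3": {"rs1": 0.1, "rs2": 0.85},
    "rs4": {},
}
print(greedy_tag_snps(ld, {"rs4": 1.0, "rs1": 0.5}))  # ['rs2', 'rs4']
```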
8.
BMC Infect Dis ; 22(1): 558, 2022 Jun 19.
Article in English | MEDLINE | ID: mdl-35718768

ABSTRACT

BACKGROUND: A global pandemic has been declared for coronavirus disease 2019 (COVID-19), which has had serious impacts on human health and healthcare systems in affected areas, including Vietnam. No previous study has provided a framework that summarizes the virus variants and assesses the severity associated with virus proteins and host cells in COVID-19 patients in Vietnam. METHOD: In this paper, we comprehensively investigated SARS-CoV-2 variants and immune responses in COVID-19 patients. We provided summary statistics of SARS-CoV-2 target sequences in Vietnam and other countries for data scientists to use in downstream analyses of therapeutic targets. For host cells, we proposed a predictive model of COVID-19 severity based on public datasets of hospitalization status in Vietnam, incorporating a polygenic risk score. This score uses immunogenic SNP biomarkers as indicators of COVID-19 severity. RESULT: Using various data sources, we found that the Delta variant of SARS-CoV-2 is the most prevalent in southern Vietnam and that this pattern differs from other areas of the world. Our predictive models of COVID-19 severity had high accuracy (Random Forest AUC = 0.81, Elastic Net AUC = 0.7, and SVM AUC = 0.69), and the use of polygenic risk scores increased the models' predictive capabilities. CONCLUSION: We provided a comprehensive analysis of COVID-19 severity in Vietnam. This investigation is helpful not only for therapeutic target studies for COVID-19 treatment but could also influence further research on disease progression and personalized clinical outcomes.


Subject(s)
COVID-19 Drug Treatment , COVID-19 , Coronavirus Infections , Pneumonia, Viral , Betacoronavirus , COVID-19/epidemiology , Genome-Wide Association Study , Humans , SARS-CoV-2/genetics , Vietnam/epidemiology
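The AUC values reported for the severity models can be read as the probability that a randomly chosen severe case receives a higher predicted score than a randomly chosen non-severe case. The sketch below computes exactly that pairwise quantity (ties count one half), which equals the Mann-Whitney formulation of ROC AUC. It is quadratic in the number of samples; libraries such as scikit-learn's `roc_auc_score` compute the same value more efficiently by sorting. The labels and scores in the usage line are made up.

```python
from typing import Sequence

def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    labels: 1 for the positive class (e.g. severe), 0 for the negative class.
    Ties between a positive and a negative score contribute 0.5.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.5, 0.1]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking, so the reported 0.81 for Random Forest means a severe case outranks a non-severe case about four times out of five.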
9.
Genes (Basel) ; 13(2), 2022 Jan 29.
Article in English | MEDLINE | ID: mdl-35205313

ABSTRACT

(1) Background: Individuals with BRCA1/2 gene mutations are at increased risk of breast and ovarian cancer. The prevalence of BRCA1/2 mutations varies by race and ethnicity, and the prevalence and risks associated with most BRCA1/2 mutations have been unknown in the Vietnamese population. Here, we screened the entire BRCA1 and BRCA2 genes in breast and ovarian cancer patients with a family history of breast and ovarian cancer and propose a risk score associated with carrier status and family history to aid personalized treatment; (2) Methods: Between December 2017 and December 2019, Vietnamese patients with a pathological diagnosis of breast cancer or epithelial ovarian cancer were prospectively followed up after treatment at two large institutions in Vietnam. Blood samples from 33 Vietnamese patients with hereditary breast and ovarian cancer (HBOC) syndrome were collected and analyzed using next-generation sequencing; (3) Results: Eleven types of mutations were detected in BRCA1 (in nine patients) and BRCA2 (in three patients), two of which (BRCA1:p.Tyr1666Ter and BRCA2:p.Ser1341Ter) have not been previously documented in the literature. Seven of 19 patients' relatives carried BRCA1/2 gene mutations. All selected patients were counselled about their likelihood of developing cancer and about prophylactic screening and procedures. The study established a risk score for the cohort based on carrier status and family history; (4) Conclusions: Our findings have implications for planning a screening programme for BRCA1 and BRCA2 testing in breast and ovarian cancer patients and for genetic screening of their relatives. BRCA1/2 mutation carriers without cancer should undergo early and regular cancer screening and prophylactic measures. This study could benefit diverse groups within large population-specific cohorts affected by HBOC syndrome.


Subject(s)
Hereditary Breast and Ovarian Cancer Syndrome , Ovarian Neoplasms , BRCA1 Protein/genetics , Female , Genetic Predisposition to Disease , Hereditary Breast and Ovarian Cancer Syndrome/epidemiology , Hereditary Breast and Ovarian Cancer Syndrome/genetics , Humans , Mutation , Ovarian Neoplasms/epidemiology , Ovarian Neoplasms/genetics , Ovarian Neoplasms/pathology , Vietnam/epidemiology
10.
Pharmgenomics Pers Med ; 14: 61-75, 2021.
Article in English | MEDLINE | ID: mdl-33469342

ABSTRACT

Pharmacogenomics has been used effectively to study adverse drug reactions by determining the person-specific genetic factors associated with an individual's response to a drug. Current approaches have revealed the importance of sequencing technologies and sequence analysis strategies for interpreting the contribution of genetic variation to adverse reactions. Advances in next-generation sequencing platforms bring new opportunities for validating genetic candidates for certain reactions and could be used to develop preemptive tests that predict the effect of variation on a person's response to a drug. With the large amount of data that has recently become available, in silico approaches based on data analysis and modeling have become important alternatives that substantially support final decisions in translating research into clinical applications, such as the diagnosis and treatment of various types of adverse responses.

11.
Bioinformatics ; 34(17): 2918-2926, 2018 Sep 01.
Article in English | MEDLINE | ID: mdl-29590294

ABSTRACT

Motivation: The detection of genomic variants is of great significance in genomics, bioinformatics, biomedical research and their applications. However, despite considerable effort, Indels and structural variants are still under-characterized compared with SNPs. Current approaches based on next-generation sequencing data usually require large numbers of reads (high coverage) to detect these types of variants accurately, and Indels, especially those close to each other, remain hard to detect accurately. Results: We introduce a novel approach that leverages known variant information, e.g. provided by dbSNP, dbVar, ExAC or the 1000 Genomes Project, to improve the sensitivity of variant detection, especially for close-by Indels. In our approach, the standard reference genome and the known variants are combined to build a meta-reference, which is expected to be probabilistically closer to the subject genomes than the standard reference. An alignment algorithm that takes known variant information into account is developed to accurately align reads to the meta-reference. This strategy resulted in accurate alignment and variant calling even with low-coverage data. We showed that, compared with popular methods such as GATK and SAMtools, our method significantly improves the sensitivity of variant detection, especially for Indels that are close to each other. In particular, our method called these close-by Indels with 15-20% higher sensitivity than other methods at low coverage, and still achieved 1-5% higher sensitivity at high coverage, at competitive precision. These results were validated using simulated data with variant profiles extracted from the 1000 Genomes Project data, and real data from the Illumina Platinum Genomes Project and the ExAC database. Our findings suggest that by incorporating known variant information appropriately, sensitive variant calling is possible at low cost.
Availability and implementation: Implementation can be found in our public code repository https://github.com/namsyvo/IVC. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , INDEL Mutation , Algorithms , Genome, Human , Genomics/methods , Humans
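The benefit of a meta-reference can be seen in a deliberately simplified setting: an ungapped read placed at a given offset, where a base that matches a known alternate allele (for example a dbSNP entry) is not counted as a mismatch. This hypothetical helper handles SNPs only; the method in the abstract also handles Indels and uses its own alignment algorithm, neither of which is modeled here.

```python
from typing import Dict, Set

def mismatches_vs_meta_reference(
    read: str,
    reference: str,
    start: int,
    known_alleles: Dict[int, Set[str]],
) -> int:
    """Count mismatches of an ungapped read placed at `start` on the reference.

    known_alleles maps a 0-based reference position to the set of known
    alternate bases at that position; matching any of them counts as a match,
    which is the intuition behind aligning to a meta-reference.
    """
    if start < 0 or start + len(read) > len(reference):
        raise ValueError("read does not fit on the reference at this offset")
    mismatches = 0
    for i, base in enumerate(read):
        pos = start + i
        if base != reference[pos] and base not in known_alleles.get(pos, set()):
            mismatches += 1
    return mismatches

reference = "ACGTACGT"
known = {2: {"T"}}  # made-up known SNP: G>T at position 2
print(mismatches_vs_meta_reference("ACTT", reference, 0, known))  # 0
print(mismatches_vs_meta_reference("ACTT", reference, 0, {}))     # 1
```

Fewer apparent mismatches at known variant sites leave more of the alignment's error budget for novel variants, which is one reason this idea can help at low coverage.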
12.
BMC Bioinformatics ; 16 Suppl 17: S3, 2015.
Article in English | MEDLINE | ID: mdl-26678826

ABSTRACT

BACKGROUND: Although it is frequently observed that aligning short reads to genomes becomes harder when the genomes contain complex repeat patterns, there has been little effort to quantify the relationship between genome complexity and the difficulty of short-read alignment. Existing measures of sequence complexity seem unsuitable for understanding and quantifying this relationship. RESULTS: We investigated several measures of complexity and found that length-sensitive measures had the highest correlation with alignment accuracy. In particular, the rate of distinct substrings of length k, where k is similar to the read length, correlated very highly with alignment performance in terms of precision and recall. We showed how to compute this measure efficiently in linear time, making it practical for quickly estimating the difficulty of alignment for new genomes without first aligning reads to them. We also showed how length-sensitive measures could provide additional information for choosing aligners that align consistently and accurately on new genomes. CONCLUSIONS: We formally established a connection between genome complexity and the accuracy of short-read aligners. The relationship between genome complexity and alignment accuracy provides useful additional information for selecting suitable aligners for new genomes. Further, this work suggests that the complexity of genomes should sometimes be thought of in terms of specific computational problems, such as the alignment of short reads to genomes.


Subject(s)
Genome , Sequence Alignment/methods , Animals , Base Sequence , Humans , Sequence Analysis, DNA , Software , Time Factors
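The rate of distinct length-k substrings can be computed in a single pass over the genome. The sketch below assumes the rate is the number of distinct k-mers divided by the number of length-k windows (n - k + 1), which may differ from the paper's exact normalization, and accepts only A/C/G/T. It 2-bit encodes each base into a rolling integer so each window is updated in constant time and stored in a hash set, giving expected linear time; this is not necessarily how the authors obtained their linear-time bound.

```python
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def distinct_kmer_rate(seq: str, k: int) -> float:
    """Distinct k-mers divided by the number of length-k windows, for an A/C/G/T string."""
    n = len(seq)
    if k <= 0 or k > n:
        raise ValueError("k must satisfy 1 <= k <= len(seq)")
    mask = (1 << (2 * k)) - 1  # keep only the last k bases (2 bits each)
    seen = set()
    value = 0
    for i, base in enumerate(seq.upper()):
        code = _CODE.get(base)
        if code is None:
            raise ValueError(f"unsupported base {base!r} at position {i}")
        value = ((value << 2) | code) & mask  # slide the window by one base
        if i >= k - 1:  # the first full window ends at index k - 1
            seen.add(value)
    return len(seen) / (n - k + 1)

print(distinct_kmer_rate("ACGTACGT", 4))  # 0.8
```

A rate near 1 means almost every read-length window is unique, so reads have few equally good alignment positions; repetitive genomes push the rate down.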
13.
BMC Bioinformatics ; 15 Suppl 11: S2, 2014.
Article in English | MEDLINE | ID: mdl-25350806

ABSTRACT

BACKGROUND: The analysis of gene expression has played an important role in medical and bioinformatics research. Although it is known that a large number of samples is needed to determine gene expression patterns accurately, practical gene expression studies occasionally have insufficient numbers of samples, making it difficult to ascertain the true response patterns of variantly expressed genes. RESULTS: We describe an approach to cope with the challenge of predicting the true orders of gene response to treatments. We show that true patterns of gene response must be orderable sets. In experiments with few samples, we modify the conventional pairwise comparison tests and increase the significance level α intelligently to deduce orderable patterns, which are most likely the true orders of gene response. Additionally, motivated by the fact that a gene can be involved in multiple biological functions, our method further resamples experimental replicates and predicts multiple response patterns for each gene. CONCLUSIONS: This method can be useful in designing cost-effective experiments with small sample sizes. Patterns of highly variantly expressed genes can be predicted by varying α intelligently. Furthermore, clusters are labeled meaningfully with patterns that describe precisely how the genes in such clusters respond to treatments.


Subject(s)
Gene Expression Profiling/methods , Animals , Cluster Analysis , Gene Regulatory Networks , Rats, Sprague-Dawley , Sample Size , Transcription Factors/metabolism
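One way to make "orderable set" concrete: given pairwise outcomes ('<', '>' or '=') between treatment groups, the pattern is orderable if some ranking with ties reproduces every outcome. Assigning each group a rank equal to the number of groups significantly below it, then re-checking all pairs, decides this exactly. The sketch is an interpretation for illustration only; it does not implement the modified pairwise tests, the α adjustment, or the resampling described in the abstract.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

_FLIP = {"<": ">", ">": "<", "=": "="}

def order_pattern(
    groups: List[str],
    relation: Dict[Tuple[str, str], str],
) -> Optional[Dict[str, int]]:
    """Return a rank per group if the pairwise outcomes are orderable, else None.

    relation holds one outcome ('<', '>' or '=') per unordered pair of groups,
    in either orientation, e.g. {("ctrl", "low"): "<"} means ctrl < low.
    """
    def rel(a: str, b: str) -> str:
        if (a, b) in relation:
            return relation[(a, b)]
        return _FLIP[relation[(b, a)]]

    # Rank = number of groups significantly below this one; tied groups share a rank.
    rank = {g: sum(1 for h in groups if h != g and rel(h, g) == "<") for g in groups}

    # The pattern is orderable iff these ranks reproduce every pairwise outcome.
    for a, b in combinations(groups, 2):
        expected = "<" if rank[a] < rank[b] else ">" if rank[a] > rank[b] else "="
        if rel(a, b) != expected:
            return None
    return rank

print(order_pattern(["A", "B", "C"], {("A", "B"): "<", ("B", "C"): "=", ("A", "C"): "<"}))
# {'A': 0, 'B': 1, 'C': 1}
```

Non-orderable outcomes, such as a cycle (A < B, B < C, C < A) or a broken tie (A = B, B = C, A < C), return None; in a small-sample setting such results suggest the pairwise decisions are unreliable at the chosen α.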
14.
BMC Genomics ; 15 Suppl 5: S2, 2014.
Article in English | MEDLINE | ID: mdl-25081493

ABSTRACT

BACKGROUND: The alignment of short reads generated by next-generation sequencers to genomes is an important problem in many biomedical and bioinformatics applications. Although many proposed methods work very well on narrow ranges of read lengths, their performance and alignment quality tend to suffer for reads outside these ranges. RESULTS: We introduce RandAL, a novel method that aligns DNA sequences to reference genomes. Our approach uses two FM indices to facilitate efficient bidirectional searching, a pruning heuristic to speed up the computation of edit distances, and, most importantly, a randomized strategy that enables effective estimation of key parameters. Extensive comparisons showed that RandAL outperformed popular aligners in most instances and was unique in its consistent and accurate performance over a wide range of read lengths and error rates. The software package is publicly available at https://github.com/namsyvo/RandAL. CONCLUSIONS: RandAL promises to align short reads effectively and accurately across a variety of sequencing technologies with different read lengths and sequencing error rates.


Subject(s)
Algorithms , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Computational Biology , Genome , High-Throughput Nucleotide Sequencing/methods , Software
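RandAL's search is built on FM indices. The sketch below shows only the core primitive: exact-match backward search over a single FM index built from a naively sorted suffix array. RandAL's bidirectional search with two indices, its edit-distance pruning, and its randomized parameter estimation are not reproduced, and the naive construction is for illustration only.

```python
from typing import Dict, List, Tuple

def build_fm_index(text: str) -> Tuple[List[int], Dict[str, int], Dict[str, List[int]]]:
    """Toy FM index: suffix array, C table and occurrence table over the BWT.

    Suffixes are sorted naively (O(n^2 log n)); real aligners use far more
    efficient, sampled constructions.
    """
    text += "$"  # unique sentinel; '$' sorts before A/C/G/T in ASCII
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    alphabet = sorted(set(text))
    # c_table[ch]: number of characters in text that sort before ch.
    c_table: Dict[str, int] = {}
    total = 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i]: occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, c_table, occ

def find(pattern: str, sa: List[int], c_table: Dict[str, int],
         occ: Dict[str, List[int]]) -> List[int]:
    """Backward search: narrow the suffix-array interval one character at a time."""
    lo, hi = 0, len(sa)  # half-open interval of matching suffixes
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])

sa, c_table, occ = build_fm_index("GATTACA")
print(find("A", sa, c_table, occ))   # [1, 4, 6]
print(find("TA", sa, c_table, occ))  # [3]
```

Each pattern character costs a constant number of table lookups, so a search takes time proportional to the pattern length regardless of genome size, which is what makes FM indices attractive for aligning many reads.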