Results 1 - 18 of 18
1.
Bioinformatics ; 38(10): 2963-2964, 2022 05 13.
Article in English | MEDLINE | ID: mdl-35561190

ABSTRACT

SUMMARY: We developed BIODICA, an integrated computational environment for applying independent component analysis (ICA) to bulk and single-cell molecular profiles, interpreting the results in terms of biological functions, and correlating them with metadata. The computational core is the novel Python package stabilized-ica, which provides an interface to several ICA algorithms, a stabilization procedure, and meta-analysis and component interpretation tools. BIODICA is equipped with a user-friendly graphical user interface that allows inexperienced users to perform ICA-based omics data analysis. The results are presented interactively, which facilitates communication with biology experts. AVAILABILITY AND IMPLEMENTATION: BIODICA is implemented in Java, Python and JavaScript. The source code is freely available on GitHub under the MIT and the GNU LGPL licenses. BIODICA is supported on all major operating systems. URL: https://sysbio-curie.github.io/biodica-environment/.


Subject(s)
Algorithms , Software , Computational Biology/methods , Metadata
2.
Int J Mol Sci ; 20(18)2019 Sep 07.
Article in English | MEDLINE | ID: mdl-31500324

ABSTRACT

Independent component analysis (ICA) is a matrix factorization approach in which the signals captured by each individual matrix factor are optimized to become as mutually independent as possible. Initially suggested for solving blind source separation problems in various fields, ICA was shown to be successful in analyzing functional magnetic resonance imaging (fMRI) and other types of biomedical data. In the last twenty years, ICA has become part of the standard machine learning toolbox, together with other matrix factorization methods such as principal component analysis (PCA) and non-negative matrix factorization (NMF). Here, we review a number of recent works where ICA was shown to be a useful tool for unraveling the complexity of cancer biology from the analysis of different types of omics data, mainly collected for tumoral samples. Such works highlight the use of ICA in dimensionality reduction, deconvolution, data pre-processing, meta-analysis, and other tasks applied to different data types (transcriptome, methylome, proteome, single-cell data). We particularly focus on the technical aspects of ICA application in omics studies, such as using different protocols, determining the optimal number of components, assessing and improving reproducibility of the ICA results, and comparison with other popular matrix factorization techniques. We discuss the emerging ICA applications to the integrative analysis of multi-level omics datasets and introduce a conceptual view of ICA as a tool for defining functional subsystems of a complex biological system and their interactions under various conditions. Our review is accompanied by a Jupyter notebook which illustrates the discussed concepts and provides a practical tool for applying ICA to the analysis of cancer omics datasets.


Subject(s)
Computational Biology/methods , Neoplasms/genetics , Neoplasms/metabolism , Algorithms , Data Curation , Databases, Factual , Humans , Machine Learning , Magnetic Resonance Imaging , Neoplasms/diagnostic imaging , Principal Component Analysis
3.
Electromagn Biol Med ; 38(1): 21-31, 2019.
Article in English | MEDLINE | ID: mdl-30409044

ABSTRACT

The correlation between the shape and concentration of silver nanoparticles (AgNPs), their cytotoxicity and the formation of reactive oxygen species (ROS) in the presence of electromagnetic fields (EMFs) has been investigated. In addition, the bio-effects caused by the combination of EMFs and graphene nanoparticles (GrNPs) have also been assessed. AgNPs of three shapes (triangular, spherical and colloidal) and GrNPs were added in high concentrations to a culture of human fibroblasts and exposed to EMF of three different frequencies: 900, 2400 and 7500 MHz. The results demonstrated the dependence of the EMF-induced cytotoxicity on the shape and concentration of AgNPs. The maximal cell killing effect was observed at 900 MHz for NPs of all shapes and concentrations. The highest temperature elevation was observed for the GrNPs solution irradiated by EMF of 900 MHz. Exposure to EMF led to a significant increase in ROS formation in triangular and colloidal AgNPs solutions. However, no impact of EMF on ROS production was detected for spherical AgNPs. GrNPs demonstrated ROS-protective activity that was dependent on their concentration. Our findings indicate the feasibility of controlling the cytotoxicity of AgNPs by means of EMFs. The effect of EMF on the biological activity of AgNPs and GrNPs is reported here for the first time.


Subject(s)
Electromagnetic Fields , Graphite/chemistry , Graphite/toxicity , Metal Nanoparticles/toxicity , Silver/chemistry , Silver/toxicity , Fibroblasts/cytology , Fibroblasts/drug effects , Fibroblasts/radiation effects , Humans , Reactive Oxygen Species/metabolism , Temperature
4.
BMC Genomics ; 18(1): 712, 2017 Sep 11.
Article in English | MEDLINE | ID: mdl-28893186

ABSTRACT

BACKGROUND: Independent Component Analysis (ICA) is a method that models gene expression data as an action of a set of statistically independent hidden factors. The output of ICA depends on a fundamental parameter: the number of components (factors) to compute. The optimal choice of this parameter, related to determining the effective data dimension, remains an open question in the application of blind source separation techniques to transcriptomic data. RESULTS: Here we address the question of optimizing the number of statistically independent components in the analysis of transcriptomic data for reproducibility of the components in multiple runs of ICA (within the same or within varying effective dimensions) and in multiple independent datasets. To this end, we introduce a ranking of independent components based on their stability in multiple ICA computation runs and define a distinguished number of components (Most Stable Transcriptome Dimension, MSTD) corresponding to the point of qualitative change in the stability profile. Based on a large body of data, we demonstrate that a sufficient number of dimensions is required for biological interpretability of the ICA decomposition and that the most stable components, with ranks below MSTD, are more likely to be reproduced in independent studies than the less stable ones. At the same time, we show that a transcriptomics dataset can be reduced to a relatively high number of dimensions without losing the interpretability of ICA, even though higher dimensions give rise to components driven by small gene sets. CONCLUSIONS: We suggest a protocol for applying ICA to transcriptomics data that allows components to be prioritized by their reproducibility, which strengthens the biological interpretation. Computing too few components (far fewer than MSTD) is not optimal for interpretability of the results. The components ranked within the MSTD range are more likely to be reproduced in independent studies.


Subject(s)
Gene Expression Profiling , Neoplasms/genetics , Reproducibility of Results , Statistics as Topic
5.
Med Sci Monit ; 22: 5049-5057, 2016 Dec 22.
Article in English | MEDLINE | ID: mdl-28003640

ABSTRACT

BACKGROUND We scrutinized the feasibility of apoptosis induction in blood cancer cells by means of low-intensity ultrasound and the proteasome inhibitor bortezomib (Velcade). MATERIAL AND METHODS Human leukemic monocyte lymphoma U937 cells were subjected to ultrasound in the presence of bortezomib and the echo contrast agent Sonazoid. Two types of acoustic intensity (0.18 W/cm² and 0.05 W/cm²) were used for the experiments. Treated U937 cells were analyzed for viability and levels of early and late apoptosis. In addition, scanning electron microscopy analysis of treated cells was performed. RESULTS The percentage of cells that underwent early apoptosis in the group treated with ultrasound and Sonazoid was 8.0±1.31% (intensity 0.18 W/cm²) and 7.0±1.69% (0.05 W/cm²). However, coupling of bortezomib and Sonazoid resulted in an increase in the percentage of cells in the early apoptosis phase, up to 32.50±3.59% (intensity 0.18 W/cm²) and 33.0±4.90% (0.05 W/cm²). The percentage of U937 cells in the late apoptosis stage was not significantly different from that in the group treated with bortezomib only. CONCLUSIONS Our findings indicate the feasibility of apoptosis induction in blood cancer cells by using a combination of bortezomib, ultrasound contrast agents, and low-intensity ultrasound.


Subject(s)
Apoptosis/drug effects , Bortezomib/pharmacology , Ultrasonics , Cell Survival/drug effects , Humans , Microscopy, Electron, Scanning , U937 Cells
6.
Front Genet ; 15: 1249751, 2024.
Article in English | MEDLINE | ID: mdl-38562378

ABSTRACT

Esophageal squamous cell carcinoma (ESCC) is the predominant subtype of esophageal cancer in Central Asia, often diagnosed at advanced stages. Understanding population-specific patterns of ESCC is crucial for tailored treatments. This study aimed to unravel ESCC's genetic basis in Kazakhstani patients and identify potential biomarkers for early diagnosis and targeted therapies. ESCC patients from Kazakhstan were studied. We analyzed histological subtypes and conducted in-depth transcriptome sequencing. Differential gene expression analysis was performed, and significantly dysregulated pathways were identified using KEGG pathway analysis (p-value < 0.05). Protein-protein interaction networks were constructed to elucidate key modules and their functions. Among Kazakhstani patients, ESCC with moderate dysplasia was the most prevalent subtype. We identified 42 significantly upregulated and two significantly downregulated KEGG pathways, highlighting molecular mechanisms driving ESCC pathogenesis. Immune-related pathways, such as viral protein interaction with cytokines, rheumatoid arthritis, and oxidative phosphorylation, were elevated, suggesting immune system involvement. Conversely, downregulated pathways were associated with extracellular matrix degradation, crucial in cancer invasion and metastasis. Protein-protein interaction network analysis revealed four distinct modules with specific functions, implicating pathways in esophageal cancer development. High-throughput transcriptome sequencing elucidated critical molecular pathways underlying esophageal carcinogenesis in Kazakhstani patients. Insights into dysregulated pathways offer potential for early diagnosis and precision treatment strategies for ESCC. Understanding population-specific patterns is essential for personalized approaches to ESCC management.

7.
Front Genet ; 13: 902804, 2022.
Article in English | MEDLINE | ID: mdl-35899193

ABSTRACT

Kazakhstan, the ninth-largest country in the world, is located along the Great Silk Road and connects Europe with Asia. Historically, its territory has been inhabited by nomadic tribes, and modern-day Kazakhstan is a multiethnic country with a dominant Kazakh population. We sequenced and analyzed the genomes of five ethnic Kazakhs at high coverage using the Illumina HiSeq2000 next-generation sequencing platform. The five genomes yielded between 87,308,581,400 and 107,526,741,301 base pairs each. On average, 99.06% were properly mapped. The Het/Hom and Ti/Tv ratios, used as indicators of genomic data quality, ranged from 1.35 to 1.49 and from 2.07 to 2.08, respectively. Genetic variants were identified and annotated. Functional analysis identified several variants associated with higher risks of metabolic and neurodegenerative diseases. The present study showed high levels of genetic admixture in Kazakhs, comparable to those of other Central Asians. These whole-genome sequence data of healthy Kazakhs could contribute significantly to biomedical studies of common diseases, as they could allow better insight into genotype-phenotype relations at the population level.
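
The Ti/Tv ratio cited in this abstract is the number of transitions (A<->G or C<->T substitutions) divided by the number of transversions (all other single-base substitutions). A minimal sketch of that calculation follows; the function name `ti_tv_ratio` and the toy variant list are illustrative assumptions, not the authors' pipeline.

```python
# Purines and pyrimidines; a substitution within one class is a transition.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ti_tv_ratio(snvs):
    """Return transitions / transversions for (ref, alt) single-base pairs.

    Hypothetical helper for illustration only.
    """
    transitions = 0
    transversions = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt:
            continue  # not a substitution, ignore
        pair = {ref, alt}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions / transversions


# Toy input (made up): four transitions and two transversions.
toy_snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
toy_ratio = ti_tv_ratio(toy_snvs)
print(toy_ratio)
```

In practice the (ref, alt) pairs would come from the biallelic SNV records of a variant call file rather than a hand-written list.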

8.
Front Genet ; 13: 906318, 2022.
Article in English | MEDLINE | ID: mdl-36118859

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for the worldwide COVID-19 pandemic. The original viral whole genome was sequenced by a high-throughput sequencing approach from samples obtained in Wuhan, China. Real-time gene sequencing is a key tool for managing viral outbreaks because it expands our understanding of virus proliferation, spread, and evolution. Whole-genome sequencing is critical for SARS-CoV-2 variant surveillance, the development of new vaccines and boosters, and the representation of epidemiological situations in the country. A significant increase in the number of COVID-19 cases confirmed in August 2021 in Kazakhstan created a need to establish an effective and proficient system for further study of SARS-CoV-2 genetic variants and for the development of Kazakhstan's future genomic surveillance program. The SARS-CoV-2 whole genome was sequenced according to the SARS-CoV-2 ARTIC protocol (EXP-MRT001) by Oxford Nanopore Technologies at the National Laboratory Astana, Kazakhstan, to track viral variants circulating in the country. The 500 samples, kindly provided by the Republican Diagnostic Center (UMC-NU) and the private laboratory KDL "Olymp", were collected from individuals in Nur-Sultan city who were diagnosed with COVID-19 by real-time reverse transcription-quantitative polymerase chain reaction (RT-qPCR) between August 2021 and May 2022. All samples had a cycle threshold (Ct) value below 20, with an average Ct value of 17.03. The overall average sequencing depth for the samples was 244X. The 341 whole-genome sequences that passed quality control were deposited in the Global Initiative on Sharing All Influenza Data (GISAID). The following lineages were detected in this study: the omicron lineages BA.1.1 (n = 189), BA.1 (n = 15), BA.2 (n = 3), BA.1.15 (n = 1), and BA.1.17.2 (n = 1); the delta lineages AY.122 (n = 119), B.1.617.2 (n = 8), AY.111 (n = 2), AY.126 (n = 1), and AY.4 (n = 1); one sample of the alpha lineage B.1.1.7 (n = 1); and one sample of the small sublineage B.1.637 (n = 1). This is the first study of SARS-CoV-2 whole-genome sequencing by the ONT approach in Kazakhstan, and the approach can be extended to the investigation of other emerging viral or bacterial infections at the country level.

9.
Front Genet ; 12: 683632, 2021.
Article in English | MEDLINE | ID: mdl-34795689

ABSTRACT

Independent Component Analysis (ICA) is a matrix factorization method for data dimension reduction. ICA has been widely applied to the analysis of transcriptomic data for blind separation of biological, environmental, and technical factors affecting gene expression. The study aimed to analyze publicly available esophageal cancer (EC) data using ICA for the identification and comprehensive analysis of reproducible signaling pathways and molecular signatures involved in this cancer type. In this study, four independent esophageal cancer transcriptomic datasets from GEO databases were used. The bioinformatics tool "BiODICA-Independent Component Analysis of Big Omics Data" was applied to compute independent components (ICs). Gene Set Enrichment Analysis (GSEA) and ToppGene uncovered the most significantly enriched pathways. Construction and visualization of gene networks and graphs were performed using Cytoscape and the HPRD database. The correlation graph between decompositions into 30 ICs was built from absolute correlation values exceeding 0.3. Clusters of components (pseudocliques) were observed in the structure of the correlation graph. The top 1,000 most contributing genes of each IC in the pseudocliques were mapped to the PPI network to construct associated signaling pathways. Some cliques were composed of densely interconnected nodes and included components common to most cancer types (such as cell cycle and extracellular matrix signals), while others were specific to EC. The results of this investigation may reveal potential biomarkers of esophageal carcinogenesis and functional subsystems dysregulated in the tumor cells, and may be helpful in predicting the early development of a tumor.
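
The correlation graph described in this abstract links pairs of components whose absolute correlation exceeds 0.3. The sketch below illustrates that thresholding step only. It assumes each component is a vector of weights over the same ordered set of genes and uses Pearson correlation; the component names, toy vectors, and helper functions are hypothetical and are not taken from BiODICA.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_graph(components, threshold=0.3):
    """Return edges (name_a, name_b, r) for pairs with |r| above threshold."""
    edges = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(components.items(), 2):
        r = pearson(vec_a, vec_b)
        if abs(r) > threshold:
            edges.append((name_a, name_b, round(r, 3)))
    return edges


# Toy components (made up): gene weights over the same five genes.
toy_components = {
    "datasetA_IC1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "datasetB_IC1": [5.0, 4.0, 3.0, 2.0, 1.0],    # anti-correlated with datasetA_IC1
    "datasetB_IC2": [1.0, -1.0, 0.0, -1.0, 1.0],  # uncorrelated with both
}
edges = correlation_graph(toy_components)
print(edges)
```

Using the absolute value keeps strongly anti-correlated components connected, which matters for ICA because the sign of a component is arbitrary.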

10.
PeerJ ; 9: e11333, 2021.
Article in English | MEDLINE | ID: mdl-33987016

ABSTRACT

BACKGROUND: High-throughput sequencing platforms generate massive amounts of high-dimensional genomic data that are available for analysis. Modern and user-friendly bioinformatics tools for the analysis and interpretation of genomics data become essential when analyzing sequencing data. Different standard data types and file formats have been developed to store and analyze sequence and genomics data. Variant Call Format (VCF) is the most widespread genomics file type and a standard format containing genomic information and variants of sequenced samples. RESULTS: Existing tools for processing VCF files usually lack an intuitive graphical interface and offer only a command-line interface, which may be challenging to use for the broader biomedical community interested in genomics data analysis. re-Searcher solves this problem by pre-processing VCF files in chunks so as not to overload the computer's RAM. The tool can be used as a standalone, user-friendly, multiplatform GUI application as well as a web application (https://nla-lbsb.nu.edu.kz). The software, including source code, tested VCF files, and additional information, is publicly available in the GitHub repository (https://github.com/LabBandSB/re-Searcher).
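
The idea of processing a VCF file in chunks, so that only a bounded number of records is held in memory at once, can be sketched as follows. This is a generic illustration and not re-Searcher's actual implementation; the function name `iter_vcf_chunks`, the chunk size, and the toy file content are assumptions.

```python
import io
from itertools import islice


def iter_vcf_chunks(handle, chunk_size=1000):
    """Yield lists of tab-split variant records, at most chunk_size per list.

    Lines starting with '#' (meta-information and the column header)
    are skipped. The file is read lazily, one chunk at a time.
    """
    records = (
        line.rstrip("\n").split("\t")
        for line in handle
        if line.strip() and not line.startswith("#")
    )
    while True:
        chunk = list(islice(records, chunk_size))
        if not chunk:
            return
        yield chunk


# Toy VCF content (made up) with three variant records.
toy_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\t.\tA\tG\t50\tPASS\t.\n"
    "1\t200\t.\tC\tT\t60\tPASS\t.\n"
    "2\t300\t.\tG\tC\t70\tPASS\t.\n"
)
chunk_sizes = [len(chunk) for chunk in iter_vcf_chunks(toy_vcf, chunk_size=2)]
print(chunk_sizes)
```

For a file on disk, the same generator can be fed an open file object, for example `open(path)` for plain text or `gzip.open(path, "rt")` for a gzip-compressed file.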

11.
BMC Res Notes ; 14(1): 45, 2021 Feb 04.
Article in English | MEDLINE | ID: mdl-33541395

ABSTRACT

OBJECTIVES: Kazakhstan is a Central Asian crossroads of European and Asian populations, situated along the Great Silk Road. The territory of Kazakhstan has historically been inhabited by nomadic tribes, and today it is a multi-ethnic country with a dominant Kazakh ethnic group. We sequenced and analyzed the whole genomes of five healthy ethnic Kazakh individuals at high coverage using a next-generation sequencing platform. These whole-genome sequence data can be a valuable reference for biomedical studies investigating disease associations and for population-wide genomic studies of the ethnically diverse Central Asian region. DATA DESCRIPTION: Blood samples were collected from five healthy ethnic Kazakh individuals living in Kazakhstan. Genomic DNA was extracted from blood and sequenced on the Illumina HiSeq2000 next-generation sequencing platform. Coverage ranged from 26 to 32X. Between 98.85% and 99.58% of base pairs were mapped and aligned to the human reference genome GRCh37 (hg19). Het/Hom and Ts/Tv ratios for each whole genome ranged from 1.35 to 1.49 and from 2.07 to 2.08, respectively. Sequencing data are available in the National Center for Biotechnology Information SRA database under the accession number PRJNA374772.


Subject(s)
Asian People , Genome, Human , Asian People/genetics , Ethnicity/genetics , Humans , Kazakhstan , Whole Genome Sequencing
12.
Front Genet ; 12: 683515, 2021.
Article in English | MEDLINE | ID: mdl-34858467

ABSTRACT

Tuberculosis (TB) is an infectious disease that remains an essential public health problem in many countries. Despite decreasing numbers of new cases worldwide, the incidence of antibiotic-resistant forms (multidrug resistant and extensively drug-resistant) of TB is increasing. Next-generation sequencing technologies provide a high-throughput approach to identify known and novel potential genetic variants that are associated with drug resistance in Mycobacterium tuberculosis (Mtb). There are limited reports and data related to whole-genome characteristics of drug-resistant Mtb strains circulating in Kazakhstan. Here, we report whole-genome sequencing and analysis results of eight multidrug-resistant strains collected from TB patients in Kazakhstan. Genotyping and validation of all strains by MIRU-VNTR and spoligotyping methodologies revealed that these strains belong to the Beijing family. The spectrum of specific and potentially novel genomic variants (single-nucleotide polymorphisms, insertions, and deletions) related to drug resistance was identified and annotated. ResFinder, CARD, and CASTB antibiotic resistance databases were used for the characterization of genetic variants in genes associated with drug resistance. Our results provide reference data and genomic profiles of multidrug-resistant isolates for further comparative studies and investigations of genetic patterns in drug-resistant Mtb strains.

13.
PeerJ ; 9: e10711, 2021.
Article in English | MEDLINE | ID: mdl-33552729

ABSTRACT

BACKGROUND: Ventricular tachycardia (VT) is a major cause of sudden cardiac death (SCD). Clinical investigations can sometimes fail to identify the underlying cause of VT, and the event is then classified as idiopathic (iVT). VT contributes significantly to morbidity and mortality in patients with coronary artery disease (CAD) and dilated cardiomyopathy (DCM). Since mutations in arrhythmia-associated genes frequently determine arrhythmia susceptibility, screening for disease-predisposing variants could improve VT diagnostics and prevent SCD in patients. METHODS: Ninety-two patients diagnosed with coronary heart disease (CHD), DCM, or iVT were included in our study. We evaluated genetic profiles and variants in known cardiac risk genes by targeted next generation sequencing (NGS) using a newly designed custom panel of 96 genes. We hypothesized that shared morphological and phenotypical features among these subgroups may have an overlapping molecular base. To our knowledge, this was the first study of the deep sequencing of 96 targeted cardiac genes in Kazakhstan. The clinical significance of the sequence variants was interpreted according to the guidelines developed by the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) in 2015. The ClinVar and Varsome databases were used to determine the variant classifications. RESULTS: Targeted sequencing and stepwise filtering of the annotated variants identified a total of 307 unique variants in 74 genes, and 456 variants in total in the overall study group. We found 168 mutations listed in the Human Genome Mutation Database (HGMD) and another 256 rare/unique variants with elevated pathogenic potential. There was a predominance of high- to intermediate-pathogenicity variants in LAMA2, MYBPC3, MYH6, KCNQ1, GAA, and DSG2 in CHD VT patients. Similar frequencies were observed in DCM VT and iVT patients, pointing to a common molecular disease association. TTN, GAA, LAMA2, and MYBPC3 contained the most variants in the three subgroups, which confirms the impact of these genes in the complex pathogenesis of cardiomyopathies and VT. The classification of the 307 variants according to ACMG guidelines showed that nine (2.9%) variants could be classified as pathogenic, nine (2.9%) were likely pathogenic, 98 (31.9%) were of uncertain significance, 73 (23.8%) were likely benign, and 118 (38.4%) were benign. CHD VT patients carry rare genetic variants with increased pathogenic potential at a frequency comparable to that of DCM VT and iVT patients in genes related to sarcomere function, nuclear function, ion flux, and metabolism. CONCLUSIONS: In this study we showed that in patients with VT secondary to coronary artery disease, DCM, or idiopathic etiology, multiple rare mutations and clinically significant sequence variants in classic cardiac risk genes associated with cardiac channelopathies and cardiomyopathies were found in a similar pattern and at a comparable frequency.
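
The ACMG class percentages quoted in this abstract follow directly from the reported counts. A short arithmetic check, using only the numbers given in the abstract (the variable names are illustrative):

```python
# Variant counts per ACMG class, as reported in the abstract.
acmg_counts = {
    "pathogenic": 9,
    "likely pathogenic": 9,
    "uncertain significance": 98,
    "likely benign": 73,
    "benign": 118,
}

total = sum(acmg_counts.values())

# Percentage of all classified variants in each class, to one decimal place.
percentages = {
    label: round(100 * count / total, 1) for label, count in acmg_counts.items()
}

print(total)
for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

The counts sum to the 307 unique variants reported, and the rounded shares match the percentages given in the text.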

14.
Data Brief ; 33: 106416, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33102665

ABSTRACT

Drug-resistant tuberculosis (TB) is a major public health problem. A clinical Mycobacterium tuberculosis (MTB) isolate with an extensively drug-resistant (MTB-XDR) profile was subjected to whole-genome sequencing on the Roche 454 GS FLX+ next-generation sequencing (NGS) platform, followed by bioinformatic sequence analysis. Read quality was checked with FastQC, and paired-end reads were trimmed using Trimmomatic. De novo genome assembly was conducted using Velvet v.1.2.10. The assembled genome of the XDR-TB-1599 strain was functionally annotated using the PATRIC platform. Analysis of the de novo assembled genome was performed using the ResFinder, CARD, CASTB and TB-Profiler tools. MIRU-VNTR genotyping on 12 loci and spoligotyping were performed for the XDR-TB-1599 isolate. The M. tuberculosis XDR-TB-1599 strain yielded an average read depth of 21-fold with 4,199,325 bp overall. The assembled genome contains 5528 protein-coding genes, including key drug resistance and virulence-associated genes, and has a GC content of 65.4%. We identified that all proteins encoded by this strain contain conserved domains associated with the first-line anti-tuberculosis drugs such as rifampicin, isoniazid, streptomycin and ethionamide. TB-Profiler had higher average concordance with phenotypic DST (drug susceptibility testing) than ResFinder, CARD, and CASTB profiling for first-line (75% vs 50%) and second-line (25% vs 0%) anti-TB drugs, respectively. To our knowledge, this is the first report of a highly annotated and characterized whole-genome sequence and de novo assembly of an XDR-TB M. tuberculosis strain isolated from the sputum of a new TB case-patient from Kazakhstan, performed on the Roche 454 GS FLX+ platform. This report highlights the important role of whole-genome sequencing technology and analysis as an advanced approach for drug-resistance investigations of circulating TB isolates.

15.
Genome Announc ; 3(3)2015 May 14.
Article in English | MEDLINE | ID: mdl-25977410

ABSTRACT

We announce the draft genome sequence of the type strain Lactobacillus rhamnosus CLS17 (2,889,314 nt, with a GC content of 46.8%), which is one of the most prevalent lactic acid bacteria present during the manufacturing process of dairy products; the genome consists of 71 large contigs (>100 bp in size). It contains 2,643 protein-coding sequences, single predicted copies of the 5S, 16S, and 23S rRNA genes, and 51 predicted tRNAs.
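
GC content, as quoted for this draft genome, is the share of G and C bases among all unambiguous bases. A minimal sketch, assuming the contigs are available as plain strings; the helper name `gc_content` and the toy contigs are hypothetical.

```python
def gc_content(sequences):
    """Return GC percentage over all A/C/G/T bases in an iterable of sequences.

    Ambiguous bases such as N are excluded from the denominator.
    """
    gc = 0
    acgt = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        acgt += sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no unambiguous bases found")
    return 100 * gc / acgt


# Toy contigs (made up): 4 G/C bases out of 8 unambiguous bases.
toy_contigs = ["ATGCNN", "gcat"]
toy_gc = gc_content(toy_contigs)
print(toy_gc)
```

Pooling the counts across contigs, rather than averaging per-contig percentages, weights each contig by its length.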

16.
Genome Announc ; 3(3)2015 May 14.
Article in English | MEDLINE | ID: mdl-25977436

ABSTRACT

Here, we report the draft genome sequences of two clinical isolates of Mycobacterium tuberculosis (MTB-476 and MTB-489) isolated from sputum of Kazakh patients.

17.
Cent Asian J Glob Health ; 3(Suppl): 146, 2014.
Article in English | MEDLINE | ID: mdl-29805883

ABSTRACT

INTRODUCTION: The human genome sequence will underpin human biology and medicine in the next century, providing a single, essential reference to all genetic information. Extraordinary technological advances and decreases in the cost of DNA sequencing have made whole genome sequencing (WGS) feasible as a highly accessible test for numerous indications. The international project "Genetic architecture of Kazakh population" is well underway to determine the complete DNA. Next generation sequencing is a powerful tool for genetic analysis, which will enable us to uncover the association of loci at specific sites in the genome with disease. The aim of this study was to introduce the first data on WGS of 6 Kazakh individuals. METHODS: This pilot study is among the first WGS studies performed on 6 healthy Kazakh individuals, using the Illumina HiSeq2000 next generation sequencing platform according to the manufacturer's protocols. All generated *.bcl files were simultaneously converted and demultiplexed using the bcl2fasta application. Alignment of sequence reads was performed using bwa-mem against the human b19 reference genome. Sorting, removal of intermediate files, assembly of *.bam files, and marking of duplicates were performed using the PicardTools package. The GATK haplotype caller tool was used for variant calling. The ClinVar, SNPedia, and Cosmic databases were processed to identify clinical genomic variants in the 6 Kazakh whole genomes. Java Runtime Environment and R/Bioconductor packages were installed to perform raw data processing and run program scripts. RESULTS: The sequence alignment and mapping procedures on the reference genome hg19 were completed for each of the 6 healthy Kazakh individuals. Between 87,308,581,400 and 107,526,741,301 total base pairs were sequenced, with an average coverage of 29.85X. Between 98.85% and 99.58% of base pairs were mapped, and on average 96.07% were properly paired. Het/Hom and Ti/Tv ratios for each whole genome ranged from 1.35 to 1.52 and from 2.07 to 2.08, respectively. We compared each genome against the existing clinical databases ClinVar, SNPedia, and Cosmic and found from 20 to 25, from 269 to 288, and from 7 to 12 SNP records, respectively. The availability of reference Kazakh genome sequences provides the basis for studying the nature of sequence variation, particularly single nucleotide polymorphisms. CONCLUSION: The first whole genome sequencing of Kazakhs was performed. In this pilot study, we identified SNPs associated with different conditions. Further WGS studies of the Kazakh population are needed to identify possible unique genetic variants in Kazakhs.

18.
Cent Asian J Glob Health ; 3(Suppl): 181, 2014.
Article in English | MEDLINE | ID: mdl-29805910

ABSTRACT

INTRODUCTION: Tuberculosis (TB) is caused by the bacterium Mycobacterium tuberculosis (MTB), and according to the WHO, up to 30% of the world population is infected with latent TB. The pathogenesis of TB is multifactorial, and its development depends on environmental, social, microbial, and genetic factors of both the bacterium and the host. The number of TB cases in Kazakhstan has decreased in the past decade, but multidrug-resistant (MDR) TB cases are dramatically increasing. Polymorphisms in genes responsible for the immune response have been associated with TB susceptibility. The objective of this study was to investigate the risk of developing pulmonary TB (PTB) associated with polymorphisms in several inflammatory pathway genes in the Kazakhstani population. METHODS: 703 participants from 3 regions of Kazakhstan were recruited for a case-control study. 251 participants had pulmonary TB (PTB), and 452 were healthy controls (HC). Males and females represented 42.39% and 57.61%, respectively. Of all participants, 67.4% were Kazakhs, 22.8% Russians, 3.4% Ukrainians, and 6.4% were of other origins. Clinical and epidemiological data were collected from medical records, interviews, and questionnaires. DNA samples were genotyped using the TaqMan assay for 4 polymorphisms: IFNγ (rs2430561), IL1β (rs16944), TLR2 (rs5743708) and TLR8 (rs3764880). Data were analyzed statistically using SPSS 19. RESULTS: Genotyping of IFNγ, IL1β, and TLR2 showed no significant association with PTB susceptibility (p > 0.05). The TLR8 genotype A/G was significantly more frequent in females (F/M - 41.5%/1.3%) and G/G in males (M/F - 49%/20.7%) (χ2=161.43, p < 0.001). A significantly increased risk of PTB development was observed for TLR8 A/G, with an adjusted OR of 1.48 (95% CI: 0.96 - 2.28), and a protective feature was revealed for the TLR8 G/G genotype (OR: 0.81, 95% CI: 0.56 - 1.16, p = 0.024). Additional grouping by gender revealed that TLR8 G/G contributes as a protective genotype (OR: 1.83, 95% CI: 1.18 - 2.83, p = 0.036) in males of the control group. CONCLUSION: Results indicate that the heterozygous genotype A/G of TLR8 increases the risk of PTB development, while the G/G genotype may serve as a protection mechanism. The A/A genotype is strongly associated with susceptibility to PTB. To clarify the role of other polymorphisms in susceptibility to PTB in the Kazakhstani population, further investigations are needed.
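
For readers unfamiliar with the odds ratios (OR) and 95% confidence intervals quoted in this abstract, the sketch below computes an unadjusted OR with a Wald interval from a 2x2 case-control table. It is illustrative only: the abstract reports adjusted ORs computed in SPSS, the underlying genotype counts are not given, and the counts used here are invented.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval for a 2x2 table.

    a: cases with the genotype      b: cases without it
    c: controls with the genotype   d: controls without it
    z=1.96 corresponds to a 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
toy_or, toy_lower, toy_upper = odds_ratio_ci(a=40, b=60, c=25, d=75)
print(f"OR = {toy_or:.2f}, 95% CI: {toy_lower:.2f} - {toy_upper:.2f}")
```

An interval that includes 1 is conventionally read as no significant association at the corresponding level.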
