Results 1 - 20 of 46
1.
Sci Rep ; 14(1): 15518, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38969748

ABSTRACT

Lebanon's rich history as a cultural crossroad spanning millennia has significantly impacted the genetic composition of its population through successive waves of migration and conquests from surrounding regions. Within modern-day Lebanon, the Koura district stands out with its unique cultural foundations, primarily characterized by a notably high concentration of Greek Orthodox Christians compared to the rest of the country. This study investigates whether the prevalence of Greek Orthodoxy in Koura can be attributed to modern Greek heritage or continuous blending resulting from the ongoing influx of refugees and trade interactions with Greece and Anatolia. We analyzed both ancient and modern DNA data from various populations in the region which could have played a role in shaping the current population of Koura using our own and published data. Our findings indicate that the genetic influence stemming directly from modern Greek immigration into the area appears to be limited. While the historical presence of Greek colonies has left its mark on the region's past, the distinctive character of Koura seems to have been primarily shaped by cultural and political factors, displaying a stronger genetic connection mostly with Anatolia, with affinity to ancient but not modern Greeks.


Subjects
Population Genetics , Lebanon , Humans , Greece , Human Migration , Turkey , Ethnicity/genetics
2.
Acta Diabetol ; 2024 May 20.
Article in English | MEDLINE | ID: mdl-38767674

ABSTRACT

AIMS: Hypertension (HTN) and Type 2 Diabetes (T2D) often coexist, therefore understanding the relationship between both diseases is imperative to guide targeted prevention/therapy. This study aims to explore the relationship between HTN and T2D using genome-wide association study (GWAS) analysis and biochemical data to understand the implication of both clinical and genetic factors in these pathologies. METHODS: A total of 2,876 patients were enrolled. Using GWAS and biochemical data, patients with both T2D and HTN were compared to patients with only HTN. Specificity was confirmed by testing the detected genetic variants for associations with HTN development in T2D patients, or with HTN in healthy subjects. Regression models were applied to examine the association of T2D in patients with HTN with cardiovascular risk factors. Replication was performed using UK Biobank dataset with 31,170 subjects. RESULTS: Data showed that females with HTN are at higher risk of developing T2D due to dyslipidemia, while males faced higher risk due to high BMI (body mass index) and family history of T2D. GWAS identified Single Nucleotide Polymorphisms (SNPs) linked to T2D in patients with HTN. Notably, rs7865889, rs7756992, and rs10896290 were positively associated with T2D, whereas rs12737517 yielded negative association. Three SNPs were replicated in the UK Biobank (rs10896290, rs7865889, and rs7756992). CONCLUSION: Incorporating clinical and genetic screening into risk assessment is important for the detection and prevention of T2D in patients with HTN. The detected SNPs (rs7865889, rs12737517, and rs10896290), especially the protective SNP (rs12737517), provide an opportunity for better diagnosis, prevention, and therapy of patients with T2D and HTN.
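The positive and negative associations reported above can be read as odds ratios above or below 1. The following is a minimal sketch, not the study's GWAS pipeline: it computes an allelic odds ratio with a Woolf-type 95% confidence interval from a 2x2 table of allele counts. The function name `allelic_odds_ratio` and all counts are hypothetical and chosen only for illustration.

```python
import math


def allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (odds_ratio, ci_lower, ci_upper) for a 2x2 allele-count table.

    The 95% interval uses the Woolf (log odds ratio) approximation.
    All four counts must be greater than zero.
    """
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se)
    return odds_ratio, lower, upper


# Hypothetical counts: alternate/reference alleles in patients with
# T2D and HTN (cases) versus patients with HTN only (controls).
odds_ratio, lower, upper = allelic_odds_ratio(30, 70, 20, 80)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An odds ratio above 1 corresponds to a positive association with T2D, and one below 1 to a negative (protective) association. A real analysis would also adjust for covariates and correct for multiple testing.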

3.
Article in English | MEDLINE | ID: mdl-38691429

ABSTRACT

DNA damage is a critical factor in the onset and progression of cancer. When DNA is damaged, the number of genetic mutations increases, making it necessary to activate DNA repair mechanisms. A crucial factor in the base excision repair process, which helps maintain the stability of the genome, is an enzyme called DNA polymerase β (Polβ), encoded by the POLB gene. It plays a vital role in the repair of damaged DNA. Additionally, variations known as Single Nucleotide Polymorphisms (SNPs) in the POLB gene can potentially affect the ability to repair DNA. This study uses bioinformatics tools that extract important features from SNPs to construct a feature matrix, which is then used in combination with machine learning algorithms to predict the likelihood of developing cancer associated with a specific mutation. Eight different machine learning algorithms were used to investigate the relationship between POLB gene variations and their potential role in cancer onset. This study not only highlights the complex link between POLB gene SNPs and cancer, but also underscores the effectiveness of machine learning approaches in genomic studies, paving the way for advanced predictive models in genetic and cancer research.

4.
PLoS One ; 19(4): e0298325, 2024.
Article in English | MEDLINE | ID: mdl-38578803

ABSTRACT

Surveillance methods of circulating antibiotic resistance genes (ARGs) are of utmost importance in order to tackle what has been described as one of the greatest threats to humanity in the 21st century. In order to be effective, these methods have to be accurate, quickly deployable, and scalable. In this study, we compare metagenomic shotgun sequencing (TruSeq DNA sequencing) of wastewater samples with a state-of-the-art PCR-based method (Resistomap HT-qPCR) on four wastewater samples that were taken from hospital, industrial, urban and rural areas. ARGs that confer resistance to 11 antibiotic classes have been identified in these wastewater samples using both methods, with the most abundant observed classes of ARGs conferring resistance to aminoglycoside, multidrug-resistance (MDR), macrolide-lincosamide-streptogramin B (MLSB), tetracycline and beta-lactams. In comparing the methods, we observed a strong correlation of relative abundance of ARGs obtained by the two tested methods for the majority of antibiotic classes. Finally, we investigated the source of discrepancies in the results obtained by the two methods. This analysis revealed that false negatives were more likely to occur in qPCR due to mutated primer target sites, whereas ARGs with incomplete or low coverage were not detected by the sequencing method due to the parameters set in the bioinformatics pipeline. Indeed, despite the good correlation between the methods, each has its advantages and disadvantages which are also discussed here. By using both methods together, a more robust ARG surveillance program can be established. Overall, the work described here can aid wastewater treatment plants that plan on implementing an ARG surveillance program.
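The method comparison above rests on correlating per-class relative abundances from the two assays. As a rough sketch of that idea, the snippet below normalizes counts to relative abundance and computes a Pearson coefficient. The abstract does not state which correlation measure was used, so Pearson is an assumption, and the counts and names (`sequencing`, `qpcr`) are hypothetical.

```python
import math


def relative_abundance(counts):
    """Convert raw per-class counts to fractions that sum to 1."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ARG signal per antibiotic class for one wastewater sample.
sequencing = {"aminoglycoside": 120, "tetracycline": 80, "beta-lactam": 40}
qpcr = {"aminoglycoside": 60, "tetracycline": 45, "beta-lactam": 15}

classes = sorted(sequencing)
seq_ra = relative_abundance(sequencing)
qpcr_ra = relative_abundance(qpcr)
r = pearson([seq_ra[c] for c in classes], [qpcr_ra[c] for c in classes])
print(f"Pearson r = {r:.3f}")
```

Classes detected by only one method (the false negatives discussed above) would need explicit handling, for example by assigning them zero abundance in the other method before correlating.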


Subjects
Anti-Bacterial Agents , Wastewater , Anti-Bacterial Agents/pharmacology , Anti-Bacterial Agents/analysis , Bacterial Genes , Tetracycline/analysis , Microbial Drug Resistance/genetics
5.
Lipids Health Dis ; 23(1): 56, 2024 Feb 22.
Article in English | MEDLINE | ID: mdl-38389069

ABSTRACT

BACKGROUND: Type 2 Diabetes (T2D) is influenced by genetic, environmental, and ageing factors. Ageing pathways exacerbate metabolic diseases. This study aimed to examine both clinical and genetic factors of T2D in older adults. METHODS: A total of 2,909 genotyped patients were enrolled in this study. Genome Wide Association Study was conducted, comparing T2D patients to non-diabetic older adults aged ≥ 60, ≥ 65, or ≥ 70 years, respectively. Binomial logistic regressions were applied to examine the association between T2D and various risk factors. Stepwise logistic regression was conducted to explore the impact of low HDL (HDL < 40 mg/dl) on the relationship between the genetic variants and T2D. A further validation step using data from the UK Biobank with 53,779 subjects was performed. RESULTS: The association of T2D with both low HDL and family history of T2D increased with the age of control groups. T2D susceptibility variants (rs7756992, rs4712523 and rs10946403) were associated with T2D, more significantly with increased age of the control group. These variants had stronger effects on T2D risk when combined with low HDL cholesterol levels, especially in older control groups. CONCLUSIONS: The findings highlight a critical role of age, genetic predisposition, and HDL levels in T2D risk. The findings suggest that individuals over 70 years who have high HDL levels without the T2D susceptibility alleles may be at the lowest risk of developing T2D. These insights can inform tailored preventive strategies for older adults, enhancing personalized T2D risk assessments and interventions.


Subjects
Type 2 Diabetes Mellitus , Humans , Aged , Type 2 Diabetes Mellitus/genetics , Alleles , Genome-Wide Association Study , Single Nucleotide Polymorphism/genetics , Risk Factors , Genetic Predisposition to Disease , HDL Cholesterol/genetics
6.
Diabetes Res Clin Pract ; 207: 111052, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38072013

ABSTRACT

AIMS: Type 2 diabetes (T2D) and coronary artery disease (CAD) often coexist and share genetic factors. This study aimed to investigate the common genetic factors underlying T2D and CAD in patients with CAD. METHODS: A three-step association approach was conducted: a) a discovery step involving 943 CAD patients with T2D and 1,149 CAD patients without T2D; b) an eliminating step to exclude CAD or T2D specific variants; and c) a replication step using the UK Biobank data. RESULTS: Ten genetic loci were associated with T2D in CAD patients. Three variants were specific to either CAD or T2D. Five variants lost significance after adjusting for covariates, while two SNPs remained associated with T2D in CAD patients (rs7904519*G: TCF7L2 and rs17608766*C: GOSR2). The T2D susceptibility rs7904519*G was associated with increased T2D risk, while the CAD susceptibility rs17608766*C was negatively associated with T2D in CAD patients. These associations were replicated in UK Biobank data, confirming the results. CONCLUSIONS: No significant common T2D and CAD susceptibility genetic association was demonstrated, indicating distinct disease pathways. However, CAD patients carrying the T2D susceptibility gene TCF7L2 remain at higher risk for developing T2D, emphasizing the need for frequent monitoring in this subgroup.


Subjects
Coronary Artery Disease , Type 2 Diabetes Mellitus , Humans , Type 2 Diabetes Mellitus/genetics , Type 2 Diabetes Mellitus/complications , Coronary Artery Disease/genetics , Coronary Artery Disease/complications , Genetic Predisposition to Disease , Single Nucleotide Polymorphism , Genetic Loci , Risk Factors , Transcription Factor 7-Like 2 Protein/genetics , Qb-SNARE Proteins/genetics
7.
BMC Bioinformatics ; 24(1): 354, 2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37735350

ABSTRACT

BACKGROUND: Plummeting DNA sequencing cost in recent years has enabled genome sequencing projects to scale up by several orders of magnitude, which is transforming genomics into a highly data-intensive field of research. This development provides the much needed statistical power required for genotype-phenotype predictions in complex diseases. METHODS: In order to efficiently leverage the wealth of information, we here assessed several genomic data science tools. The rationale to focus on on-premise installations is to cope with situations where data confidentiality and compliance regulations etc. rule out cloud based solutions. We established a comprehensive qualitative and quantitative comparison between BCFtools, SnpSift, Hail, GEMINI, and OpenCGA. The tools were compared in terms of data storage technology, query speed, scalability, annotation, data manipulation, visualization, data output representation, and availability. RESULTS: Tools that leverage sophisticated data structures are noted as the most suitable for large-scale projects in varying degrees of scalability in comparison to flat-file manipulation (e.g., BCFtools, and SnpSift). Remarkably, for small to mid-size projects, even lightweight relational database. CONCLUSION: The assessment criteria provide insights into the typical questions posed in scalable genomics and serve as guidance for the development of scalable computational infrastructure in genomics.
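To illustrate the contrast between flat-file manipulation and database-backed storage mentioned above, here is a small sketch that loads a few VCF records into an in-memory SQLite table and runs an indexed region query. It is not the configuration of any tool compared in the study. The table layout, the helper names `load_vcf` and `query_region`, and the three variant records are all assumptions made for illustration.

```python
import sqlite3

# A tiny, made-up VCF fragment (tab-separated, as in the VCF format).
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
1\t1000\trsA\tA\tG\t50\tPASS\t.
1\t2000\trsB\tC\tT\t60\tPASS\t.
2\t1500\trsC\tG\tA\t40\tPASS\t.
"""


def load_vcf(conn, text):
    """Parse the first five VCF columns into an indexed SQLite table."""
    conn.execute(
        "CREATE TABLE variants "
        "(chrom TEXT, pos INTEGER, id TEXT, ref TEXT, alt TEXT)"
    )
    rows = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines and the header line
        chrom, pos, vid, ref, alt = line.split("\t")[:5]
        rows.append((chrom, int(pos), vid, ref, alt))
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?)", rows)
    # The index lets region queries avoid scanning every record.
    conn.execute("CREATE INDEX idx_region ON variants (chrom, pos)")


def query_region(conn, chrom, start, end):
    """Return variant IDs on `chrom` with start <= pos <= end, by position."""
    cursor = conn.execute(
        "SELECT id FROM variants "
        "WHERE chrom = ? AND pos BETWEEN ? AND ? ORDER BY pos",
        (chrom, start, end),
    )
    return [row[0] for row in cursor]


conn = sqlite3.connect(":memory:")
load_vcf(conn, VCF_TEXT)
print(query_region(conn, "1", 900, 2100))
```

A flat-file approach would instead re-read and filter the text on every query, which is simple but scales with file size rather than with the size of the result.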


Subjects
Data Science , Genomics , Chromosome Mapping , Factual Databases , DNA Sequence Analysis
8.
Article in English | MEDLINE | ID: mdl-37047998

ABSTRACT

Patient experience is a widely used indicator for assessing the quality-of-care process during a patient's journey in hospital. However, the literature rarely discusses three components: patient stress, anxiety, and frustration. Furthermore, little is known about what drives each component during hospital visits. In order to explore this, we utilized data from a patient experience survey, including patient- and provider-related determinants, that was administered at a local hospital in Abu Dhabi, UAE. A machine-learning-based random forest (RF) algorithm, along with its embedded importance analysis function feature, was used to explore and rank the drivers of patient stress, anxiety, and frustration throughout two stages of the patient journey: registration and consultation. The attribute 'age' was identified as the primary patient-related determinant driving patient stress, anxiety, and frustration throughout the registration and consultation stages. In the registration stage, 'total time taken for registration' was the key driver of patient stress, whereas 'courtesy demonstrated by the registration staff in meeting your needs' was the key driver of anxiety and frustration. In the consultation step, 'waiting time to see the doctor/physician' was the key driver of both patient stress and frustration, whereas 'the doctor/physician was able to explain your symptoms using language that was easy to understand' was the main driver of anxiety. The RF algorithm provided valuable insights, showing the relative importance of factors affecting patient stress, anxiety, and frustration throughout the registration and consultation stages. Healthcare managers can utilize and allocate resources to improve the overall patient experience during hospital visits based on the importance of patient- and provider-related determinants.


Subjects
Anxiety , Frustration , Humans , Anxiety Disorders , Surveys and Questionnaires , Patient Outcome Assessment
9.
Vasc Health Risk Manag ; 19: 31-41, 2023.
Article in English | MEDLINE | ID: mdl-36703868

ABSTRACT

Background and Aims: The role of Lipoprotein(a) (Lp(a)) in increasing the risk of cardiovascular diseases is reported in several populations. The aim of this study is to investigate the correlation of high Lp(a) levels with the degree of coronary artery stenosis. Methods: Two hundred and sixty-eight patients were enrolled for this study. Patients who underwent coronary artery angiography and who had Lp(a) measurements available were included in this study. Binomial logistic regressions were applied to investigate the association between Lp(a) and stenosis in the four major coronary arteries. The effect of LDL and HDL cholesterol on modulating the association of Lp(a) with coronary artery disease (CAD) was also evaluated. Multinomial regression analysis was applied to assess the association of Lp(a) with the different degrees of stenosis in the four major coronary arteries. Results: Our analyses showed that Lp(a) is a risk factor for CAD and this risk is significantly apparent in patients with HDL-cholesterol ≥35 mg/dL and in non-obese patients. A large proportion of the study patients with elevated Lp(a) levels had CAD even when exhibiting high HDL serum levels. Increased HDL with low Lp(a) serum levels were the least correlated with stenosis. Significantly higher levels of Lp(a) were found in patients with >50% stenosis in at least two major coronary vessels, arguing for pronounced and multiple stenotic lesions. Finally, the derived variant (rs1084651) of the LPA gene was significantly associated with CAD. Conclusion: Our study highlights the importance of Lp(a) levels as an independent biological marker of severe and multiple coronary artery stenosis.


Subjects
Coronary Artery Disease , Coronary Stenosis , Humans , Pathologic Constriction , Coronary Stenosis/diagnostic imaging , Coronary Angiography , Lipoprotein(a) , Risk Factors , HDL Cholesterol
10.
BMC Bioinformatics ; 23(1): 511, 2022 Nov 29.
Article in English | MEDLINE | ID: mdl-36447153

ABSTRACT

BACKGROUND: For some understudied populations, genotype data is minimal for genotype-phenotype prediction. However, we can use the data of some other large populations to learn about the disease-causing SNPs and use that knowledge for the genotype-phenotype prediction of small populations. This manuscript illustrated that transfer learning is applicable for genotype data and genotype-phenotype prediction. RESULTS: Using HAPGEN2 and PhenotypeSimulator, we generated eight phenotypes for 500 cases/500 controls (CEU, large population) and 100 cases/100 controls (YRI, small populations). We considered 5 (4 phenotypes) and 10 (4 phenotypes) different risk SNPs for each phenotype to evaluate the proposed method. The improved accuracy with transfer learning for eight different phenotypes was between 2 and 14.2 percent. The two-tailed p-value between the classification accuracies for all phenotypes without transfer learning and with transfer learning was 0.0306 for five risk SNPs phenotypes and 0.0478 for ten risk SNPs phenotypes. CONCLUSION: The proposed pipeline is used to transfer knowledge for the case/control classification of the small population. In addition, we argue that this method can also be used in the realm of endangered species and personalized medicine. If the large population data is extensive compared to small population data, expect transfer learning results to improve significantly. We show that transfer learning is capable of creating powerful models for genotype-phenotype predictions in large, well-studied populations and of fine-tuning these models to populations where data is sparse.


Subjects
Deep Learning , Genotype , Phenotype , Case-Control Studies , Knowledge
11.
Front Bioinform ; 2: 914435, 2022.
Article in English | MEDLINE | ID: mdl-36304278

ABSTRACT

Converting genotype sequences into images offers advantages, such as genotype data visualization, classification, and comparison of genotype sequences. This study converted genotype sequences into images, applied two-dimensional convolutional neural networks for case/control classification, and compared the results with the one-dimensional convolutional neural network. Surprisingly, the average accuracy of multiple runs of 2DCNN was 0.86, and that of 1DCNN was 0.89, yielding a difference of 0.03, which suggests that even the 2DCNN algorithm works on genotype sequences. Moreover, the results generated by the 2DCNN exhibited less variation than those generated by the 1DCNN, thereby offering greater stability. The purpose of this study is to draw the research community's attention to explore encoding schemes for genotype data and machine learning algorithms that can be used on genotype data by changing the representation of the genotype data for case/control classification.
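One simple way to picture the conversion of a genotype sequence into an image is to lay the per-SNP values out row by row in a square grid. The sketch below does exactly that. The abstract does not describe the encoding scheme actually used, so the 0/1/2 coding, the row-major layout, the zero padding, and the function name `genotypes_to_image` are all assumptions.

```python
import math


def genotypes_to_image(genotypes, pad_value=0):
    """Reshape a 1D genotype vector into the smallest square 2D grid.

    Values are placed in row-major order, and cells beyond the end of the
    sequence are filled with `pad_value`.
    """
    n = len(genotypes)
    side = math.isqrt(n)
    if side * side < n:
        side += 1  # round the side length up so every value fits
    padded = list(genotypes) + [pad_value] * (side * side - n)
    return [padded[row * side:(row + 1) * side] for row in range(side)]


# Hypothetical sample: 0 = homozygous reference, 1 = heterozygous,
# 2 = homozygous alternate.
image = genotypes_to_image([0, 1, 2, 1, 0])
for row in image:
    print(row)
```

The resulting grid can be fed to a 2D convolutional network, whereas the original vector suits a 1D one. Note that neighbouring rows in the grid have no biological adjacency, which is one reason the choice of encoding scheme matters.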

12.
Sci Rep ; 12(1): 14669, 2022 Aug 29.
Article in English | MEDLINE | ID: mdl-36038563

ABSTRACT

Following the declaration of the SARS-CoV-2 outbreak as a pandemic, public health authorities in the United Arab Emirates (UAE) adopted strict measures to reduce transmission as early as March 2020. These measures included flight suspensions, nationwide RT-PCR testing, and extensive surveillance of viral sequences. This study aims to characterize the epidemiology, transmission pattern, and emergence of variants of concern (VOCs) and variants of interest (VOIs) of SARS-CoV-2 in the UAE, followed by the investigation of mutations associated with hospitalized cases. A total of 1274 samples were collected and sequenced from all seven emirates between 25 April 2020 and 15 February 2021. Phylogenetic analysis demonstrated multiple introductions of SARS-CoV-2 into the UAE in the early pandemic, followed by a local spread of root clades (A, B, B.1 and B.1.1). As international flights resumed, the frequencies of VOCs surged, indicating the January peak of positive cases. We observed that the hospitalized cases were significantly associated with the presence of B.1.1.7 (p < 0.001), B.1.351 (p < 0.001) and A.23.1 (p = 0.009). Deceased cases are more likely to occur in the presence of B.1.351 (p < 0.001) and A.23.1 (p = 0.022). Logistic and ridge regression showed that 51 mutations are significantly associated with hospitalized cases, with the highest proportions originating from the S and ORF1a genes (31% and 29% respectively). Our study provides epidemiological insight into the emergence of VOCs and VOIs following the reopening of borders and the resumption of worldwide travel. It provides reassurance that hospitalization is markedly more associated with the presence of VOCs. This study can contribute to understanding the global transmission of SARS-CoV-2 variants.


Subjects
COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , Genomics , Humans , Phylogeny , SARS-CoV-2/genetics , United Arab Emirates/epidemiology
13.
PLoS Comput Biol ; 18(4): e1010050, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35404958

ABSTRACT

Scientific research is shedding light on the interaction of the gut microbiome with the human host and on its role in human health. Existing machine learning methods have shown great potential in discriminating healthy from diseased microbiome states. Most of them leverage shotgun metagenomic sequencing to extract gut microbial species-relative abundances or strain-level markers. Each of these gut microbial profiling modalities showed diagnostic potential when tested separately; however, no existing approach combines them in a single predictive framework. Here, we propose the Multimodal Variational Information Bottleneck (MVIB), a novel deep learning model capable of learning a joint representation of multiple heterogeneous data modalities. MVIB achieves competitive classification performance while being faster than existing methods. Additionally, MVIB offers interpretable results. Our model adopts an information theoretic interpretation of deep neural networks and computes a joint stochastic encoding of different input data modalities. We use MVIB to predict whether human hosts are affected by a certain disease by jointly analysing gut microbial species-relative abundances and strain-level markers. MVIB is evaluated on human gut metagenomic samples from 11 publicly available disease cohorts covering 6 different diseases. We achieve high performance (0.80 < ROC AUC < 0.95) on 5 cohorts and at least medium performance on the remaining ones. We adopt a saliency technique to interpret the output of MVIB and identify the most relevant microbial species and strain-level markers to the model's predictions. We also perform cross-study generalisation experiments, where we train and test MVIB on different cohorts of the same disease, and overall we achieve comparable results to the baseline approach, i.e. the Random Forest. Further, we evaluate our model by adding metabolomic data derived from mass spectrometry as a third input modality. Our method is scalable with respect to input data modalities and has an average training time of < 1.4 seconds. The source code and the datasets used in this work are publicly available.


Subjects
Gastrointestinal Microbiome , Microbiota , Humans , Machine Learning , Metagenome , Metagenomics/methods , Microbiota/genetics
14.
PLoS One ; 17(3): e0264682, 2022.
Article in English | MEDLINE | ID: mdl-35235585

ABSTRACT

Global and local whole genome sequencing of SARS-CoV-2 enables the tracing of domestic and international transmissions. We sequenced viral RNA from 37 sampled COVID-19 patients with RT-PCR-confirmed infections across the UAE and developed time-resolved phylogenies with 69 local and 3,894 global genome sequences. Furthermore, we investigated specific clades associated with the UAE cohort, their global diversity, and introduction events, and inferred domestic and international virus transmissions between January and June 2020. The study comprehensively characterized the genomic aspects of the virus and its spread within the UAE and identified that the prevalence shift of the D614G mutation was due to the later introductions of the G-variant associated with international travel, rather than higher local transmissibility. For clades spanning different emirates, the most recent common ancestors pre-date domestic travel bans. In conclusion, we observe a steep and sustained decline of international transmissions immediately following the introduction of international travel restrictions.


Subjects
COVID-19/transmission , COVID-19/virology , Infection Control/methods , SARS-CoV-2/genetics , Travel/statistics & numerical data , Adolescent , Adult , Aged , COVID-19/epidemiology , Child , Preschool Child , Female , Viral Genome/genetics , Humans , Male , Middle Aged , Molecular Typing/methods , Mutation , Phylogeny , Viral RNA , SARS-CoV-2/isolation & purification , RNA Sequence Analysis , Travel-Related Illness , United Arab Emirates/epidemiology , Whole Genome Sequencing , Young Adult
15.
Front Microbiol ; 12: 761067, 2021.
Article in English | MEDLINE | ID: mdl-34803986

ABSTRACT

The interplay between the compositional changes in the gastrointestinal microbiome, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) susceptibility and severity, and host functions is complex and yet to be fully understood. This study performed 16S rRNA gene-based microbial profiling of 143 subjects. We observed structural and compositional alterations in the gut microbiota of the SARS-CoV-2-infected group in comparison to non-infected controls. The gut microbiota composition of the SARS-CoV-2-infected individuals showed an increase in anti-inflammatory bacteria such as Faecalibacterium (p-value = 1.72 × 10^-6) and Bacteroides (p-value = 5.67 × 10^-8). We also revealed a higher relative abundance of the highly beneficial butyrate producers such as Anaerostipes (p-value = 1.75 × 10^-230), Lachnospiraceae (p-value = 7.14 × 10^-65), and Blautia (p-value = 9.22 × 10^-18) in the SARS-CoV-2-infected group in comparison to the control group. Moreover, phylogenetic investigation of communities by reconstructing unobserved state (PICRUSt) functional prediction analysis of the 16S rRNA gene abundance data showed substantial differences in the enrichment of metabolic pathways such as lipid, amino acid, carbohydrate, and xenobiotic metabolism, in comparison between both groups. We discovered an enrichment of linoleic acid, ether lipid, glycerolipid, and glycerophospholipid metabolism in the SARS-CoV-2-infected group, suggesting a link to SARS-CoV-2 entry and replication in host cells. We estimate the major contributing genera to the four pathways to be Parabacteroides, Streptococcus, Dorea, and Blautia, respectively. The identified differences provide a new insight to enrich our understanding of SARS-CoV-2-related changes in gut microbiota, their metabolic capabilities, and potential screening biomarkers linked to COVID-19 disease severity.

16.
Cancers (Basel) ; 13(11)2021 May 28.
Article in English | MEDLINE | ID: mdl-34071263

ABSTRACT

For optimal pancreatic cancer treatment, early and accurate diagnosis is vital. Blood-derived biomarkers and genetic predispositions can contribute to early diagnosis, but they often have limited accuracy or applicability. Here, we seek to exploit the synergy between them by combining the biomarker CA19-9 with RNA-based variants. We use deep sequencing and deep learning to better differentiate pancreatic cancer from chronic pancreatitis. We obtained samples of nucleated cells found in peripheral blood from 268 patients suffering from resectable or non-resectable pancreatic cancer, or from chronic pancreatitis. We sequenced RNA with high coverage and obtained millions of variants. The high-quality variants served as input together with CA19-9 values to deep learning models. Our model achieved an area under the curve (AUC) of 96% in differentiating resectable cancer from pancreatitis using a test cohort. Moreover, we identified variants to estimate survival in resectable cancer. We show that the blood transcriptome harbours variants, which can substantially improve noninvasive clinical diagnosis.
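For readers unfamiliar with the AUC figure quoted above, the sketch below computes the area under the ROC curve directly from its rank interpretation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as half. The labels and scores are invented for illustration and have no relation to the study's data. The function name `roc_auc` is likewise an assumption.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    `labels` holds 1 for positives and 0 for negatives. Both classes must
    be present. Runs in O(P * N) time, which is fine for small test sets.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores: 1 = resectable cancer, 0 = pancreatitis.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))
```

An AUC of 1.0 means every positive outranks every negative, and 0.5 corresponds to chance-level ranking.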

18.
Front Genet ; 12: 660428, 2021.
Article in English | MEDLINE | ID: mdl-33968136

ABSTRACT

The ethnic composition of the population of a country contributes to the uniqueness of each national DNA sequencing project and, ideally, individual reference genomes are required to reduce the confounding nature of ethnic bias. This work presents a representative whole genome sequencing effort for an understudied population. Specifically, high-coverage consensus sequences from 120 whole genomes and 33 whole exomes were used to construct the first ever population-specific major allele reference genome for the United Arab Emirates (UAE). When this was applied and compared to the archetype hg19 reference, assembly of local Emirati genomes was reduced by ∼19% (i.e., some 1 million fewer calls). In compiling the United Arab Emirates Reference Genome (UAERG), sets of 23,038,090 annotated short variants (novel: 1,790,171) and 137,713 structural variants (novel: 8,462), together with their allele frequencies (AFs) and distribution across the genome, were identified. Population-specific genetic characteristics including loss-of-function variants, admixture, and ancestral haplogroup distribution were identified and reported here. We also detect a strong correlation between FST and admixture components in the UAE. This baseline study was conceived to establish a high-quality reference genome and a genetic variation resource to enable the development of regional population-specific initiatives and thus inform the application of population studies and precision medicine in the UAE.
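The idea of a major allele reference can be sketched in a few lines: wherever the alternate allele is the more common one in the population, it replaces the base in the standard reference. The snippet below handles single-nucleotide substitutions only and is not the pipeline used to build the UAERG. The function name `major_allele_reference`, the 0.5 frequency cut-off, and the toy sequence and variants are assumptions.

```python
def major_allele_reference(reference, variants):
    """Return `reference` with major alternate alleles substituted in.

    `variants` is a list of (zero_based_pos, ref_base, alt_base, alt_freq)
    tuples describing single-nucleotide variants. A substitution is made
    only when the alternate allele frequency exceeds 0.5 and the stated
    reference base matches the sequence.
    """
    bases = list(reference)
    for pos, ref_base, alt_base, alt_freq in variants:
        if alt_freq > 0.5 and bases[pos] == ref_base:
            bases[pos] = alt_base
    return "".join(bases)


# Hypothetical reference fragment and population allele frequencies.
reference = "ACGT"
variants = [
    (1, "C", "T", 0.7),  # alternate allele is the major allele: substitute
    (2, "G", "A", 0.2),  # alternate allele is the minor allele: keep reference
]
print(major_allele_reference(reference, variants))
```

Samples from the same population then differ from such a reference at fewer sites, which is consistent with the reduction in calls described above.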

19.
BMC Bioinformatics ; 22(1): 198, 2021 Apr 19.
Article in English | MEDLINE | ID: mdl-33874881

ABSTRACT

BACKGROUND: Genotype-phenotype predictions are of great importance in genetics. These predictions can help to find genetic mutations causing variations in human beings. There are many approaches for finding the association, which can be broadly categorized into two classes: statistical techniques and machine learning. Statistical techniques are good for finding the actual SNPs causing variation, whereas machine learning techniques are good when we just want to classify people into different categories. In this article, we examined the eye-color and type 2 diabetes phenotypes. The proposed technique is a hybrid approach that combines parts of statistical techniques with parts of machine learning. RESULTS: The main dataset for the eye-color phenotype consists of 806 people: 404 people have blue-green eyes and 402 people have brown eyes. After preprocessing, we generated 8 different datasets, containing different numbers of SNPs, using the mutation difference and thresholding at individual SNPs. We calculated three types of mutation at each SNP: no mutation, partial mutation, and full mutation. After that, the data is transformed for machine learning algorithms. We used about 9 classifiers (RandomForest, Extreme Gradient Boosting, ANN, LSTM, GRU, BILSTM, 1DCNN, ensembles of ANN, and ensembles of LSTM), which gave best accuracies of 0.91, 0.9286, 0.945, 0.94, 0.94, 0.92, 0.95, and 0.96% respectively. Stacked ensembles of LSTM outperformed other algorithms for 1560 SNPs with an overall accuracy of 0.96, AUC = 0.98 for brown eyes, and AUC = 0.97 for blue-green eyes. The main dataset for type 2 diabetes consists of 107 people, of whom 30 are classified as cases and 74 as controls. We used different linear thresholds to find the optimal number of SNPs for classification. The final model gave an accuracy of 0.97%. CONCLUSION: Genotype-phenotype predictions are very useful, especially in forensics. These predictions can help to identify SNP variant associations with traits and diseases. Given more datasets, machine learning model predictions can be improved. Moreover, the non-linearity in the machine learning model and the combination of SNP mutations while training the model improve the prediction. We considered binary classification problems, but the proposed approach can be extended to multi-class classification.
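The abstract gives only an outline of the preprocessing, so the following is one plausible reading rather than the authors' procedure. Each genotype is scored as 0 (no mutation), 1 (partial mutation, one non-reference allele) or 2 (full mutation, two non-reference alleles), and a SNP is kept when the mean score differs between the two groups by at least a threshold. The function names, the genotype strings, the reference alleles, and the threshold value are all hypothetical.

```python
def mutation_level(genotype, ref_allele):
    """Count non-reference alleles in a two-letter genotype such as 'AG'."""
    return sum(1 for allele in genotype if allele != ref_allele)


def select_snps(group_a, group_b, ref_alleles, threshold):
    """Return indices of SNPs whose mean mutation level differs enough.

    `group_a` and `group_b` are lists of samples, each sample being a list
    of genotype strings, one per SNP, in the same order as `ref_alleles`.
    """
    selected = []
    for index, ref in enumerate(ref_alleles):
        mean_a = sum(mutation_level(s[index], ref) for s in group_a) / len(group_a)
        mean_b = sum(mutation_level(s[index], ref) for s in group_b) / len(group_b)
        if abs(mean_a - mean_b) >= threshold:
            selected.append(index)
    return selected


# Hypothetical data: two SNPs, two samples per eye-color group.
ref_alleles = ["A", "C"]
blue_green = [["GG", "CC"], ["AG", "CT"]]
brown = [["AA", "CC"], ["AA", "CT"]]
print(select_snps(blue_green, brown, ref_alleles, threshold=1.0))
```

Varying the threshold changes how many SNPs survive, which is one way to obtain several datasets with different numbers of SNPs from the same samples.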


Subjects
Deep Learning , Type 2 Diabetes Mellitus , Type 2 Diabetes Mellitus/genetics , Eye Color , Genotype , Humans , Phenotype
20.
Front Genet ; 11: 681, 2020.
Article in English | MEDLINE | ID: mdl-32754195

ABSTRACT

Whole genome sequences (WGS) of four nationals of the United Arab Emirates (UAE) at an average coverage of 33X have been completed and described. The selection of suitable subpopulation representatives was informed by a preceding comprehensive population structure analysis. Representatives were chosen based on their central location within the subpopulation on a principal component analysis (PCA) and the degree to which they were admixed. Novel genomic variations among the different subgroups of the UAE population are reported here. Specifically, the WGS analysis identified 4,161,067-4,798,806 variants in the four individual samples, where approximately 80% were single nucleotide polymorphisms (SNPs) and 20% were insertions or deletions (indels). An average of 2.75% was found to be novel variants according to dbSNP (build 151). This is the first report of structural variants (SV) from WGS data from UAE nationals. There were 15,677-20,339 called SVs, of which around 13.5% were novel. The four samples shared 1,399,178 variants, each with distinct variants as follows: 1,085,524 (for the individual denoted as UAE S011), 1,228,559 (UAE S012), 791,072 (UAE S013), and 906,818 (UAE S014). These results show a previously unappreciated population diversity in the region. The synergy of WGS and genotype array data was demonstrated through variant annotation of the former using 2.3 million allele frequencies for the local population derived from the latter technology platform. This novel approach of combining breadth and depth of array and WGS technologies has guided the choice of population genetic representatives and provides complementary, regionalized allele frequency annotation to new genomes comprising millions of loci.
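The shared and distinct variant counts above are, in essence, set operations over per-sample variant lists. The sketch below shows that bookkeeping on toy data. It assumes that "distinct" means variants seen in exactly one sample, which the abstract does not spell out. The sample labels follow the abstract, but the variant keys and the function name `shared_and_private` are invented.

```python
def shared_and_private(variant_sets):
    """Split per-sample variant sets into shared and sample-private parts.

    Returns (shared, private), where `shared` holds variants present in
    every sample and `private[name]` holds variants found only in `name`.
    """
    shared = set.intersection(*variant_sets.values())
    private = {}
    for name, variants in variant_sets.items():
        # Union of every other sample's variants.
        others = set().union(
            *(v for other, v in variant_sets.items() if other != name)
        )
        private[name] = variants - others
    return shared, private


# Hypothetical variant keys in "chrom:pos:ref>alt" form.
variant_sets = {
    "UAE S011": {"1:100:A>G", "1:200:C>T", "2:50:G>A"},
    "UAE S012": {"1:100:A>G", "2:60:T>C"},
    "UAE S013": {"1:100:A>G", "1:200:C>T"},
}
shared, private = shared_and_private(variant_sets)
print(sorted(shared))
print({name: sorted(v) for name, v in private.items()})
```

Variants present in some but not all samples fall into neither category, so shared and private counts do not have to add up to each sample's total.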
