Results 1 - 10 of 10
1.
Bioinformatics ; 40(Supplement_1): i199-i207, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940159

ABSTRACT

MOTIVATION: The emergence of COVID-19 (C19) created enormous worldwide challenges but also offers unique opportunities to understand the physiology of its risk factors and their interactions with complex disease conditions, such as metabolic syndrome. To address the challenge of discovering clinically relevant interactions, we employed a novel approach to epidemiological analysis powered by redescription-based topological data analysis (RTDA). RESULTS: Here, RTDA was applied to Explorys data to discover associations between severe C19 and metabolic syndrome. This approach further explored the probative value of drug prescriptions for capturing the involvement of the renin-angiotensin-aldosterone system (RAAS) and hypertension in C19, as well as the modification of risk-factor impact by hyperlipidemia (HL) on severe C19. RTDA found higher-order relationships between the RAAS pathway and severe C19, along with the demographic variables of age and gender and comorbidities such as obesity, statin prescriptions, HL, and chronic kidney failure, disproportionately affecting Black individuals. RTDA combined with CuNA (cumulant-based network analysis) yielded a higher-order interaction network derived from cumulants that further supported the central role of RAAS. TDA techniques can provide a novel outlook beyond the logistic regressions typical of epidemiology. From an observational cohort of electronic medical records, TDA can reveal how RAAS drugs interact with comorbidities, such as hypertension and HL, in patients with severe bouts of C19. Where single-variable association tests against the outcome can struggle, TDA's higher-order interaction network among different variables enables the discovery of how the comorbidities of a disease such as C19 work in concert. AVAILABILITY AND IMPLEMENTATION: Code for performing TDA/RTDA is available at https://github.com/IBM/Matilda and code for CuNA can be found at https://github.com/BiomedSciAI/Geno4SD/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
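CuNA is described as building a network from cumulants. As a minimal sketch only (pure Python, not the Geno4SD implementation; the function names and the threshold are hypothetical), the third-order joint cumulant of three variables is E[(X - mX)(Y - mY)(Z - mZ)]. It is zero whenever one variable is independent of the other two, so a nonzero value signals a joint dependence that pairwise covariances can miss, as in the XOR toy example at the end of the block.

```python
from itertools import combinations
from statistics import fmean


def covariance(x, y):
    """Second-order joint cumulant (population covariance) of two variables."""
    mx, my = fmean(x), fmean(y)
    return fmean((a - mx) * (b - my) for a, b in zip(x, y))


def joint_cumulant3(x, y, z):
    """Third-order joint cumulant: E[(X - mx)(Y - my)(Z - mz)]."""
    mx, my, mz = fmean(x), fmean(y), fmean(z)
    return fmean((a - mx) * (b - my) * (c - mz) for a, b, c in zip(x, y, z))


def third_order_edges(data, threshold):
    """Hypothetical hyperedges: variable triples whose |cumulant| exceeds threshold.

    data maps a variable name to its per-patient values (e.g. 0/1 indicators
    such as a diagnosis or a prescription).
    """
    edges = []
    for u, v, w in combinations(sorted(data), 3):
        k = joint_cumulant3(data[u], data[v], data[w])
        if abs(k) > threshold:
            edges.append(((u, v, w), k))
    return edges


# Toy example: z = x XOR y is pairwise uncorrelated with x and y,
# yet the three variables are jointly dependent.
x = [1, 1, 0, 0]
y = [1, 0, 1, 0]
z = [a ^ b for a, b in zip(x, y)]
print(covariance(x, y), covariance(x, z), covariance(y, z))  # all zero
print(joint_cumulant3(x, y, z))  # -0.125
print(third_order_edges({"x": x, "y": y, "z": z}, threshold=0.1))
```

A real analysis would also need a significance criterion for each cumulant rather than a fixed cutoff; the threshold here only keeps the example short.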


Subjects
COVID-19 , Hyperlipidemias , Metabolic Syndrome , Renin-Angiotensin System , SARS-CoV-2 , Humans , Metabolic Syndrome/epidemiology , COVID-19/epidemiology , Hyperlipidemias/epidemiology , Male , Female , Middle Aged , Aged , Comorbidity , Hypertension/epidemiology , Risk Factors
2.
BMC Bioinformatics ; 24(1): 411, 2023 Oct 31.
Article in English | MEDLINE | ID: mdl-37907836

ABSTRACT

BACKGROUND: Identifying variants associated with complex traits is a challenging task in genetic association studies because of linkage disequilibrium (LD) between genetic variants and because of population stratification unrelated to disease risk. Existing methods for population structure correction use principal component analysis, or linear mixed models with a random effect, when modeling associations between a trait of interest and genetic markers. However, owing to stringent significance thresholds and latent interactions between the markers, these methods often fail to detect genuinely associated variants. RESULTS: To overcome this, we propose CluStrat, which corrects for complex, arbitrarily structured populations while leveraging the LD-induced distances between genetic markers. It performs agglomerative hierarchical clustering using the Mahalanobis distance covariance matrix of the markers. In simulation studies, we show that our method outperforms existing methods in detecting true causal variants. Applying CluStrat to the WTCCC2 and UK Biobank cohorts, we found biologically relevant associations in schizophrenia and myocardial infarction. CluStrat was also able to correct for population structure in the polygenic adaptation of height in Europeans. CONCLUSIONS: CluStrat highlights the advantages of biologically relevant distance metrics, such as the Mahalanobis distance, which captures the cryptic interactions within populations in the presence of LD better than the Euclidean distance does.
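The two ingredients named for CluStrat are a Mahalanobis distance shaped by the markers' covariance (LD) structure and agglomerative hierarchical clustering. The NumPy sketch below illustrates both under stated assumptions: the distance between individuals i and j is taken as sqrt((g_i - g_j)^T S^+ (g_i - g_j)), with S^+ the pseudo-inverse of the marker covariance, and clusters are merged by average linkage. Neither choice is confirmed as CluStrat's exact procedure, the function names are hypothetical, and the O(n^2 m) pairwise-difference array only suits small examples.

```python
import numpy as np


def mahalanobis_matrix(G):
    """Pairwise Mahalanobis distances between the rows (individuals) of G.

    G has shape (n_individuals, n_markers). The marker covariance matrix
    (an LD-like matrix) is pseudo-inverted because it is often singular.
    """
    S_inv = np.linalg.pinv(np.cov(G, rowvar=False))
    diff = G[:, None, :] - G[None, :, :]                 # (n, n, m) pairwise differences
    d2 = np.einsum("ijk,kl,ijl->ij", diff, S_inv, diff)  # quadratic form for each pair
    return np.sqrt(np.clip(d2, 0.0, None))               # clip tiny negative round-off


def average_linkage(D, k):
    """Naive agglomerative clustering with average linkage, stopping at k clusters."""
    clusters = [[i] for i in range(D.shape[0])]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)          # b > a, so index a stays valid
        clusters[a].extend(merged)
    labels = np.empty(D.shape[0], dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels


rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(20, 5)).astype(float)   # toy 0/1/2 genotype dosages
D = mahalanobis_matrix(G)
print(average_linkage(D, k=2))
```

Unlike the Euclidean distance, this distance is unchanged when an individual marker is rescaled, because the covariance absorbs the scale.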


Subjects
Polymorphism, Single Nucleotide , Humans , Genetic Markers , Linkage Disequilibrium , Phenotype , Cluster Analysis
3.
Mol Biol Evol ; 38(5): 1809-1819, 2021 05 04.
Article in English | MEDLINE | ID: mdl-33481022

ABSTRACT

India represents an intricate tapestry of population substructure shaped by geography, language, culture, and social stratification. Although geography closely correlates with genetic structure in other parts of the world, the strict endogamy imposed by the Indian caste system and the large number of spoken languages add further complexity to understanding Indian population structure. To date, no study has attempted to model and evaluate how these factors have interacted to shape the patterns of genetic diversity within India. We merged all publicly available data from the Indian subcontinent into a data set of 891 individuals from 90 well-defined groups. Bringing together geography, genetics, and demographic factors, we developed Correlation Optimization of Genetics and Geodemographics to build a model that explains the observed population genetic substructure. We show that shared language and social structure have been the most powerful forces in creating paths of gene flow in the subcontinent. Furthermore, we identify the ethnic groups that best capture the diverse genetic substructure using a ridge leverage score statistic. Integrating data from India with an additional data set of 1,323 individuals from 50 Eurasian populations, we find that Indo-European and Dravidian speakers of India show shared genetic drift with Europeans, whereas the Tibeto-Burman-speaking tribal groups have maximum shared genetic drift with East Asians.
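The ethnic groups were ranked with a ridge leverage score statistic. For a matrix A whose rows are individuals, the ridge leverage score of row a_i is tau_i = a_i^T (A^T A + lambda I)^{-1} a_i. The NumPy sketch below computes these scores and then averages them within each group; the per-group averaging, the function names, and the value of lambda are hypothetical illustrations, not the paper's exact statistic.

```python
import numpy as np


def ridge_leverage_scores(A, lam):
    """tau_i = a_i^T (A^T A + lam * I)^{-1} a_i for every row a_i of A."""
    K = A.T @ A + lam * np.eye(A.shape[1])
    X = np.linalg.solve(K, A.T)             # column i is K^{-1} a_i
    return np.einsum("ij,ji->i", A, X)      # row-wise a_i^T K^{-1} a_i


def rank_groups(A, groups, lam):
    """Hypothetical group ranking: mean ridge leverage per group, highest first."""
    tau = ridge_leverage_scores(A, lam)
    means = {}
    for g in sorted(set(groups)):
        means[g] = float(np.mean([t for t, h in zip(tau, groups) if h == g]))
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


rng = np.random.default_rng(0)
A = rng.standard_normal((30, 8))
A -= A.mean(axis=0)                         # center each marker (column)
groups = ["g1"] * 10 + ["g2"] * 10 + ["g3"] * 10
print(rank_groups(A, groups, lam=1.0))
```

For lambda > 0 each score lies in [0, 1), and the scores sum to the effective dimension sum_j sigma_j^2 / (sigma_j^2 + lambda) of A; with lambda = 0 they reduce to ordinary leverage scores, which sum to the rank of A.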


Subjects
Ethnicity/genetics , Genetic Variation , Language , Models, Genetic , Sociological Factors , Geography , Humans , India
4.
Bioinformatics ; 35(19): 3679-3683, 2019 10 01.
Article in English | MEDLINE | ID: mdl-30957838

ABSTRACT

MOTIVATION: Principal Component Analysis is a key tool in the study of population structure in human genetics. As modern datasets grow ever larger, traditional approaches based on loading the entire dataset into system memory (Random Access Memory) become impractical, and out-of-core implementations are the only viable alternative. RESULTS: We present TeraPCA, a C++ implementation of the Randomized Subspace Iteration method to perform Principal Component Analysis of large-scale datasets. TeraPCA can be applied both in-core and out-of-core and is able to operate successfully even on commodity hardware with just a few gigabytes of system memory. Moreover, TeraPCA has minimal dependencies on external libraries and only requires a working installation of the BLAS and LAPACK libraries. When applied to a dataset containing a million individuals genotyped on a million markers, TeraPCA requires <5 h (in multi-threaded mode) to accurately compute the 10 leading principal components. An extensive experimental analysis shows that TeraPCA is both fast and accurate and is competitive with current state-of-the-art software for the same task. AVAILABILITY AND IMPLEMENTATION: Source code and documentation are both available at https://github.com/aritra90/TeraPCA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
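TeraPCA implements the Randomized Subspace Iteration method in C++ over BLAS and LAPACK. The in-core NumPy sketch below shows the same general technique (a random starting block, a few subspace iterations with re-orthonormalization, then an exact SVD of a small projected matrix). It is not TeraPCA's code, it has no out-of-core path, and the parameter names `oversample` and `n_iter` are illustrative rather than TeraPCA options.

```python
import numpy as np


def randomized_pca(A, k, oversample=10, n_iter=3, seed=0):
    """Top-k principal components of A (rows = individuals, columns = markers).

    Returns (U, s, Vt): left singular vectors (individual scores up to scaling),
    singular values, and marker loadings of the column-centered matrix.
    """
    rng = np.random.default_rng(seed)
    X = A - A.mean(axis=0)                              # center each marker
    omega = rng.standard_normal((X.shape[1], k + oversample))
    Q, _ = np.linalg.qr(X @ omega)                      # initial range estimate
    for _ in range(n_iter):                             # subspace (power) iterations
        Z, _ = np.linalg.qr(X.T @ Q)                    # re-orthonormalize each half-step
        Q, _ = np.linalg.qr(X @ Z)
    B = Q.T @ X                                         # small (k + oversample) x m matrix
    U_b, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_b)[:, :k], s[:k], Vt[:k]


rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))  # rank-5 signal
A += 0.01 * rng.standard_normal(A.shape)                            # small noise
U, s, Vt = randomized_pca(A, k=5)
exact = np.linalg.svd(A - A.mean(axis=0), compute_uv=False)[:5]
print(np.allclose(s, exact, rtol=1e-4))  # True
```

Only matrix products with X and small QR/SVD factorizations are needed, which is what makes streaming X from disk in blocks possible in an out-of-core implementation.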


Subjects
Algorithms , Software , Genetic Variation , Genotype , Humans , Principal Component Analysis
5.
iScience ; 27(3): 109209, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38439972

ABSTRACT

GWAS focuses on significance to weed out false positives, whereas machine learning probes sub-significant features by relying on predictivity. Yet these are far from orthogonal. We sought to explore how they inform each other in sub-genome-wide-significant situations in order to define the relevance of predictive features. We introduce the SVM-based RubricOE, which selects heavily cross-validated feature sets, and LDpred2 PRS as a strong contrast to SVM, to explore significance and predictivity. Our Alzheimer's test case notoriously lacks strong genetic signals except for a few very strong phenotype-SNP associations, which suits the problem we are exploring. We found that the most significant SNPs among the ML- and PRS-selected SNPs captured most of the predictivity, while weaker associations also tend to contribute weakly to predictivity. SNPs with weak associations tend not to contribute to predictivity, but deleting these features does not harm it. Significance provides a ranking that helps identify weakly predictive features.

6.
Pac Symp Biocomput ; 28: 198-208, 2023.
Article in English | MEDLINE | ID: mdl-36540977

ABSTRACT

Polygenic risk scores (PRS) are increasingly used to estimate an individual's risk for a trait based on genetics. However, most genomic cohorts are of European ancestry, with a strong under-representation of non-European groups. Given that PRS transfer poorly across racial groups, this has the potential to exacerbate health disparities if PRS are used in clinical care. Hence there is a need to generate PRS that perform comparably across ethnic groups. Borrowing from recent advances in the domain adaptation field of machine learning, we propose FairPRS, an Invariant Risk Minimization (IRM) approach for estimating fair PRS or debiasing a pre-computed PRS. We test our method on both a diverse set of synthetic data and real data from the UK Biobank. We show that our method can create ancestry-invariant PRS distributions that are both racially unbiased and substantially improve phenotype prediction. We hope that FairPRS will contribute to a fairer characterization of patients by genetics rather than by race.
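The abstract does not spell out FairPRS's training objective. For orientation only, the sketch below evaluates the generic IRMv1 objective from the original Invariant Risk Minimization paper (Arjovsky et al., 2019) for squared-error PRS predictions, treating each ancestry group as an environment. The function names, the squared-error loss, and the weight `lam` are assumptions, and no model is trained here.

```python
from statistics import fmean


def env_risk_and_penalty(preds, targets):
    """Squared-error risk and IRMv1 penalty for one environment.

    With R(w) = mean((w * f - y)^2), the penalty is (dR/dw at w = 1)^2
    = (mean(2 * (f - y) * f))^2. It is zero when rescaling the predictions
    cannot lower this environment's risk.
    """
    risk = fmean((f - y) ** 2 for f, y in zip(preds, targets))
    grad = fmean(2.0 * (f - y) * f for f, y in zip(preds, targets))
    return risk, grad ** 2


def irm_objective(envs, lam):
    """Sum over environments of risk + lam * penalty.

    envs maps an environment name (here, a hypothetical ancestry group)
    to a (predictions, targets) pair.
    """
    total = 0.0
    for preds, targets in envs.values():
        risk, penalty = env_risk_and_penalty(preds, targets)
        total += risk + lam * penalty
    return total


# Toy data: the scores fit group "A" exactly but are off by a factor of 2 in "B".
envs = {"A": ([1.0, 2.0], [1.0, 2.0]), "B": ([1.0, 2.0], [2.0, 4.0])}
print(irm_objective(envs, lam=1.0))  # 27.5 = 0 + (2.5 + 25.0)
```

The penalty term is what pushes a trained predictor toward a single scaling that is simultaneously optimal in every group, which is the sense in which the resulting scores are "invariant".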


Subjects
Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Computational Biology , Risk Factors , Phenotype , Multifactorial Inheritance
7.
Res Comput Mol Biol ; 13278: 86-106, 2022 May.
Article in English | MEDLINE | ID: mdl-36649383

ABSTRACT

Principal component analysis (PCA) is a widely used dimensionality reduction technique in machine learning and multivariate statistics. To improve the interpretability of PCA, various approaches for obtaining sparse principal direction loadings have been proposed; these are termed Sparse Principal Component Analysis (SPCA). In this paper, we present ThreSPCA, a provably accurate algorithm for the SPCA problem based on thresholding the Singular Value Decomposition, which does not impose any restrictive assumptions on the input covariance matrix. Our thresholding algorithm is conceptually simple, much faster than the current state of the art, and performs well in practice. When applied to genotype data from the 1000 Genomes Project, ThreSPCA is faster than previous benchmarks, at least as accurate, and leads to a set of interpretable biomarkers, revealing genetic diversity across the world.
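ThreSPCA obtains sparse loadings by thresholding the SVD. The NumPy sketch below shows the basic idea in its simplest form: take the leading right singular vector of the centered data, keep its s largest-magnitude loadings, zero the rest, and renormalize. This single-vector hard threshold is an assumed simplification, not the paper's algorithm, and it carries none of its accuracy guarantees; the function name is hypothetical.

```python
import numpy as np


def thresholded_sparse_pc(X, s):
    """Sparse principal direction by hard-thresholding the leading right singular vector.

    Keeps the s largest-magnitude loadings, zeroes the rest, and renormalizes.
    Returns the sparse loading vector and the sample variance it explains.
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    v = Vt[0].copy()
    keep = np.argsort(np.abs(v))[-s:]      # indices of the s largest |loadings|
    sparse = np.zeros_like(v)
    sparse[keep] = v[keep]
    sparse /= np.linalg.norm(sparse)
    explained = float(sparse @ (Xc.T @ Xc) @ sparse) / (X.shape[0] - 1)
    return sparse, explained


rng = np.random.default_rng(0)
z = rng.standard_normal(100)
X = 0.1 * rng.standard_normal((100, 6))
X[:, 0] += 5 * z                 # markers 0 and 1 share one strong signal
X[:, 1] += 5 * z
v, explained = thresholded_sparse_pc(X, s=2)
print(np.flatnonzero(v))         # [0 1]
```

The nonzero loadings name the markers driving the component, which is the interpretability gain SPCA aims for; the explained variance can never exceed that of the dense leading component.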

8.
AMIA Annu Symp Proc ; 2021: 378-387, 2021.
Article in English | MEDLINE | ID: mdl-35308982

ABSTRACT

To date, there have been 180 million confirmed cases of COVID-19, with more than 3.8 million deaths, reported to the WHO worldwide. In this paper, we address the problem of understanding the influence of the host genome, in concert with clinical variables, on the severity of COVID-19 manifestation in the patient. Leveraging positive-unlabeled machine learning algorithms coupled with RubricOE, a state-of-the-art genomic analysis framework, on UK Biobank data, we extract novel insights into this complex interplay. The algorithm is also sensitive enough to detect the changing influence of the emergent B.1.1.7 SARS-CoV-2 (alpha) variant on disease severity, as well as of changing treatment protocols. The genomic component also implicates biological pathways that can help in understanding the disease etiology. Our work demonstrates that, by carefully interleaving clinical and genomic methodologies, it is possible to build a robust and sensitive model despite significant bias, noise, and incompleteness in both clinical and genomic data.
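The abstract does not name the positive-unlabeled (PU) algorithm used. One standard option, shown here only as an illustration, is the Elkan-Noto (2008) correction: train any classifier to separate labeled positives from unlabeled samples, estimate c as its mean score on held-out labeled positives, and divide scores by c. The function name is hypothetical, and the classifier scores are assumed to be given.

```python
from statistics import fmean


def elkan_noto_correction(scores_labeled_pos, scores):
    """Rescale 'labeled vs unlabeled' probabilities to estimates of P(y=1 | x).

    scores_labeled_pos: classifier outputs g(x) = P(s=1 | x) on held-out
    labeled positives; their mean estimates c = P(s=1 | y=1).
    Under the 'selected completely at random' assumption, P(y=1 | x) = g(x) / c;
    results are clipped to 1 because the estimate of c is noisy.
    """
    c = fmean(scores_labeled_pos)
    if c <= 0:
        raise ValueError("labeled-positive scores must have a positive mean")
    return [min(1.0, g / c) for g in scores]


# Toy scores from a hypothetical labeled-vs-unlabeled classifier.
print(elkan_noto_correction([0.4, 0.6], [0.1, 0.25, 0.6]))  # [0.2, 0.5, 1.0]
```

The correction matters in settings like severe COVID-19, where an unlabeled patient may simply be an unrecorded positive rather than a true negative.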


Subjects
COVID-19 , SARS-CoV-2 , COVID-19/genetics , COVID-19/immunology , Genomics , Humans , Machine Learning , Severity of Illness Index
9.
J Family Med Prim Care ; 8(8): 2592-2596, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31548938

ABSTRACT

CONTEXT: Tooth decay precipitated by poor oral hygiene is one of the most common oral diseases, affecting 60-90% of school children. It interferes not only with speech and self-esteem; the pain caused by decay also affects nutritional intake, resulting in malnutrition and abnormal cognitive development. AIM: To evaluate the impact of health education and a supervised brushing intervention on the oral health status of school children. SETTINGS AND DESIGN: Cross-sectional interventional study. METHODS AND MATERIALS: The study was conducted on students of classes 8, 9, and 10 of an Ashramshala (tribal residential school). All students present in the school on the day of data collection were included in the study. A semistructured questionnaire was used for data collection. A qualified dentist on the research team conducted oral examinations of the students. Students were asked to demonstrate their brushing method, and relevant observations were noted. The oral health status of the students was analyzed using the DMF (decayed, missing, and filled) index and the Oral Hygiene Index-Simplified (OHI-S) score. Three training and educational sessions of one hour each were conducted separately for each class, and a separate session was conducted for the teachers and caretakers of the school. Thereafter, randomly selected students (peers) were asked to demonstrate the technique to their peers to ensure proper understanding. Compliance was ensured through weekly follow-up visits to the school by the research team. DMF and OHI-S scores were recalculated after 3 months and compared with the previous scores. STATISTICAL ANALYSIS: The chi-square test, one-way analysis of variance, and the paired t-test were used for analysis. RESULTS: The mean DMF and OHI-S scores of the students were 2.61 ± 2.309 and 2.11 ± 0.96, respectively. A significant change (P = 0.021) in OHI scores was observed as a result of the intervention.
CONCLUSIONS: Promoting healthy dental practice with supportive supervision forms the cornerstone of good health and hygiene.
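The before/after comparison relies on a paired t-test. Its statistic is t = mean(d) / (sd(d) / sqrt(n)), where d holds the per-student differences and n - 1 gives the degrees of freedom. The pure-Python sketch below computes it; the function name and the four score pairs are made-up illustrations, not the study's data. For a p-value, `scipy.stats.ttest_rel(after, before)` returns the same statistic together with a two-sided p-value.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for matched before/after scores."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    d = [a - b for a, b in zip(after, before)]     # per-student change
    t = fmean(d) / (stdev(d) / sqrt(len(d)))        # stdev uses n - 1 (sample SD)
    return t, len(d) - 1


# Hypothetical OHI-S scores for four students (not data from the study).
before = [2.5, 2.0, 3.0, 1.5]
after = [2.0, 1.5, 2.0, 1.5]
t, df = paired_t(before, after)
print(round(t, 3), df)  # -2.449 3
```

Pairing removes between-student variation from the comparison, which is why it suits repeated DMF and OHI-S measurements on the same children.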

10.
Eur J Hum Genet ; 25(5): 637-645, 2017 05.
Article in English | MEDLINE | ID: mdl-28272534

ABSTRACT

The Peloponnese has been one of the cradles of Classical European civilization and an important contributor to ancient European history. It has also been the subject of a controversy about the ancestry of its population. In a theory hotly debated by scholars for over 170 years, the German historian Jacob Philipp Fallmerayer proposed that the medieval Peloponneseans were totally extinguished by Slavic and Avar invaders and replaced by Slavic settlers during the 6th century CE. Here we use 2.5 million single-nucleotide polymorphisms to investigate the genetic structure of Peloponnesean populations in a sample of 241 individuals originating from all districts of the peninsula, and to examine predictions of the theory that the medieval Peloponneseans were replaced by Slavs. We find considerable heterogeneity among Peloponnesean populations, exemplified by genetically distinct subpopulations and by gene flow gradients within the Peloponnese. By principal component analysis (PCA) and ADMIXTURE analysis, the Peloponneseans are clearly distinguishable from the populations of the Slavic homeland and are very similar to Sicilians and Italians. Using a novel method of quantitative analysis of ADMIXTURE output, we find that the Slavic ancestry of Peloponnesean subpopulations ranges from 0.2 to 14.4%. Subpopulations considered by Fallmerayer to be Slavic tribes or to have Near Eastern origin have no significant ancestry of either. This study rejects the theory of extinction of the medieval Peloponneseans and illustrates how genetics can clarify important aspects of the history of a human population.


Subjects
DNA, Ancient/chemistry , Human Migration , Pedigree , White People/genetics , Aged , Aged, 80 and over , Genome, Human , Genotype , Greece , Humans , Models, Genetic