Results 1 - 20 of 44
1.
Commun Biol ; 7(1): 400, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38565955

ABSTRACT

Unlocking the full dimensionality of single-cell RNA sequencing data (scRNAseq) is the next frontier to a richer, fuller understanding of cell biology. We introduce q-diffusion, a framework for capturing the coexpression structure of an entire library of genes, improving on state-of-the-art analysis tools. The method is demonstrated via three case studies. In the first, q-diffusion helps gain statistical significance for differential effects on patient outcomes when analyzing the CALGB/SWOG 80405 randomized phase III clinical trial, suggesting precision guidance for the treatment of metastatic colorectal cancer. Secondly, q-diffusion is benchmarked against existing scRNAseq classification methods using an in vitro PBMC dataset, in which the proposed method discriminates IFN-γ stimulation more accurately. The same case study demonstrates improvements in unsupervised cell clustering with the recent Tabula Sapiens human atlas. Finally, a local distributional segmentation approach for spatial scRNAseq, driven by q-diffusion, yields interpretable structures of human cortical tissue.


Subject(s)
Mononuclear Leukocytes , Single-Cell Analysis , Humans , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Cluster Analysis
2.
Front Neuroinform ; 17: 1215261, 2023.
Article in English | MEDLINE | ID: mdl-37720825

ABSTRACT

Introduction: Open science initiatives have enabled sharing of large amounts of already collected data. However, significant gaps remain regarding how to find appropriate data, including underutilized data that exist in the long tail of science. We demonstrate the NeuroBridge prototype and its ability to search PubMed Central full-text papers for information relevant to neuroimaging data collected from schizophrenia and addiction studies. Methods: The NeuroBridge architecture contained the following components: (1) Extensible ontology for modeling study metadata: subject population, imaging techniques, and relevant behavioral, cognitive, or clinical data. Details are described in the companion paper in this special issue; (2) A natural-language based document processor that leveraged pre-trained deep-learning models on a small-sample document corpus to establish efficient representations for each article as a collection of machine-recognized ontological terms; (3) Integrated search using ontology-driven similarity to query PubMed Central and NeuroQuery, which provides fMRI activation maps along with PubMed source articles. Results: The NeuroBridge prototype contains a corpus of 356 papers from 2018 to 2021 describing schizophrenia and addiction neuroimaging studies, of which 186 were annotated with the NeuroBridge ontology. The search portal on the NeuroBridge website https://neurobridges.org/ provides an interactive Query Builder, where the user builds queries by selecting NeuroBridge ontology terms to preserve the ontology tree structure. For each return entry, links to the PubMed abstract as well as to the PMC full-text article, if available, are presented. For each of the returned articles, we provide a list of clinical assessments described in the Section "Methods" of the article. Articles returned from NeuroQuery based on the same search are also presented. 
Conclusion: The NeuroBridge prototype combines ontology-based search with natural-language text-mining approaches to demonstrate that papers relevant to a user's research question can be identified. The NeuroBridge prototype takes a first step toward identifying potential neuroimaging data described in full-text papers. Toward the overall goal of discovering "enough data of the right kind," ongoing work includes validating the document processor with a larger corpus, extending the ontology to include detailed imaging data, and extracting information regarding data availability from the returned publications and incorporating XNAT-based neuroimaging databases to enhance data accessibility.

3.
Front Neuroinform ; 17: 1216443, 2023.
Article in English | MEDLINE | ID: mdl-37554248

ABSTRACT

Background: Despite the efforts of the neuroscience community, there are many published neuroimaging studies with data that are still not findable or accessible. Users face significant challenges in reusing neuroimaging data due to the lack of provenance metadata, such as experimental protocols, study instruments, and details about the study participants, which is also required for interoperability. To implement the FAIR guidelines for neuroimaging data, we have developed an iterative ontology engineering process and used it to create the NeuroBridge ontology. The NeuroBridge ontology is a computable model of provenance terms to implement FAIR principles; together with an international effort to annotate full-text articles with ontology terms, it enables users to locate relevant neuroimaging datasets. Methods: Building on our previous work in metadata modeling, and in concert with an initial annotation of a representative corpus, we modeled diagnosis terms (e.g., schizophrenia, alcohol usage disorder), magnetic resonance imaging (MRI) scan types (T1-weighted, task-based, etc.), clinical symptom assessments (PANSS, AUDIT), and a variety of other assessments. We used the feedback of the annotation team to identify missing metadata terms, which were added to the NeuroBridge ontology, and we restructured the ontology to support both the final annotation of the corpus of neuroimaging articles by a second, independent set of annotators and the functionalities of the NeuroBridge search portal for neuroimaging datasets. Results: The NeuroBridge ontology consists of 660 classes, 49 properties, and 3,200 axioms. The ontology includes mappings to existing ontologies, enabling the NeuroBridge ontology to be interoperable with other domain-specific terminological systems. Using the ontology, we annotated 186 neuroimaging full-text articles describing the participant types, scanning, and clinical and cognitive assessments. 
Conclusion: The NeuroBridge ontology is the first computable metadata model that represents the types of data available in recent neuroimaging studies in schizophrenia and substance use disorders research; it can be extended to include more granular terms as needed. This metadata ontology is expected to form the computational foundation that helps investigators make their data FAIR compliant and supports users in conducting reproducible neuroimaging research.

4.
Pac Symp Biocomput ; 28: 121-132, 2023.
Article in English | MEDLINE | ID: mdl-36540970

ABSTRACT

Groups of distantly related individuals who share a short segment of their genome identical-by-descent (IBD) can provide insights about rare traits and diseases in massive biobanks using IBD mapping. Clustering algorithms play an important role in finding these groups accurately and at scale. We set out to analyze the fitness of commonly used, fast and scalable clustering algorithms for IBD mapping applications. We designed a realistic benchmark for local IBD graphs and utilized it to compare the statistical power of clustering algorithms via simulating 2.3 million clusters across 850 experiments. We found Infomap and Markov Clustering (MCL) community detection methods to have high statistical power in most of the scenarios. They yield a 30% increase in power compared to the current state-of-the-art approach, with a runtime that is 3 orders of magnitude lower. We also found that standard clustering metrics, such as modularity, cannot predict statistical power of algorithms in IBD mapping applications. We extend our findings to real datasets by analyzing the Population Architecture using Genomics and Epidemiology (PAGE) Study dataset with 51,000 samples and 2 million shared segments on Chromosome 1, resulting in the extraction of 39 million local IBD clusters. We demonstrate the power of our approach by recovering signals of rare genetic variation in the Whole-Exome Sequence data of 200,000 individuals in the UK Biobank. We provide an efficient implementation to enable clustering at scale for IBD mapping for various populations and scenarios. Supplementary Information: The code, along with supplementary methods and figures, is available at https://github.com/roohy/localIBDClustering.


Subject(s)
Algorithms , Computational Biology , Humans , Genomics , Cluster Analysis
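Illustrative sketch for entry 4 (not the authors' code): the abstract clusters "local IBD graphs", in which samples are nodes and an edge joins two samples that share an IBD segment at a given locus. The example below builds such a graph from a hypothetical `(sample_a, sample_b, start, end)` segment format and, as a deliberately simple stand-in for Infomap or MCL, groups samples by connected components. The tuple layout, function name, and the use of connected components are all assumptions made for illustration.

```python
from collections import defaultdict


def local_ibd_clusters(segments, position):
    """Group samples connected by IBD segments that cover `position`.

    segments: iterable of (sample_a, sample_b, start, end) tuples
              (hypothetical input format, chosen for this sketch).
    Returns a list of sets, one per connected component of the local graph.
    """
    # Build the local graph: one edge per segment that spans the locus.
    graph = defaultdict(set)
    for a, b, start, end in segments:
        if start <= position <= end:
            graph[a].add(b)
            graph[b].add(a)

    # Connected components via iterative depth-first search.
    seen, clusters = set(), []
    for node in graph:
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            component.add(current)
            stack.extend(graph[current] - seen)
        clusters.append(component)
    return clusters


segments = [
    ("s1", "s2", 100, 500),
    ("s2", "s3", 200, 600),
    ("s4", "s5", 250, 400),
    ("s1", "s6", 700, 900),  # does not cover the locus queried below
]
clusters = local_ibd_clusters(segments, position=300)
```

Connected components merge any two groups joined by a single edge, which is one reason community detection methods such as those named in the abstract are preferred on noisy graphs.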
5.
Am J Hum Genet ; 109(4): 669-679, 2022 04 07.
Article in English | MEDLINE | ID: mdl-35263625

ABSTRACT

One mechanism by which genetic factors influence complex traits and diseases is altering gene expression. Direct measurement of gene expression in relevant tissues is rarely tenable; however, genetically regulated gene expression (GReX) can be estimated using prediction models derived from large multi-omic datasets. These approaches have led to the discovery of many gene-trait associations, but whether models derived from predominantly European ancestry (EA) reference panels can map novel associations in ancestrally diverse populations remains unclear. We applied PrediXcan to impute GReX in 51,520 ancestrally diverse Population Architecture using Genomics and Epidemiology (PAGE) participants (35% African American, 45% Hispanic/Latino, 10% Asian, and 7% Hawaiian) across 25 key cardiometabolic traits and relevant tissues to identify 102 novel associations. We then compared associations in PAGE to those in a random subset of 50,000 White British participants from UK Biobank (UKBB50k) for height and body mass index (BMI). We identified 517 associations across 47 tissues in PAGE but not UKBB50k, demonstrating the importance of diverse samples in identifying trait-associated GReX. We observed that variants used in PrediXcan models were either more or less differentiated across continental-level populations than matched-control variants depending on the specific population reflecting sampling bias. Additionally, variants from identified genes specific to either PAGE or UKBB50k analyses were more ancestrally differentiated than those in genes detected in both analyses, underlining the value of population-specific discoveries. This suggests that while EA-derived transcriptome imputation models can identify new associations in non-EA populations, models derived from closely matched reference panels may yield further insights. Our findings call for more diversity in reference datasets of tissue-specific gene expression.


Subject(s)
Cardiovascular Diseases , Genome-Wide Association Study , Genetic Predisposition to Disease , Humans , Life Style , Single Nucleotide Polymorphism , Transcriptome
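Illustrative sketch for entry 5: genetically regulated gene expression (GReX) is estimated from genotypes with a prediction model. Assuming a linear model, in which predicted expression is a weighted sum of effect-allele dosages, the imputation step reduces to the few lines below. The variant IDs, weights, and function name are hypothetical and are not taken from any published model.

```python
def predict_grex(weights, dosages):
    """Weighted sum of allele dosages for one gene in one tissue.

    weights: dict mapping variant ID -> model weight (hypothetical values).
    dosages: dict mapping variant ID -> effect-allele dosage in [0, 2].
    Variants that are missing from `dosages` are skipped.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


weights = {"rs1": 0.5, "rs2": -0.25, "rs3": 0.1}
person = {"rs1": 2.0, "rs2": 1.0}  # rs3 was not genotyped for this person
grex = predict_grex(weights, person)  # 0.5 * 2.0 + (-0.25) * 1.0
```

Because the weights are learned in a reference panel, allele frequency differences between the panel and the target population change which variants contribute, which is the concern the abstract raises about European-ancestry reference panels.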
6.
AMIA Annu Symp Proc ; 2022: 1135-1144, 2022.
Article in English | MEDLINE | ID: mdl-37128458

ABSTRACT

Scientific reproducibility that effectively leverages existing study data is critical to the advancement of research in many disciplines including neuroscience, which uses imaging and electrophysiology modalities as primary endpoints or key dependency in studies. We are developing an integrated search platform called NeuroBridge to enable researchers to search for relevant study datasets that can be used to test a hypothesis or replicate a published finding without having to perform a difficult search from scratch, including contacting individual study authors and locating the site to download the data. In this paper, we describe the development of a metadata ontology based on the World Wide Web Consortium (W3C) PROV specifications to create a corpus of semantically annotated published papers. This annotated corpus was used in a deep learning model to support automated identification of candidate datasets related to neurocognitive assessment of subjects with drug abuse or schizophrenia using neuroimaging. We built on our previous work in the Provenance for Clinical and Health Research (ProvCaRe) project to model metadata information in the NeuroBridge ontology and used this ontology to annotate 51 articles using a Web-based tool called Inception. The Bidirectional Encoder Representations from Transformers (BERT) neural network model, which was trained using the annotated corpus, is used to classify and rank papers relevant to five research hypotheses and the results were evaluated independently by three users for accuracy and recall. Our combined use of the NeuroBridge ontology together with the deep learning model outperforms the existing PubMed Central (PMC) search engine and manifests considerable trainability and transparency compared with typical free-text search. An initial version of the NeuroBridge portal is available at: https://neurobridges.org/.


Subject(s)
Algorithms , Deep Learning , Humans , Reproducibility of Results , Search Engine , PubMed
7.
Sci Rep ; 11(1): 24052, 2021 12 15.
Article in English | MEDLINE | ID: mdl-34912034

ABSTRACT

Advances in measurement technology are producing increasingly time-resolved environmental exposure data. We aim to gain new insights into exposures and their potential health impacts by moving beyond simple summary statistics (e.g., means, maxima) to characterize more detailed features of high-frequency time series data. This study proposes a novel variant of the Self-Organizing Map (SOM) algorithm called Dynamic Time Warping Self-Organizing Map (DTW-SOM) for unsupervised pattern discovery in time series. This algorithm uses DTW, a similarity measure that optimally aligns interior patterns of sequential data, both as the similarity measure and training guide of the neural network. We applied DTW-SOM to a panel study monitoring indoor and outdoor residential temperature and particulate matter air pollution (PM2.5) for 10 patients with asthma from 7 households near Salt Lake City, UT; the patients were followed for up to 373 days each. Compared to previous SOM algorithms using timestamp alignment on time series data, the DTW-SOM algorithm produced fewer quantization errors and more detailed diurnal patterns. DTW-SOM identified the expected typical diurnal patterns in outdoor temperature, which varied by season, as well as diurnal patterns in PM2.5 which may be related to daily asthma outcomes. In summary, DTW-SOM is an innovative feature engineering method that can be applied to highly time-resolved environmental exposures assessed by sensors to identify typical diurnal (or hourly or monthly) patterns and provide new insights into the health effects of environmental exposures.


Subject(s)
Algorithms , Environmental Exposure/adverse effects , Environmental Exposure/analysis , Health Impact Assessment , Air Pollutants , Air Pollution , Asthma/diagnosis , Asthma/epidemiology , Asthma/etiology , Environmental Monitoring/methods , Health Impact Assessment/methods , Humans , Neural Networks (Computer) , Particulate Matter , Time Factors
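Illustrative sketch for entry 7: dynamic time warping (DTW) aligns two sequences by allowing elements to be repeated, so that similar shapes match even when they are shifted in time. The example below is the textbook dynamic-programming formulation with an absolute-difference cost, followed by the "best matching unit" lookup that a self-organizing map would perform with DTW as its distance. It is a generic sketch, not the DTW-SOM implementation; the function names are chosen for this example.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # repeat b[j-1]
                cost[i][j - 1],      # repeat a[i-1]
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


def best_matching_unit(series, prototypes):
    """Index of the prototype closest to `series` under DTW."""
    return min(range(len(prototypes)),
               key=lambda k: dtw_distance(series, prototypes[k]))


stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])   # same shape, one repeat
shifted = dtw_distance([0, 0], [1, 1])              # constant offset of 1
winner = best_matching_unit([1, 2, 3, 3], [[5, 5, 5], [1, 2, 3]])
```

The first call returns zero because the extra `2` can be absorbed by warping, which a timestamp-aligned distance could not do for sequences of different length.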
8.
NPJ Syst Biol Appl ; 7(1): 38, 2021 10 20.
Article in English | MEDLINE | ID: mdl-34671039

ABSTRACT

Machine reading (MR) is essential for unlocking valuable knowledge contained in millions of existing biomedical documents. Over the last two decades [1,2], the most dramatic advances in MR have followed in the wake of critical corpus development [3]. Large, well-annotated corpora have been associated with punctuated advances in MR methodology and automated knowledge extraction systems in the same way that ImageNet [4] was fundamental for developing machine vision techniques. This study contributes six components to an advanced, named entity analysis tool for biomedicine: (a) a new, Named Entity Recognition Ontology (NERO) developed specifically for describing textual entities in biomedical texts, which accounts for diverse levels of ambiguity, bridging the scientific sublanguages of molecular biology, genetics, biochemistry, and medicine; (b) detailed guidelines for human experts annotating hundreds of named entity classes; (c) pictographs for all named entities, to simplify the burden of annotation for curators; (d) an original, annotated corpus comprising 35,865 sentences, which encapsulate 190,679 named entities and 43,438 events connecting two or more entities; (e) validated, off-the-shelf, named entity recognition (NER) automated extraction; and (f) embedding models that demonstrate the promise of biomedical associations embedded within this corpus.

9.
Sensors (Basel) ; 21(17), 2021 Aug 28.
Article in English | MEDLINE | ID: mdl-34502692

ABSTRACT

Many approaches to time series classification rely on machine learning methods. However, there is growing interest in going beyond black box prediction models to understand discriminatory features of the time series and their associations with outcomes. One promising method is time-series shapelets (TSS), which identifies maximally discriminative subsequences of time series. For example, in environmental health applications TSS could be used to identify short-term patterns in exposure time series (shapelets) associated with adverse health outcomes. Identification of candidate shapelets in TSS is computationally intensive. The original TSS algorithm used exhaustive search. Subsequent algorithms introduced efficiencies by trimming/aggregating the set of candidates or training candidates from initialized values, but these approaches have limitations. In this paper, we introduce Wavelet-TSS (W-TSS), a novel intelligent method for identifying candidate shapelets in TSS using wavelet transformation discovery. We tested W-TSS on two datasets: (1) a synthetic example used in previous TSS studies and (2) a panel study relating exposures from residential air pollution sensors to symptoms in participants with asthma. Compared to previous TSS algorithms, W-TSS was more computationally efficient, was more accurate, and discovered more discriminative shapelets. W-TSS does not require pre-specification of shapelet length.


Subject(s)
Air Pollution , Algorithms , Humans , Machine Learning , Research Design
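Illustrative sketch for entry 9: a shapelet is scored against a time series by sliding it along the series and keeping the smallest distance to any window of the same length. The example below shows that core distance with a plain Euclidean metric and no normalization; both choices are assumptions for this sketch, and the wavelet-based candidate discovery that W-TSS adds is not shown.

```python
import math


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between `shapelet` and any
    equal-length window of `series`."""
    width = len(shapelet)
    if width == 0 or width > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - width + 1):
        window = series[start:start + width]
        dist = math.sqrt(sum((x - s) ** 2 for x, s in zip(window, shapelet)))
        best = min(best, dist)
    return best


spike = [1, 5, 1]                                         # hypothetical shapelet
with_spike = shapelet_distance([0, 0, 1, 5, 1, 0], spike)  # spike is present
flat = shapelet_distance([0, 0, 0, 0], [3, 4])             # every window is (0, 0)
```

A classifier can then use this distance as a feature: series that contain the pattern score near zero, and series that lack it score high. Exhaustive search repeats this computation for every candidate subsequence, which is the cost the abstract describes.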
10.
Nat Commun ; 12(1): 3546, 2021 06 10.
Article in English | MEDLINE | ID: mdl-34112768

ABSTRACT

The ability to identify segments of genomes identical-by-descent (IBD) is a part of standard workflows in both statistical and population genetics. However, traditional methods for finding local IBD across all pairs of individuals scale poorly, leading to a lack of adoption in very large-scale datasets. Here, we present iLASH, an algorithm based on similarity detection techniques that shows equal or improved accuracy in simulations compared to current leading methods and speeds up analysis by several orders of magnitude on genomic datasets, making IBD estimation tractable for millions of individuals. We apply iLASH to the PAGE dataset of ~52,000 multi-ethnic participants, including several founder populations with elevated IBD sharing, identifying IBD segments in ~3 minutes per chromosome compared to over 6 days for a state-of-the-art algorithm. iLASH enables efficient analysis of very large-scale datasets, as we demonstrate by computing IBD across the UK Biobank (~500,000 individuals), detecting 12.9 billion pairwise connections.


Subject(s)
Population Genetics/methods , Genomics/methods , Algorithms , Computer Simulation , Genetic Databases , Human Genome , Haplotypes , Humans , Pedigree , Single Nucleotide Polymorphism , Quality Control , United Kingdom/epidemiology , United Kingdom/ethnology
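Illustrative sketch for entry 10: the abstract attributes the speed-up to "similarity detection techniques" without naming them. One widely used technique of this kind is MinHash, which summarizes a set by a short signature so that the fraction of matching signature positions approximates the Jaccard similarity of the original sets. The example below applies it to overlapping substrings of toy haplotype strings. It is a generic illustration under that assumption, not a description of iLASH's implementation; the haplotype encoding, shingle length, and signature size are arbitrary choices.

```python
import hashlib


def shingles(haplotype, k=4):
    """Set of overlapping length-k substrings of a haplotype string."""
    return {haplotype[i:i + k] for i in range(len(haplotype) - k + 1)}


def _seeded_hash(item, seed):
    """Deterministic 64-bit hash; the seed selects the hash function."""
    digest = hashlib.blake2b(
        item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_hashes=64):
    """Keep the minimum hash value of the set under each hash function."""
    return [min(_seeded_hash(item, seed) for item in items)
            for seed in range(num_hashes)]


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


hap_a = "0110100111010010"
hap_b = "0110100111010010"  # identical to hap_a
hap_c = "1001011000101101"  # a different toy haplotype
sig_a = minhash_signature(shingles(hap_a))
sim_same = estimated_similarity(sig_a, minhash_signature(shingles(hap_b)))
sim_diff = estimated_similarity(sig_a, minhash_signature(shingles(hap_c)))
```

Comparing fixed-length signatures avoids comparing every pair of full haplotypes position by position, which is where hashing-based approaches gain their scalability.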
11.
Cell ; 184(8): 2068-2083.e11, 2021 04 15.
Article in English | MEDLINE | ID: mdl-33861964

ABSTRACT

Understanding population health disparities is an essential component of equitable precision health efforts. Epidemiology research often relies on definitions of race and ethnicity, but these population labels may not adequately capture disease burdens and environmental factors impacting specific sub-populations. Here, we propose a framework for repurposing data from electronic health records (EHRs) in concert with genomic data to explore the demographic ties that can impact disease burdens. Using data from a diverse biobank in New York City, we identified 17 communities sharing recent genetic ancestry. We observed 1,177 health outcomes that were statistically associated with a specific group and demonstrated significant differences in the segregation of genetic variants contributing to Mendelian diseases. We also demonstrated that fine-scale population structure can impact the prediction of complex disease risk within groups. This work reinforces the utility of linking genomic data to EHRs and provides a framework toward fine-scale monitoring of population health.


Subject(s)
Ethnicity/genetics , Population Health , Genetic Databases , Electronic Health Records , Genomics , Humans , Self Report
12.
Bioinformatics ; 37(19): 3372-3373, 2021 Oct 11.
Article in English | MEDLINE | ID: mdl-33774671

ABSTRACT

SUMMARY: Finding informative predictive features in high-dimensional biological case-control datasets is challenging. The Extreme Pseudo-Sampling (EPS) algorithm offers a solution to the challenge of feature selection via a combination of deep learning and linear regression models. First, using a variational autoencoder, it generates complex latent representations for the samples. Second, it classifies the latent representations of cases and controls via logistic regression. Third, it generates new samples (pseudo-samples) around the extreme cases and controls in the regression model. Finally, it trains a new regression model over the upsampled space. The most significant variables in this regression are selected. We present an open-source implementation of the algorithm that is easy to set up, use and customize. Our package enhances the original algorithm by providing new features and customizability for data preparation, model training and classification functionalities. We believe the new features will enable the adoption of the algorithm for a diverse range of datasets. AVAILABILITY AND IMPLEMENTATION: The software package for Python is available online at https://github.com/roohy/eps. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
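Illustrative sketch for entry 12: the third step of the algorithm generates pseudo-samples around the extreme cases and controls. The abstract does not say how they are generated, so the example below makes its own assumptions: "extreme" means the lowest and highest classifier probabilities, and new points are drawn by adding Gaussian noise to the latent vectors of those samples. The function name and every parameter are hypothetical and are not taken from the `eps` package.

```python
import random


def pseudo_samples(latents, scores, n_extreme=2, n_per_seed=3, sigma=0.1, seed=0):
    """Draw Gaussian-perturbed copies of the most confidently classified points.

    latents: list of latent vectors (lists of floats).
    scores:  classifier probability of being a case, one per latent vector.
    Returns (pseudo_latents, pseudo_labels) with label 1 = case, 0 = control.
    """
    rng = random.Random(seed)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # Lowest scores are treated as extreme controls, highest as extreme cases.
    extremes = [(i, 0) for i in order[:n_extreme]]
    extremes += [(i, 1) for i in order[-n_extreme:]]

    out_x, out_y = [], []
    for idx, label in extremes:
        for _ in range(n_per_seed):
            out_x.append([v + rng.gauss(0.0, sigma) for v in latents[idx]])
            out_y.append(label)
    return out_x, out_y


latents = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0], [5.0, 5.0]]
scores = [0.05, 0.2, 0.4, 0.6, 0.8, 0.95]
xs, ys = pseudo_samples(latents, scores)
```

In the full pipeline described above, the pseudo-samples would be decoded back to feature space and used to fit the final regression whose most significant variables are selected.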

13.
IEEE Trans Emerg Top Comput ; 9(1): 316-328, 2021.
Article in English | MEDLINE | ID: mdl-35548703

ABSTRACT

Data science is a field that has developed to enable efficient integration and analysis of increasingly large data sets in many domains. In particular, big data in genetics, neuroimaging, mobile health, and other subfields of biomedical science, promises new insights, but also poses challenges. To address these challenges, the National Institutes of Health launched the Big Data to Knowledge (BD2K) initiative, including a Training Coordinating Center (TCC) tasked with developing a resource for personalized data science training for biomedical researchers. The BD2K TCC web portal is powered by ERuDIte, the Educational Resource Discovery Index, which collects training resources for data science, including online courses, videos of tutorials and research talks, textbooks, and other web-based materials. While the availability of so many potential learning resources is exciting, they are highly heterogeneous in quality, difficulty, format, and topic, making the field intimidating to enter and difficult to navigate. Moreover, data science is rapidly evolving, so there is a constant influx of new materials and concepts. We leverage data science techniques to build ERuDIte itself, using data extraction, data integration, machine learning, information retrieval, and natural language processing to automatically collect, integrate, describe, and organize existing online resources for learning data science.

14.
PLoS Genet ; 16(3): e1008684, 2020 03.
Article in English | MEDLINE | ID: mdl-32226016

ABSTRACT

Lipid levels are important markers for the development of cardio-metabolic diseases. Although hundreds of associated loci have been identified through genetic association studies, the contribution of genetic factors to variation in lipids is not fully understood, particularly in U.S. minority groups. We performed genome-wide association analyses for four lipid traits in over 45,000 ancestrally diverse participants from the Population Architecture using Genomics and Epidemiology (PAGE) Study, followed by a meta-analysis with several European ancestry studies. We identified nine novel lipid loci, five of which showed evidence of replication in independent studies. Furthermore, we discovered one novel gene in a PrediXcan analysis, minority-specific independent signals at eight previously reported loci, and potential functional variants at two known loci through fine-mapping. Systematic examination of known lipid loci revealed smaller effect estimates in African American and Hispanic ancestry populations than those in Europeans, and better performance of polygenic risk scores based on minority-specific effect estimates. Our findings provide new insight into the genetic architecture of lipid traits and highlight the importance of conducting genetic studies in diverse populations in the era of precision medicine.


Subject(s)
Lipids/blood , Lipids/genetics , Racial Groups/genetics , Genetic Databases , Female , Genome-Wide Association Study/methods , Genotype , Humans , Lipids/analysis , Male , Metagenomics/methods , Minority Groups , Multifactorial Inheritance/genetics , Phenotype , Single Nucleotide Polymorphism/genetics , United States/epidemiology
15.
Nature ; 570(7762): 514-518, 2019 06.
Article in English | MEDLINE | ID: mdl-31217584

ABSTRACT

Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development and clinical guidelines. However, the majority of discovery efforts are based on data from populations of European ancestry [1-3]. In light of the differential genetic architecture that is known to exist between populations, bias in representation can exacerbate existing disease and healthcare disparities. Critical variants may be missed if they have a low frequency or are completely absent in European populations, especially as the field shifts its attention towards rare variants, which are more likely to be population-specific [4-10]. Additionally, effect sizes and the risk prediction scores derived from them in one population may not accurately extrapolate to other populations [11,12]. Here we demonstrate the value of diverse, multi-ethnic participants in large-scale genomic studies. The Population Architecture using Genomics and Epidemiology (PAGE) study conducted a GWAS of 26 clinical and behavioural phenotypes in 49,839 non-European individuals. Using strategies tailored for analysis of multi-ethnic and admixed populations, we describe a framework for analysing diverse populations, identify 27 novel loci and 38 secondary signals at known loci, as well as replicate 1,444 GWAS catalogue associations across these traits. Our data show evidence of effect-size heterogeneity across ancestries for published GWAS associations, substantial benefits for fine-mapping using diverse cohorts and insights into clinical implications. In the United States, where minority populations have a disproportionately higher burden of chronic conditions [13], the lack of representation of diverse populations in genetic research will result in inequitable access to precision medicine for those with the highest burden of disease. 
We strongly advocate for continued, large genome-wide efforts in diverse populations to maximize genetic discovery and reduce health disparities.


Subject(s)
Asian People/genetics , Black People/genetics , Genome-Wide Association Study/methods , Hispanic or Latino/genetics , Minority Groups , Multifactorial Inheritance/genetics , Women's Health , Body Height/genetics , Cohort Studies , Female , Medical Genetics/methods , Health Equity/trends , Health Status Disparities , Humans , Male , United States
16.
Front Genet ; 10: 494, 2019.
Article in English | MEDLINE | ID: mdl-31178898

ABSTRACT

BACKGROUND: Chronic kidney disease (CKD) is common and disproportionately burdens United States ethnic minorities. Its genetic determinants may differ by disease severity and clinical stages. To uncover genetic factors associated with CKD severity among high-risk ethnic groups, we performed genome-wide association studies (GWAS) in diverse populations within the Population Architecture using Genomics and Epidemiology (PAGE) study. METHODS: We assembled multi-ethnic genome-wide imputed data on CKD non-overlapping cases [4,150 mild to moderate CKD, 1,105 end-stage kidney disease (ESKD)] and non-CKD controls for up to 41,041 PAGE participants (African Americans, Hispanics/Latinos, East Asian, Native Hawaiian, and American Indians). We implemented a generalized estimating equation approach for GWAS using ancestry combined data while adjusting for age, sex, principal components, study, and ethnicity. RESULTS: The GWAS identified a novel genome-wide associated locus for mild to moderate CKD near NMT2 (rs10906850, p = 3.7 × 10⁻⁸) that replicated in the United Kingdom Biobank white British (p = 0.008). Several variants at the APOL1 locus were associated with ESKD including the APOL1 G1 rs73885319 (p = 1.2 × 10⁻⁹). There was no overlap among associated loci for CKD and ESKD traits, even at the previously reported APOL1 locus (p = 0.76 for CKD). Several additional loci were associated with CKD or ESKD at p-values below the genome-wide threshold. These loci were often driven by variants more common in non-European ancestry. CONCLUSION: Our genetic study identified a novel association at NMT2 for CKD and showed for the first time strong associations of the APOL1 variants with ESKD across multi-ethnic populations. Our findings suggest differences in genetic effects across CKD severity and provide information for study design of genetic studies of CKD in diverse populations.

17.
JMIR Mhealth Uhealth ; 7(2): e11201, 2019 02 07.
Article in English | MEDLINE | ID: mdl-30730297

ABSTRACT

BACKGROUND: Time-resolved quantification of physical activity can contribute to both personalized medicine and epidemiological research studies, for example, managing and identifying triggers of asthma exacerbations. A growing number of reportedly accurate machine learning algorithms for human activity recognition (HAR) have been developed using data from wearable devices (eg, smartwatch and smartphone). However, many HAR algorithms depend on fixed-size sampling windows that may poorly adapt to real-world conditions in which activity bouts are of unequal duration. A small sliding window can produce noisy predictions under stable conditions, whereas a large sliding window may miss brief bursts of intense activity. OBJECTIVE: We aimed to create an HAR framework adapted to variable duration activity bouts by (1) detecting the change points of activity bouts in a multivariate time series and (2) predicting activity for each homogeneous window defined by these change points. METHODS: We applied standard fixed-width sliding windows (4-6 different sizes) or greedy Gaussian segmentation (GGS) to identify break points in filtered triaxial accelerometer and gyroscope data. After standard feature engineering, we applied an XGBoost model to predict physical activity within each window and then converted windowed predictions to instantaneous predictions to facilitate comparison across segmentation methods. We applied these methods in 2 datasets: the human activity recognition using smartphones (HARuS) dataset where a total of 30 adults performed activities of approximately equal duration (approximately 20 seconds each) while wearing a waist-worn smartphone, and the Biomedical REAl-Time Health Evaluation for Pediatric Asthma (BREATHE) dataset where a total of 14 children performed 6 activities for approximately 10 min each while wearing a smartwatch. 
To mimic a real-world scenario, we generated artificial unequal activity bout durations in the BREATHE data by randomly subdividing each activity bout into 10 segments and randomly concatenating the 60 activity bouts. Each dataset was divided into ~90% training and ~10% holdout testing. RESULTS: In the HARuS data, GGS produced the least noisy predictions of 6 physical activities and had the second highest accuracy rate of 91.06% (the highest accuracy rate was 91.79% for the sliding window of size 0.8 second). In the BREATHE data, GGS again produced the least noisy predictions and had the highest accuracy rate of 79.4% of predictions for 6 physical activities. CONCLUSIONS: In a scenario with variable duration activity bouts, GGS multivariate segmentation produced smart-sized windows with more stable predictions and a higher accuracy rate than traditional fixed-size sliding window approaches. Overall, accuracy was good in both datasets but, as expected, it was slightly lower in the more real-world study using wrist-worn smartwatches in children (BREATHE) than in the more tightly controlled study using waist-worn smartphones in adults (HARuS). We implemented GGS in an offline setting, but it could be adapted for real-time prediction with streaming data.
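
The abstract does not say how windowed predictions were converted to instantaneous ones. The following Python sketch shows one plausible way to put fixed-width windows and change-point windows on the same per-sample footing. It assumes non-overlapping windows given as half-open sample index ranges; all function names are hypothetical and are not taken from the paper.

```python
from typing import Hashable, List, Sequence, Tuple

Window = Tuple[int, int]  # half-open sample range [start, end)


def fixed_windows(n_samples: int, width: int) -> List[Window]:
    """Cut [0, n_samples) into consecutive windows of `width` samples.

    Assumption: windows do not overlap; the last one may be shorter.
    """
    return [(s, min(s + width, n_samples)) for s in range(0, n_samples, width)]


def windows_from_breakpoints(n_samples: int, breakpoints: Sequence[int]) -> List[Window]:
    """Turn change points (e.g. from a segmentation method) into windows."""
    inner = sorted({b for b in breakpoints if 0 < b < n_samples})
    edges = [0] + inner + [n_samples]
    return list(zip(edges[:-1], edges[1:]))


def to_instantaneous(windows: Sequence[Window], labels: Sequence[Hashable]) -> List[Hashable]:
    """Repeat each window's predicted label once per sample in that window."""
    per_sample: List[Hashable] = []
    for (start, end), label in zip(windows, labels):
        per_sample.extend([label] * (end - start))
    return per_sample


def accuracy(predicted: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)
```

Because both segmentation styles end up as one label per sample, their accuracy can be compared directly even when the window boundaries differ.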


Subject(s)
Human Activities/psychology , Recognition, Psychology , Wearable Electronic Devices/standards , Accelerometry/methods , Adult , Female , Human Activities/statistics & numerical data , Humans , Machine Learning/standards , Machine Learning/statistics & numerical data , Male , Middle Aged , Multivariate Analysis , Time Factors , Wearable Electronic Devices/psychology
18.
PLoS One ; 14(12): e0226771, 2019.
Article in English | MEDLINE | ID: mdl-31891604

ABSTRACT

We performed a hypothesis-generating phenome-wide association study (PheWAS) to identify and characterize cross-phenotype associations, where one SNP is associated with two or more phenotypes, between thousands of genetic variants assayed on the Metabochip and hundreds of phenotypes in 5,897 African Americans as part of the Population Architecture using Genomics and Epidemiology (PAGE) I study. The PAGE I study was a National Human Genome Research Institute-funded collaboration of four study sites accessing diverse epidemiologic studies genotyped on the Metabochip, a custom genotyping chip that has dense coverage of regions in the genome previously associated with cardio-metabolic traits and outcomes in mostly European-descent populations. Here we focus on identifying novel phenome-genome relationships, where SNPs are associated with more than one phenotype. To do this, we performed a PheWAS, testing each SNP on the Metabochip for an association with up to 273 phenotypes in the participating PAGE I study sites. We identified 133 putative pleiotropic variants, defined as SNPs associated at an empirically derived p-value threshold of p<0.01 in two or more PAGE study sites for two or more phenotype classes. We further annotated these PheWAS-identified variants using publicly available functional data and local genetic ancestry. Amongst our novel findings is SPARC rs4958487, associated with increased glucose levels and hypertension. SPARC has been implicated in the pathogenesis of diabetes and is also known to have a potential role in fibrosis, a common consequence of multiple conditions including hypertension. The SPARC example and others highlight the potential that PheWAS approaches have in improving our understanding of complex disease architecture by identifying novel relationships between genetic variants and an array of common human phenotypes.
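
The pleiotropy definition above can be read as a two-stage filter. The Python sketch below assumes one interpretation: a SNP counts for a phenotype class only if it passes p < 0.01 for that class in at least two study sites, and it is flagged when at least two classes qualify. The record layout and the function name are hypothetical, and the abstract alone does not confirm that this is exactly how the authors applied the rule.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

# (snp_id, study_site, phenotype_class, p_value); a hypothetical record layout
Result = Tuple[str, str, str, float]


def putative_pleiotropic(
    results: Iterable[Result],
    p_threshold: float = 0.01,
    min_sites: int = 2,
    min_classes: int = 2,
) -> Set[str]:
    """Return SNPs associated with >= min_classes phenotype classes,
    each supported by >= min_sites study sites at p < p_threshold."""
    # Stage 1: collect the sites supporting each (SNP, phenotype class) pair.
    sites = defaultdict(set)
    for snp, site, pheno_class, p in results:
        if p < p_threshold:
            sites[(snp, pheno_class)].add(site)

    # Stage 2: count the phenotype classes that replicate across sites.
    classes = defaultdict(set)
    for (snp, pheno_class), supporting in sites.items():
        if len(supporting) >= min_sites:
            classes[snp].add(pheno_class)

    return {snp for snp, cls in classes.items() if len(cls) >= min_classes}
```

Under this reading, a SNP that is strongly associated with many phenotypes at a single site is not flagged, since cross-site replication is required for each class.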


Subject(s)
Atherosclerosis/genetics , Black or African American/genetics , Genetic Pleiotropy , Metagenomics , Phenomics , Aged , Epidemiologic Studies , Female , Genome-Wide Association Study , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide
19.
PLoS One ; 13(7): e0200486, 2018.
Article in English | MEDLINE | ID: mdl-30044860

ABSTRACT

Current knowledge of the genetic architecture of key reproductive events across the female life course is largely based on association studies of women of European descent. The relevance of known loci for age at menarche (AAM) and age at natural menopause (ANM) in diverse populations remains unclear. We investigated 32 AAM and 14 ANM previously identified loci and sought to identify novel loci in a trans-ethnic array-wide study of 196,483 SNPs on the MetaboChip (Illumina, Inc.). A total of 45,364 women of diverse ancestries (African, Hispanic/Latina, Asian American and American Indian/Alaskan Native) in the Population Architecture using Genomics and Epidemiology (PAGE) Study were included in cross-sectional analyses of AAM and ANM. Within each study we conducted a linear regression of SNP associations with self-reported or medical record-derived AAM or ANM (in years), adjusting for birth year, population stratification, and center/region, as appropriate, and meta-analyzed results across studies using multiple meta-analytic techniques. For both AAM and ANM, we observed more directionally consistent associations with the previously reported risk alleles than expected by chance (binomial p-values ≤ 0.01). Eight densely genotyped reproductive loci generalized significantly to at least one non-European population. We identified one trans-ethnic array-wide SNP association with AAM and two significant associations with ANM, which have not been described previously. Additionally, we observed evidence of independent secondary signals at three of six AAM trans-ethnic loci. Our findings support the transferability of reproductive trait loci discovered in European women to women of other races/ethnicities and indicate the presence of additional trans-ethnic associations at both novel and established loci. These findings suggest the benefit of including diverse populations in future studies of the genetic architecture of female growth and development.
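
The "more directionally consistent associations than expected by chance" claim is typically checked with a binomial sign test. Below is a minimal Python sketch, assuming a one-sided test in which each locus has a 50% chance of matching the previously reported direction under the null. The function name and the example count are hypothetical; the abstract does not report the number of consistent loci.

```python
from math import comb


def sign_test_p_value(n_consistent: int, n_loci: int) -> float:
    """One-sided binomial p-value: P(X >= n_consistent) for X ~ Binomial(n_loci, 0.5)."""
    if not 0 <= n_consistent <= n_loci:
        raise ValueError("n_consistent must be between 0 and n_loci")
    # Sum the upper tail exactly with integers, then divide once.
    tail = sum(comb(n_loci, k) for k in range(n_consistent, n_loci + 1))
    return tail / 2 ** n_loci


# Hypothetical illustration only: 24 of 32 loci sharing the reported direction.
p = sign_test_p_value(24, 32)
print(f"one-sided binomial p = {p:.4f}")
```

Summing integer binomial coefficients before dividing keeps the tail probability exact up to the final floating-point division, which matters little at 32 loci but avoids accumulated rounding error for larger arrays.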


Subject(s)
Biological Variation, Population/genetics , Menarche/genetics , Menopause/genetics , Age Factors , Alleles , Biological Variation, Population/ethnology , Female , Genetic Loci/genetics , Genotype , Humans , Menarche/ethnology , Menopause/ethnology , Phenotype , Polymorphism, Single Nucleotide
20.
Hum Mol Genet ; 27(16): 2940-2953, 2018 Aug 15.
Article in English | MEDLINE | ID: mdl-29878111

ABSTRACT

C-reactive protein (CRP) is a circulating biomarker indicative of systemic inflammation. We aimed to evaluate genetic associations with CRP levels among non-European-ancestry populations through discovery, fine-mapping and conditional analyses. A total of 30 503 non-European-ancestry participants from 6 studies participating in the Population Architecture using Genomics and Epidemiology study had serum high-sensitivity CRP measurements and ∼200 000 single nucleotide polymorphisms (SNPs) genotyped on the Metabochip. We evaluated the association between each SNP and log-transformed CRP levels using multivariate linear regression, with additive genetic models adjusted for age, sex, the first four principal components of genetic ancestry, and study-specific factors. Differential linkage disequilibrium patterns between race/ethnicity groups were used to fine-map regions associated with CRP levels. Conditional analyses tested for multiple independent signals within genetic regions. One hundred and sixty-three unique variants in 12 loci in overall or race/ethnicity-stratified Metabochip-wide scans reached a Bonferroni-corrected P-value <2.5E-7. Three loci have no (HACL1, OLFML2B) or only limited (PLA2G6) previous associations with CRP levels. Six loci had different top hits in race/ethnicity-specific versus overall analyses. Fine-mapping refined the signal in six loci, particularly in HNF1A. Conditional analyses provided evidence for secondary signals in LEPR, IL1RN and HNF1A, and for multiple independent signals in CRP and APOE. We identified novel variants and loci associated with CRP levels, generalized known CRP associations to a multiethnic study population, refined association signals at several loci and found evidence for multiple independent signals at several well-known loci. This study demonstrates the benefit of conducting inclusive genetic association studies in large multiethnic populations.
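
The reported threshold of 2.5E-7 is consistent with a Bonferroni correction over roughly 200 000 tests. The Python sketch below shows that arithmetic, assuming a family-wise error rate of 0.05, which the abstract does not state explicitly. The function names and the toy p-values are hypothetical.

```python
from typing import Dict, List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_snps(p_values: Dict[str, float], alpha: float, n_tests: int) -> List[str]:
    """Names of SNPs whose p-value falls below the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return sorted(name for name, p in p_values.items() if p < threshold)


# Assumed alpha of 0.05 over ~200 000 SNPs gives a threshold near 2.5e-7.
print(bonferroni_threshold(0.05, 200_000))

# Toy p-values with made-up SNP labels, for illustration only.
print(significant_snps({"snp_1": 1e-9, "snp_2": 1e-3}, alpha=0.05, n_tests=200_000))
```

The threshold depends on the number of tests actually performed, not the number of participants, so the same cut-off applies to the overall and the race/ethnicity-stratified scans when they use the same array.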


Subject(s)
C-Reactive Protein/genetics , Genome-Wide Association Study , Metagenomics , Molecular Epidemiology/methods , Carbon-Carbon Lyases , Enoyl-CoA Hydratase/genetics , Female , Glycoproteins/genetics , Group VI Phospholipases A2/genetics , Humans , Linkage Disequilibrium , Male , Polymorphism, Single Nucleotide , White People/genetics