Results 1 - 5 of 5
1.
Bioinformatics ; 38(Suppl_2): ii27-ii33, 2022 09 16.
Article in English | MEDLINE | ID: mdl-36124792

ABSTRACT

MOTIVATION: Local ancestry inference (LAI) is the high-resolution prediction of ancestry labels along a DNA sequence. LAI is important in the study of human history and migrations, and it is beginning to play a role in precision medicine applications including ancestry-adjusted genome-wide association studies (GWASs) and polygenic risk scores (PRSs). Existing LAI models do not generalize well between species, chromosomes or even ancestry groups, requiring re-training for each different setting. Furthermore, such methods can lack interpretability, which is an important element in each of these applications.
RESULTS: We present SALAI-Net, a portable statistical LAI method that can be applied on any set of species and ancestries (species-agnostic), requiring only haplotype data and no other biological parameters. Inspired by identity by descent methods, SALAI-Net estimates population labels for each segment of DNA by performing a reference matching approach, which leads to an interpretable and fast technique. We benchmark our models on whole-genome data of humans and we test these models' ability to generalize to dog breeds when trained on human data. SALAI-Net outperforms previous methods in terms of balanced accuracy, while generalizing between different settings, species and datasets. Moreover, it is up to two orders of magnitude faster and uses considerably less RAM than competing methods.
AVAILABILITY AND IMPLEMENTATION: We provide an open source implementation and links to publicly available data at github.com/AI-sandbox/SALAI-Net. Data is publicly available as follows: https://www.internationalgenome.org (1000 Genomes), https://www.simonsfoundation.org/simons-genome-diversity-project (Simons Genome Diversity Project), https://www.sanger.ac.uk/resources/downloads/human/hapmap3.html (HapMap), ftp://ngs.sanger.ac.uk/production/hgdp/hgdp_wgs.20190516 (Human Genome Diversity Project) and https://www.ncbi.nlm.nih.gov/bioproject/PRJNA448733 (Canid genomes).
SUPPLEMENTARY INFORMATION: Supplementary data are available from Bioinformatics online.
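
The abstract describes labeling each DNA segment by matching it against reference haplotypes. The toy sketch below illustrates that general idea only; it is not the SALAI-Net implementation. The function names, the fixed window size, and the "fraction of matching alleles" similarity are all assumptions made for illustration.

```python
from collections import defaultdict


def window_similarity(query, reference, start, end):
    """Fraction of positions in [start, end) where the two haplotypes agree."""
    matches = sum(1 for q, r in zip(query[start:end], reference[start:end]) if q == r)
    return matches / (end - start)


def infer_local_ancestry(query, references, labels, window_size):
    """Label each window with the ancestry of its best-matching reference.

    query:       list of 0/1 alleles for one haplotype
    references:  list of reference haplotypes, each the same length as query
    labels:      ancestry label of each reference haplotype
    window_size: number of sites per window (hypothetical parameter)
    """
    calls = []
    for start in range(0, len(query), window_size):
        end = min(start + window_size, len(query))
        best = defaultdict(float)  # best similarity seen per ancestry label
        for ref, label in zip(references, labels):
            sim = window_similarity(query, ref, start, end)
            best[label] = max(best[label], sim)
        # Iterate labels in sorted order so ties resolve deterministically.
        calls.append(max(sorted(best), key=lambda lab: best[lab]))
    return calls


# Toy usage: a query whose first half resembles "A" and second half "B".
query = [0, 0, 0, 0, 1, 1, 1, 1]
references = [[0] * 8, [1] * 8]
labels = ["A", "B"]
calls = infer_local_ancestry(query, references, labels, window_size=4)
```

Because each call can be traced back to the reference haplotype that matched best in that window, a scheme of this kind is easy to inspect, which is the interpretability property the abstract emphasizes.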


Subjects
Genome-Wide Association Study , Animals , Dogs , Haplotypes , Humans
2.
PLoS Comput Biol ; 18(8): e1010301, 2022 08.
Article in English | MEDLINE | ID: mdl-36007005

ABSTRACT

The estimation of genetic clusters using genomic data has applications ranging from genome-wide association studies (GWAS) to demographic history to polygenic risk scores (PRS) and is expected to play an important role in the analyses of increasingly diverse, large-scale cohorts. However, existing methods are computationally intensive, prohibitively so in the case of nationwide biobanks. Here we explore Archetypal Analysis as an efficient, unsupervised approach for identifying genetic clusters and for associating individuals with them. Such unsupervised approaches help avoid conflating socially constructed ethnic labels with genetic clusters by eliminating the need for exogenous training labels. We show that Archetypal Analysis yields similar cluster structure to existing unsupervised methods such as ADMIXTURE and provides interpretative advantages. More importantly, we show that since Archetypal Analysis can be used with lower-dimensional representations of genetic data, significant reductions in computational time and memory requirements are possible. When Archetypal Analysis is run in such a fashion, it takes several orders of magnitude less compute time than the current standard, ADMIXTURE. Finally, we demonstrate uses ranging across datasets from humans to canids.
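
The abstract does not state the optimization problem. For orientation, the standard formulation of archetypal analysis is sketched below. The symbols are assumptions introduced here, not notation from the paper: $X$ is the $n \times d$ data matrix (one row per individual, possibly a lower-dimensional representation), $K$ is the number of archetypes, and $\mathbf{1}_m$ is the all-ones vector of length $m$.

```latex
\min_{A,\,B}\ \bigl\lVert X - A\,B\,X \bigr\rVert_F^2
\quad \text{subject to} \quad
A \in \mathbb{R}^{n \times K},\ A \ge 0,\ A\,\mathbf{1}_K = \mathbf{1}_n,
\qquad
B \in \mathbb{R}^{K \times n},\ B \ge 0,\ B\,\mathbf{1}_n = \mathbf{1}_K .
```

Under this formulation the rows of $Z = BX$ are the archetypes, each a convex combination of observed individuals, and row $i$ of $A$ gives the convex weights that associate individual $i$ with those archetypes. The problem size depends on $d$, which is consistent with the abstract's point that a lower-dimensional input reduces time and memory.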


Subjects
Genome-Wide Association Study , Single Nucleotide Polymorphism , Genetic Predisposition to Disease , Population Genetics , Genome , Genomics/methods , Humans , Single Nucleotide Polymorphism/genetics
3.
Pac Symp Biocomput ; 29: 327-340, 2024.
Article in English | MEDLINE | ID: mdl-38160290

ABSTRACT

The lack of diversity in genomic datasets, currently skewed towards individuals of European ancestry, presents a challenge in developing inclusive biomedical models. The scarcity of such data is particularly evident in labeled datasets that include genomic data linked to electronic health records. To address this gap, this paper presents PopGenAdapt, a genotype-to-phenotype prediction model which adopts semi-supervised domain adaptation (SSDA) techniques originally proposed for computer vision. PopGenAdapt is designed to leverage the substantial labeled data available from individuals of European ancestry, as well as the limited labeled and the larger amount of unlabeled data from currently underrepresented populations. The method is evaluated in underrepresented populations from Nigeria, Sri Lanka, and Hawaii for the prediction of several disease outcomes. The results suggest a significant improvement in the performance of genotype-to-phenotype models for these populations over state-of-the-art supervised learning methods, setting SSDA as a promising strategy for creating more inclusive machine learning models in biomedical research. Our code is available at https://github.com/AI-sandbox/PopGenAdapt.


Subjects
Biomedical Research , Computational Biology , Humans , Electronic Health Records , Phenotype , Genotype , Supervised Machine Learning
4.
Nat Commun ; 13(1): 5107, 2022 08 30.
Article in English | MEDLINE | ID: mdl-36042219

ABSTRACT

The SARS-CoV-2 pandemic has differentially impacted populations across race and ethnicity. A multi-omic approach represents a powerful tool to examine risk across multi-ancestry genomes. We leverage a pandemic tracking strategy in which we sequence viral and host genomes and transcriptomes from nasopharyngeal swabs of 1049 individuals (736 SARS-CoV-2 positive and 313 SARS-CoV-2 negative) and integrate them with digital phenotypes from electronic health records from a diverse catchment area in Northern California. Genome-wide association disaggregated by admixture mapping reveals novel COVID-19-severity-associated regions containing previously reported markers of neurologic, pulmonary and viral disease susceptibility. Phylodynamic tracking of consensus viral genomes reveals no association with disease severity or inferred ancestry. Summary data from the multi-omic investigation reveal metagenomic and HLA associations with severe COVID-19. The wealth of data available from residual nasopharyngeal swabs, in combination with clinical data abstracted automatically at scale, highlights a powerful strategy for pandemic tracking and reveals distinct epidemiologic, genetic, and biological associations for those at the highest risk.


Subjects
COVID-19 , Pandemics , COVID-19/epidemiology , Viral Genome , Genome-Wide Association Study , Humans , SARS-CoV-2/genetics
5.
J Med Imaging (Bellingham) ; 7(4): 044003, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32904135

ABSTRACT

Purpose: Fluorescence microscopy visualizes three-dimensional subcellular structures in tissue, with two-photon microscopy achieving deeper penetration into tissue. Nuclei detection, which is essential for analyzing tissue for clinical and research purposes, remains a challenging problem due to the spatial variability of nuclei. Recent advancements in deep learning techniques have enabled the analysis of fluorescence microscopy data to localize and segment nuclei. However, these localization or segmentation techniques would require additional steps to extract characteristics of nuclei. We develop a 3D convolutional neural network, called Sphere Estimation Network (SphEsNet), to extract characteristics of nuclei without any postprocessing steps.
Approach: To simultaneously estimate the center locations of nuclei and their sizes, SphEsNet is composed of two branches to localize nuclei center coordinates and to estimate their radii. Synthetic microscopy volumes automatically generated using a spatially constrained cycle-consistent adversarial network are used for training the network because manually generating 3D real ground truth volumes would be extremely tedious.
Results: Three SphEsNet models based on the size of nuclei were trained and tested on five real fluorescence microscopy data sets from rat kidney and mouse intestine. Our method can successfully detect nuclei in multiple locations with various sizes. In addition, our method was compared with other techniques and outperformed them based on object-level precision, recall, and F1 score. Our model achieved an F1 score of 89.90%.
Conclusions: SphEsNet can simultaneously localize nuclei and estimate their size without additional steps. SphEsNet can potentially be used to extract more information from nuclei in fluorescence microscopy images.
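
The abstract reports object-level precision, recall, and F1 score without saying how detections are matched to ground truth. The sketch below shows one common way such metrics are computed, assuming a greedy nearest-center match within a distance threshold. The matching rule, the function names, and the `max_dist` parameter are illustrative assumptions, not the paper's evaluation protocol.

```python
import math


def match_detections(detected, truth, max_dist):
    """Greedily match detected centers to ground-truth centers.

    A detection is a true positive if an unmatched ground-truth center lies
    within max_dist of it; each ground-truth center is used at most once.
    Returns (true_positives, false_positives, false_negatives).
    """
    unmatched = list(truth)
    tp = 0
    for d in detected:
        best = None  # (distance, index into unmatched)
        for i, t in enumerate(unmatched):
            dist = math.dist(d, t)
            if dist <= max_dist and (best is None or dist < best[0]):
                best = (dist, i)
        if best is not None:
            unmatched.pop(best[1])
            tp += 1
    fp = len(detected) - tp
    fn = len(unmatched)
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Object-level precision, recall, and F1 from match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy usage with 3D center coordinates (made-up values).
detected = [(0, 0, 0), (10, 10, 10), (50, 50, 50)]
truth = [(1, 0, 0), (10, 11, 10), (30, 30, 30)]
counts = match_detections(detected, truth, max_dist=2.0)
scores = precision_recall_f1(*counts)
```

F1 is the harmonic mean of precision and recall, so it is high only when the detector both avoids spurious nuclei and misses few real ones.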
