1.
PLoS Genet ; 16(9): e1009018, 2020 09.
Article in English | MEDLINE | ID: mdl-32925908

ABSTRACT

Reverse causality has made it difficult to establish the causal directions between obesity and prediabetes and between obesity and insulin resistance. To disentangle whether obesity causally drives prediabetes and insulin resistance already in non-diabetic individuals, we utilized the UK Biobank and METSIM cohorts to perform Mendelian randomization (MR) analyses in non-diabetic individuals. Our results suggest that both prediabetes and systemic insulin resistance are caused by obesity (p = 1.2×10⁻³ and p = 3.1×10⁻²⁴). As obesity reflects the amount of body fat, we next studied how adipose tissue affects insulin resistance. We performed both bulk RNA-sequencing and single nucleus RNA sequencing on frozen human subcutaneous adipose biopsies to assess adipose cell-type heterogeneity and mitochondrial (MT) gene expression in insulin resistance. We discovered that the adipose MT gene expression and body fat percent are both independently associated with insulin resistance (p≤0.05 for each) when adjusting for the decomposed adipose cell-type proportions. Next, we showed that these 3 factors, adipose MT gene expression, body fat percent, and adipose cell types, explain a substantial amount (44.39%) of variance in insulin resistance and can be used to predict it (p≤2.64×10⁻⁵ in 3 independent human cohorts). In summary, we demonstrated that obesity is a strong determinant of both prediabetes and insulin resistance, and discovered that individuals' adipose cell-type composition, adipose MT gene expression, and body fat percent predict their insulin resistance, emphasizing the critical role of adipose tissue in systemic insulin resistance.
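
As a rough illustration of how a Mendelian randomization estimate can be formed from summary statistics, the sketch below computes per-variant Wald ratios and combines them with inverse-variance weights (IVW). The abstract does not state which MR estimator the authors used, so the IVW choice, the function name `ivw_estimate`, and the toy effect sizes are all assumptions made for illustration only.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted MR estimate (illustrative sketch).

    Each variant contributes a Wald ratio beta_outcome / beta_exposure,
    weighted by beta_exposure**2 / se_outcome**2 (first-order weights).
    """
    num = 0.0
    den = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = (bx * bx) / (se * se)
        num += weight * (by / bx)
        den += weight
    estimate = num / den
    std_err = math.sqrt(1.0 / den)
    return estimate, std_err

# Hypothetical toy numbers, NOT taken from the study.
bx = [0.10, 0.20, 0.15]    # variant effects on the exposure (e.g. BMI)
by = [0.05, 0.10, 0.075]   # variant effects on the outcome
se = [0.01, 0.02, 0.015]   # standard errors of the outcome effects
est, est_se = ivw_estimate(bx, by, se)
```

Here every toy variant has the same Wald ratio, so the pooled estimate equals that ratio; real instruments would disagree, and the weights would decide how much each one counts.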


Subjects
Adipose Tissue/metabolism , Insulin Resistance/physiology , Obesity/genetics , Adipocytes/metabolism , Adiposity , Adult , Body Mass Index , Cohort Studies , Diabetes Mellitus, Type 2/metabolism , Female , Humans , Insulin Resistance/genetics , Male , Middle Aged , Obesity/physiopathology , Prediabetic State/metabolism , Prediabetic State/physiopathology , Subcutaneous Fat/metabolism
2.
Nat Methods ; 13(5): 443-5, 2016 05.
Article in English | MEDLINE | ID: mdl-27018579

ABSTRACT

In epigenome-wide association studies (EWAS), different methylation profiles of distinct cell types may lead to false discoveries. We introduce ReFACTor, a method based on principal component analysis (PCA) and designed for the correction of cell type heterogeneity in EWAS. ReFACTor does not require knowledge of cell counts, and it provides improved estimates of cell type composition, resulting in improved power and control for false positives in EWAS. Corresponding software is available at http://www.cs.tau.ac.il/~heran/cozygene/software/refactor.html.


Subjects
DNA Methylation/genetics , Epigenomics/methods , Genetic Heterogeneity , Genome-Wide Association Study/methods , Principal Component Analysis , Algorithms , Computer Simulation , CpG Islands/genetics , Epigenomics/statistics & numerical data , Genome-Wide Association Study/statistics & numerical data , Humans , Leukocytes/cytology , Leukocytes/metabolism
3.
Bioinformatics ; 33(14): i325-i332, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28881982

ABSTRACT

MOTIVATION: Epigenome-wide association studies can provide novel insights into the regulation of genes involved in traits and diseases. The rapid emergence of bisulfite-sequencing technologies enables performing such genome-wide studies at the resolution of single nucleotides. However, analysis of data produced by bisulfite-sequencing poses statistical challenges owing to low and uneven sequencing depth, as well as the presence of confounding factors. The recently introduced Mixed model Association for Count data via data AUgmentation (MACAU) can address these challenges via a generalized linear mixed model when confounding can be encoded via a single variance component. However, MACAU cannot be used in the presence of multiple variance components. Additionally, MACAU uses a computationally expensive Markov Chain Monte Carlo (MCMC) procedure, which cannot directly approximate the model likelihood. RESULTS: We present a new method, Mixed model Association via a Laplace ApproXimation (MALAX), that is more computationally efficient than MACAU and allows modeling of multiple variance components. MALAX uses a Laplace approximation rather than MCMC-based approximations, which enables direct approximation of the model likelihood. Through an extensive analysis of simulated and real data, we demonstrate that MALAX successfully addresses statistical challenges introduced by bisulfite-sequencing while controlling for complex sources of confounding, and can be over 50% faster than the state of the art. AVAILABILITY AND IMPLEMENTATION: The full source code of MALAX is available at https://github.com/omerwe/MALAX. CONTACT: omerw@cs.technion.ac.il or ehalperin@cs.ucla.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
DNA Methylation , Epigenomics/methods , Sequence Analysis, DNA/methods , Software , Humans , Markov Chains , Monte Carlo Method , Sulfites
4.
Bioinformatics ; 33(12): 1870-1872, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28177067

ABSTRACT

SUMMARY: GLINT is a user-friendly command-line toolset for fast analysis of genome-wide DNA methylation data generated using the Illumina human methylation arrays. GLINT, which does not require any programming proficiency, allows easy execution of an epigenome-wide association study (EWAS) analysis pipeline under different models while accounting for known confounders in methylation data. AVAILABILITY AND IMPLEMENTATION: GLINT is command-line software, freely available at https://github.com/cozygene/glint/releases. It requires Python 2.7 and several freely available Python packages. Further information and documentation as well as a quick start tutorial are available at http://glint-epigenetics.readthedocs.io. CONTACT: elior.rahmani@gmail.com or ehalperin@cs.ucla.edu.


Subjects
DNA Methylation , Epigenomics/methods , Sequence Analysis, DNA/methods , Software , Genome, Human , Humans , Oligonucleotide Array Sequence Analysis/methods
6.
Bioinformatics ; 30(12): i19-25, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24931983

ABSTRACT

MOTIVATION: Gene-gene interactions are of potential biological and medical interest, as they can shed light on both the inheritance mechanism of a trait and on the underlying biological mechanisms. Evidence of epistatic interactions has been reported in both humans and other organisms. Unlike single-locus genome-wide association studies (GWAS), which proved efficient in detecting numerous genetic loci related to various traits, interaction-based GWAS have so far produced very few reproducible discoveries. Such studies introduce a great computational and statistical burden by necessitating a large number of hypotheses to be tested, including all pairs of single nucleotide polymorphisms (SNPs). Thus, many software tools have been developed for interaction-based case-control studies, some leading to reliable discoveries. For quantitative data, on the other hand, only a handful of tools exist, and the computational burden is still substantial. RESULTS: We present an efficient algorithm for detecting epistasis in quantitative GWAS, achieving a substantial runtime speedup by avoiding the need to exhaustively test all SNP pairs using metric embedding and random projections. Unlike previous metric embedding methods for case-control studies, we introduce a new embedding, where each SNP is mapped to two Euclidean spaces. We implemented our method in a tool named EPIQ (EPIstasis detection for Quantitative GWAS), and we show by simulations that EPIQ requires hours of processing time where other methods require days and sometimes weeks. Applying our method to a dataset from the Ludwigshafen risk and cardiovascular health study, we discovered a pair of SNPs with a near-significant interaction (P = 2.2 × 10⁻¹³), in only 1.5 h on 10 processors. AVAILABILITY: https://github.com/yaarasegre/EPIQ


Subjects
Epistasis, Genetic , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Software , Algorithms , Case-Control Studies , Humans , Phenotype
7.
bioRxiv ; 2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38352303

ABSTRACT

Polygenic scores (PGSs), increasingly used in clinical settings, frequently include many genetic variants, with performance typically peaking at thousands of variants. Such highly parameterized PGSs often include variants that do not pass a genome-wide significance threshold. We propose a mathematical perspective that renders the effects of many of these non-significant variants random rather than causal, with the randomness capturing population structure. We devise methods to assess variant effect randomness and population stratification bias. Applying these methods to 141 traits from the UK Biobank, we find that, for many PGSs, the effects of non-significant variants are considerably random, with the extent of randomness associated with the degree of overfitting to population structure of the discovery cohort. Our findings explain why highly parameterized PGSs simultaneously have superior cohort-specific performance and limited generalizability, suggesting the critical need for variant randomness tests in PGS evaluation. Supporting code and a dashboard are available at https://github.com/songlab-cal/StratPGS.
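
For readers unfamiliar with how a polygenic score is assembled, the sketch below shows the standard weighted-sum form: each variant's allele dosage multiplied by its estimated effect size, summed over all variants. The function name `polygenic_score` and the toy dosages and weights are hypothetical; the abstract's randomness tests are not reproduced here.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies) and effect sizes."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))

# Hypothetical toy individual with three variants.
dosages = [2, 1, 0]           # copies of the effect allele per variant
weights = [0.5, 0.25, 1.0]    # per-variant effect sizes from a discovery GWAS
score = polygenic_score(dosages, weights)
```

A highly parameterized score simply extends both lists to thousands of variants, which is why the weights of non-significant variants can dominate the sum.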

8.
Commun Biol ; 7(1): 540, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38714798

ABSTRACT

The genetic influence on human vocal pitch in tonal and non-tonal languages remains largely unknown. In tonal languages, such as Mandarin Chinese, pitch changes differentiate word meanings, whereas in non-tonal languages, such as Icelandic, pitch is used to convey intonation. We addressed this question by searching for genetic associations with interindividual variation in median pitch in a Chinese major depression case-control cohort and compared our results with a genome-wide association study from Iceland. The same genetic variant, rs11046212-T in an intron of the ABCC9 gene, was one of the most strongly associated loci with median pitch in both samples. Our meta-analysis revealed four genome-wide significant hits, including two novel associations. The discovery of genetic variants influencing vocal pitch across both tonal and non-tonal languages suggests the possibility of a common genetic contribution to the human vocal system shared in two distinct populations with languages that differ in tonality (Icelandic and Mandarin).


Subjects
Genome-Wide Association Study , Language , Humans , Male , Female , Polymorphism, Single Nucleotide , Adult , Iceland , Case-Control Studies , Middle Aged , Voice/physiology , Pitch Perception , Asian People/genetics
9.
Sci Rep ; 14(1): 13034, 2024 06 06.
Article in English | MEDLINE | ID: mdl-38844476

ABSTRACT

The risk of developing age-related macular degeneration (AMD) is influenced by genetic background. In 2016, the International AMD Genomics Consortium (IAMDGC) identified 52 risk variants in 34 loci, and a polygenic risk score (PRS) from these variants was associated with AMD. The Israeli population has a unique genetic composition: Ashkenazi Jewish (AJ), Jewish non-Ashkenazi, and Arab sub-populations. We aimed to perform a genome-wide association study (GWAS) for AMD in Israel, and to evaluate PRSs for AMD. Our discovery set recruited 403 AMD patients and 256 controls at Hadassah Medical Center. We genotyped individuals via a custom exome chip. We imputed non-typed variants using cosmopolitan and AJ reference panels. We recruited an additional 155 cases and 69 controls for validation. To evaluate the predictive power of PRSs for AMD, we used IAMDGC summary statistics excluding our study and developed PRSs via clumping/thresholding or LDpred2. In our discovery set, 31/34 loci reported by IAMDGC were AMD-associated (P < 0.05). Of those, all effects were directionally consistent with IAMDGC and 11 loci had a P-value under the Bonferroni-corrected threshold (0.05/34 = 0.0015). At a 5 × 10⁻⁵ threshold, we discovered four suggestive associations in FAM189A1, IGDCC4, C7orf50, and CNTNAP4. Only the FAM189A1 variant was AMD-associated in the replication cohort after Bonferroni correction. A prediction model including LDpred2-based PRS + covariates had an AUC of 0.82 (95% CI 0.79-0.85) and performed better than a covariates-only model (P = 5.1 × 10⁻⁹). Therefore, previously reported AMD-associated loci were nominally associated with AMD in Israel. A PRS developed based on a large international study is predictive in Israeli populations.
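
The Bonferroni threshold quoted in the abstract (0.05/34 = 0.0015) can be checked with a few lines of arithmetic. The helper name `bonferroni_threshold` is an assumption introduced only for this sketch.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

# 34 previously reported loci tested at a family-wise alpha of 0.05.
threshold = bonferroni_threshold(0.05, 34)
rounded = round(threshold, 4)   # the abstract reports this value as 0.0015
```

The unrounded value is slightly below 0.0015, so a locus with a P-value between the two would pass the rounded threshold but not the exact one.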


Subjects
Genetic Predisposition to Disease , Genome-Wide Association Study , Macular Degeneration , Polymorphism, Single Nucleotide , Humans , Macular Degeneration/genetics , Macular Degeneration/epidemiology , Israel/epidemiology , Female , Male , Aged , Risk Factors , Middle Aged , Case-Control Studies , Aged, 80 and over , Multifactorial Inheritance/genetics , Jews/genetics , Genotype
10.
bioRxiv ; 2024 Jan 04.
Article in English | MEDLINE | ID: mdl-38260588

ABSTRACT

The immune system comprises multiple cell lineages and heterogeneous subsets found in blood and tissues throughout the body. While human immune responses differ between sites and over age, the underlying sources of variation remain unclear as most studies are limited to peripheral blood. Here, we took a systems approach to comprehensively profile RNA and surface protein expression of over 1.25 million immune cells isolated from blood, lymphoid organs, and mucosal tissues of 24 organ donors aged 20-75 years. We applied a multimodal classifier to annotate the major immune cell lineages (T cells, B cells, innate lymphoid cells, and myeloid cells) and their corresponding subsets across the body, leveraging probabilistic modeling to define bases for immune variations across donors, tissue, and age. We identified dominant tissue-specific effects on immune cell composition and function across lineages for lymphoid sites, intestines, and blood-rich tissues. Age-associated effects were intrinsic to both lineage and site as manifested by macrophages in mucosal sites, B cells in lymphoid organs, and T and NK cells in blood-rich sites. Our results reveal tissue-specific signatures of immune homeostasis throughout the body and across different ages. This information provides a basis for defining the transcriptional underpinnings of immune variation and potential associations with disease-associated immune pathologies across the human lifespan.

11.
bioRxiv ; 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36711575

ABSTRACT

Defining and accounting for subphenotypic structure has the potential to increase statistical power and provide a deeper understanding of the heterogeneity in the molecular basis of complex disease. Existing phenotype subtyping methods primarily rely on clinically observed heterogeneity or metadata clustering. However, they generally tend to capture the dominant sources of variation in the data, which often originate from variation that is not descriptive of the mechanistic heterogeneity of the phenotype of interest; in fact, such dominant sources of variation, such as population structure or technical variation, are, in general, expected to be independent of subphenotypic structure. We instead aim to find a subspace with signal that is unique to a group of samples for which we believe that subphenotypic variation exists (e.g., cases of a disease). To that end, we introduce Phenotype Aware Components Analysis (PACA), a contrastive learning approach leveraging canonical correlation analysis to robustly capture weak sources of subphenotypic variation. In the context of disease, PACA learns a gradient of variation unique to cases in a given dataset, while leveraging control samples to account for variation and imbalances of biological and technical confounders between cases and controls. We evaluated PACA using an extensive simulation study, as well as on various subtyping tasks using genotypes, transcriptomics, and DNA methylation data. Our results provide multiple lines of strong evidence that PACA allows us to robustly capture weak unknown variation of interest while being calibrated and well-powered, far exceeding the performance of alternative methods. This renders PACA a state-of-the-art tool for defining de novo subtypes that are more likely to reflect molecular heterogeneity, especially in challenging cases where the phenotypic heterogeneity may be masked by a myriad of strong unrelated effects in the data.

12.
medRxiv ; 2023 Sep 06.
Article in English | MEDLINE | ID: mdl-37732190

ABSTRACT

Purpose: The risk of developing age-related macular degeneration (AMD) is influenced by genetic background. In 2016, the International AMD Genomics Consortium (IAMDGC) identified 52 risk variants in 34 loci, and a polygenic risk score (PRS) based on these variants was associated with AMD. The Israeli population has a unique genetic composition: Ashkenazi Jewish (AJ), Jewish non-Ashkenazi, and Arab sub-populations. We aimed to perform a genome-wide association study (GWAS) for AMD in Israel, and to evaluate PRSs for AMD. Methods: For our discovery set, we recruited 403 AMD patients and 256 controls at Hadassah Medical Center. We genotyped all individuals via a custom exome chip. We imputed non-typed variants using cosmopolitan and AJ reference panels. We recruited an additional 155 cases and 69 controls for validation. To evaluate the predictive power of PRSs for AMD, we used IAMDGC summary statistics excluding our study and developed PRSs via either clumping/thresholding or LDpred2. Results: In our discovery set, 31/34 loci previously reported by the IAMDGC were AMD-associated with P < 0.05. Of those, all effects were directionally consistent with the IAMDGC and 11 loci had a P-value under the Bonferroni-corrected threshold (0.05/34 = 0.0015). At a threshold of 5 × 10⁻⁵, we discovered four suggestive associations in FAM189A1, IGDCC4, C7orf50, and CNTNAP4. However, only the FAM189A1 variant was AMD-associated in the replication cohort after Bonferroni correction. A prediction model including LDpred2-based PRS and other covariates had an AUC of 0.82 (95% CI: 0.79-0.85) and performed better than a covariates-only model (P = 5.1 × 10⁻⁹). Conclusions: Previously reported AMD-associated loci were nominally associated with AMD in Israel. A PRS developed based on a large international study is predictive in Israeli populations.

13.
Res Sq ; 2023 Nov 21.
Article in English | MEDLINE | ID: mdl-38045283

ABSTRACT

We present SLIViT, a deep-learning framework that accurately measures disease-related risk factors in volumetric biomedical imaging, such as magnetic resonance imaging (MRI) scans, optical coherence tomography (OCT) scans, and ultrasound videos. To evaluate SLIViT, we applied it to five different datasets of these three different data modalities tackling seven learning tasks (including both classification and regression) and found that it consistently and significantly outperforms domain-specific state-of-the-art models, typically improving performance (ROC AUC or correlation) by 0.1-0.4. Notably, compared to existing approaches, SLIViT can be applied even when only a small number of annotated training samples is available, which is often a constraint in medical applications. When trained on fewer than 700 annotated volumes, SLIViT obtained accuracy comparable to trained clinical specialists while reducing annotation time by a factor of 5,000, demonstrating its utility to automate and expedite ongoing research and other practical clinical scenarios.

14.
Front Bioinform ; 1: 792605, 2021.
Article in English | MEDLINE | ID: mdl-36303752

ABSTRACT

Calling differential methylation at a cell-type level from tissue-level bulk data is a fundamental challenge in genomics that has recently received more attention. These studies most often aim at identifying statistical associations rather than causal effects. However, existing methods typically make an implicit assumption about the direction of effects, and thus far, little to no attention has been given to the fact that this directionality assumption may not hold and can consequently affect statistical power and control for false positives. We demonstrate that misspecification of the model directionality can lead to a drastic decrease in performance and increase in risk of spurious findings in cell-type-specific differential methylation analysis, and we discuss the need to carefully consider model directionality before choosing a statistical method for analysis.

15.
Invest Ophthalmol Vis Sci ; 61(2): 48, 2020 02 07.
Article in English | MEDLINE | ID: mdl-32106291

ABSTRACT

Purpose: Anti-vascular endothelial growth factor (VEGF) therapy for neovascular AMD (nvAMD) yields a variable outcome. We performed a genome-wide association study for anti-VEGF treatment response in nvAMD to identify variants potentially underlying such a variable outcome. Methods: Israeli patients with nvAMD who underwent anti-VEGF treatment (n = 187) were genotyped on a whole exome chip containing approximately 500,000 variants. Genotyping was correlated with delta visual acuity (deltaVA) between baseline and after three injections of anti-VEGF. Top principal components, age, and baseline VA were included in the analysis. Two lead associated variants were genotyped in an independent validation set of patients with nvAMD (n = 108). Results: Linear regression analysis on 5,353,842 variants revealed five exonic variants with an association P value of less than 6 × 10⁻⁵. The top variant in the gene VWA3A (P = 1.77 × 10⁻⁶) was tested in the validation cohort. The minor allele of the VWA3A variant was associated with worse response to treatment (P = 0.02). The average deltaVA of discovery plus validation was -0.214 logMAR (≈ a gain of 10.7 Early Treatment Diabetic Retinopathy Study letters) for homozygotes for the major allele, 0.172 logMAR for heterozygotes (≈ a loss of 8.6 Early Treatment Diabetic Retinopathy Study letters), and 0.21 logMAR for homozygotes for the minor allele (≈ a loss of 10.5 Early Treatment Diabetic Retinopathy Study letters). Minor allele carriers had a higher frequency of macular hemorrhage at baseline. Conclusions: A VWA3A gene variant was associated with worse response to anti-VEGF treatment in Israeli patients with nvAMD. The VWA3A protein is a precursor of the multimeric von Willebrand factor which is involved in blood coagulation, a system previously associated with nvAMD.
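
The letter counts in the abstract follow from the usual conversion of 0.02 logMAR per ETDRS letter, with a negative change in logMAR meaning better vision. The sketch below reproduces the three reported figures; the constant and function names are assumptions chosen for illustration.

```python
LOGMAR_PER_LETTER = 0.02  # conversion implied by the figures in the abstract

def logmar_change_to_letters(delta_logmar):
    """Convert a change in logMAR to ETDRS letters.

    Positive result = letters gained, negative result = letters lost.
    """
    return -delta_logmar / LOGMAR_PER_LETTER

# deltaVA values reported in the abstract, by genotype group.
major_homozygotes = round(logmar_change_to_letters(-0.214), 1)  # gain
heterozygotes = round(logmar_change_to_letters(0.172), 1)       # loss
minor_homozygotes = round(logmar_change_to_letters(0.21), 1)    # loss
```

The results are rounded to one decimal place because floating-point division does not reproduce the printed values exactly.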


Subjects
Angiogenesis Inhibitors/therapeutic use , Choroidal Neovascularization , Protein Precursors/genetics , Wet Macular Degeneration , Aged , Aged, 80 and over , Choroidal Neovascularization/drug therapy , Choroidal Neovascularization/genetics , Female , Humans , Israel , Male , Middle Aged , Regression Analysis , Visual Acuity , Wet Macular Degeneration/drug therapy , Wet Macular Degeneration/genetics , von Willebrand Factor/genetics
16.
Nat Commun ; 11(1): 1971, 2020 04 24.
Article in English | MEDLINE | ID: mdl-32332754

ABSTRACT

We present Bisque, a tool for estimating cell type proportions in bulk expression. Bisque implements a regression-based approach that utilizes single-cell RNA-seq (scRNA-seq) or single-nucleus RNA-seq (snRNA-seq) data to generate a reference expression profile and learn gene-specific bulk expression transformations to robustly decompose RNA-seq data. These transformations significantly improve decomposition performance compared to existing methods when there is significant technical variation in the generation of the reference profile and observed bulk expression. Importantly, compared to existing methods, our approach is extremely efficient, making it suitable for the analysis of large genomic datasets that are becoming ubiquitous. When applied to subcutaneous adipose and dorsolateral prefrontal cortex expression datasets with both bulk RNA-seq and snRNA-seq data, Bisque replicates previously reported associations between cell type proportions and measured phenotypes across abundant and rare cell types. We further propose an additional mode of operation that merely requires a set of known marker genes.
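
To make the idea of reference-based decomposition concrete, the sketch below estimates the mix of two cell types in one bulk sample by constrained least squares: proportions are non-negative and sum to one. This is a deliberately reduced toy and not Bisque's algorithm; in particular it omits the gene-specific bulk transformations the abstract describes. The function name and the expression values are hypothetical.

```python
def two_type_proportions(bulk, ref_a, ref_b):
    """Least-squares fit of bulk ≈ p * ref_a + (1 - p) * ref_b with 0 <= p <= 1.

    For a single free proportion, the constrained optimum is the
    unconstrained optimum clipped to the interval [0, 1].
    """
    diff = [a - b for a, b in zip(ref_a, ref_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("reference profiles are identical")
    p = sum((x - b) * d for x, b, d in zip(bulk, ref_b, diff)) / denom
    p = min(1.0, max(0.0, p))
    return p, 1.0 - p

# Hypothetical reference profiles over three genes (e.g. averaged snRNA-seq).
ref_a = [10.0, 0.0, 5.0]   # marker gene 1 high in cell type A
ref_b = [0.0, 10.0, 5.0]   # marker gene 2 high in cell type B
bulk = [3.0, 7.0, 5.0]     # toy bulk sample built as 0.3 * A + 0.7 * B
prop_a, prop_b = two_type_proportions(bulk, ref_a, ref_b)
```

With more than two cell types the same objective needs a general non-negative least-squares solver rather than this closed form.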


Subjects
Computational Biology/methods , RNA-Seq/methods , Single-Cell Analysis/methods , Adipose Tissue/metabolism , Algorithms , Gene Expression Profiling/methods , Gene Expression Regulation , Genomics , Humans , Prefrontal Cortex/metabolism , RNA, Small Cytoplasmic , Software , Transcriptome
17.
Nat Commun ; 11(1): 2891, 2020 06 03.
Article in English | MEDLINE | ID: mdl-32493922

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

18.
Sci Rep ; 10(1): 11019, 2020 07 03.
Article in English | MEDLINE | ID: mdl-32620816

ABSTRACT

Single-nucleus RNA sequencing (snRNA-seq) measures gene expression in individual nuclei instead of cells, allowing for unbiased cell type characterization in solid tissues. We observe that snRNA-seq is commonly subject to contamination by high amounts of ambient RNA, which can lead to biased downstream analyses, such as identification of spurious cell types if overlooked. We present a novel approach to quantify contamination and filter droplets in snRNA-seq experiments, called Debris Identification using Expectation Maximization (DIEM). Our likelihood-based approach models the gene expression distribution of debris and cell types, which are estimated using EM. We evaluated DIEM using three snRNA-seq data sets: (1) human differentiating preadipocytes in vitro, (2) fresh mouse brain tissue, and (3) human frozen adipose tissue (AT) from six individuals. All three data sets showed evidence of extranuclear RNA contamination, and we observed that existing methods fail to account for contaminated droplets and lead to spurious cell types. When compared to filtering using these state-of-the-art methods, DIEM better removed droplets containing high levels of extranuclear RNA and led to higher quality clusters. Although DIEM was designed for snRNA-seq, our clustering strategy also successfully filtered single-cell RNA-seq data. To conclude, our novel method DIEM removes debris-contaminated droplets from single-cell-based data fast and effectively, leading to cleaner downstream analysis. Our code is freely available for use at https://github.com/marcalva/diem.
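
The EM idea in the abstract can be sketched with a generic multinomial mixture: each droplet's gene counts are assigned a posterior probability of coming from each component (for example debris versus a cell type), and the component profiles are re-estimated from those posteriors. This is a minimal sketch under assumptions, not the DIEM implementation; the function name, the initialization, the pseudocount, and the toy counts are all hypothetical.

```python
import math

def em_multinomial_mixture(counts, init_profiles, n_iter=50, pseudo=1e-6):
    """EM for a multinomial mixture over droplet gene-count vectors.

    init_profiles must contain strictly positive gene probabilities.
    Returns (priors, profiles, responsibilities).
    """
    k = len(init_profiles)
    n_genes = len(counts[0])
    profiles = [list(p) for p in init_profiles]
    priors = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component per droplet,
        # computed in log space to avoid underflow.
        resp = []
        for x in counts:
            log_w = [
                math.log(priors[c])
                + sum(xg * math.log(profiles[c][g]) for g, xg in enumerate(x))
                for c in range(k)
            ]
            top = max(log_w)
            w = [math.exp(v - top) for v in log_w]
            total = sum(w)
            resp.append([v / total for v in w])
        # M-step: update mixing weights and per-component gene profiles.
        for c in range(k):
            weight = sum(r[c] for r in resp)
            priors[c] = max(weight / len(counts), 1e-12)
            gene_tot = [
                sum(r[c] * x[g] for r, x in zip(resp, counts)) + pseudo
                for g in range(n_genes)
            ]
            norm = sum(gene_tot)
            profiles[c] = [t / norm for t in gene_tot]
    return priors, profiles, resp

# Hypothetical toy droplets over three genes; the first three mimic debris.
counts = [[8, 1, 1], [9, 0, 1], [7, 2, 1], [1, 8, 1], [0, 9, 1], [2, 7, 1]]
init = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2]]  # component 0 = debris (assumed)
priors, profiles, resp = em_multinomial_mixture(counts, init)
labels = [max(range(2), key=lambda c: r[c]) for r in resp]
```

Droplets whose posterior favors the debris component would then be filtered before clustering; the cutoff used for that decision is a separate design choice not shown here.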


Subjects
Adipose Tissue/metabolism , Brain/metabolism , Sequence Analysis, RNA/methods , Animals , Gene Expression Profiling , Humans , Likelihood Functions , Mice , Single-Cell Analysis , Supervised Machine Learning
19.
PLoS One ; 15(9): e0239474, 2020.
Article in English | MEDLINE | ID: mdl-32960917

ABSTRACT

Worldwide, testing capacity for SARS-CoV-2 is limited and bottlenecks in the scale up of polymerase chain reaction (PCR)-based testing exist. Our aim was to develop and evaluate a machine learning algorithm to diagnose COVID-19 in the inpatient setting. The algorithm was based on basic demographic and laboratory features to serve as a screening tool at hospitals where testing is scarce or unavailable. We used retrospectively collected data from the UCLA Health System in Los Angeles, California. We included all emergency room or inpatient cases receiving SARS-CoV-2 PCR testing who also had a set of ancillary laboratory features (n = 1,455) between 1 March 2020 and 24 May 2020. We tested seven machine learning models and used a combination of those models for the final diagnostic classification. In the test set (n = 392), our combined model had an area under the receiver operating characteristic curve of 0.91 (95% confidence interval 0.87-0.96). The model achieved a sensitivity of 0.93 (95% CI 0.85-0.98) and a specificity of 0.64 (95% CI 0.58-0.69). We found that our machine learning algorithm had excellent diagnostic metrics compared to SARS-CoV-2 PCR. This ensemble machine learning algorithm to diagnose COVID-19 has the potential to be used as a screening tool in hospital settings where PCR testing is scarce or unavailable.
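
The sketch below illustrates the two ingredients the abstract mentions: combining several models' outputs and scoring the result by sensitivity and specificity. The abstract does not say how the seven models were combined, so simple probability averaging, the 0.5 cutoff, the function names, and the toy numbers are all assumptions for illustration.

```python
def ensemble_probability(model_probs):
    """Average each patient's predicted probability across models."""
    n_models = len(model_probs)
    return [sum(col) / n_models for col in zip(*model_probs)]

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical outputs of three models for five patients (rows = models).
model_probs = [
    [0.9, 0.8, 0.4, 0.2, 0.6],
    [0.7, 0.6, 0.2, 0.4, 0.7],
    [0.8, 0.4, 0.3, 0.3, 0.5],
]
y_true = [1, 1, 1, 0, 0]                     # toy PCR results
avg = ensemble_probability(model_probs)
y_pred = [1 if p >= 0.5 else 0 for p in avg]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Lowering the cutoff trades specificity for sensitivity, which suits a screening tool where missed cases are the more costly error.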


Subjects
Betacoronavirus , Clinical Laboratory Techniques/methods , Coronavirus Infections/diagnosis , Inpatients , Machine Learning , Pneumonia, Viral/diagnosis , Adult , Aged , Area Under Curve , COVID-19 , COVID-19 Testing , Clinical Laboratory Techniques/standards , Humans , Los Angeles , Mass Screening/methods , Mass Screening/standards , Middle Aged , Pandemics , Polymerase Chain Reaction , Retrospective Studies , SARS-CoV-2
20.
Genome Biol ; 20(1): 138, 2019 07 12.
Article in English | MEDLINE | ID: mdl-31300005

ABSTRACT

Methylation datasets are affected by innumerable sources of variability, both biological (cell-type composition, genetics) and technical (batch effects). Here, we propose a reference-free method based on sparse canonical correlation analysis to separate the biological from technical sources of variability. We show through simulations and real data that our method, CONFINED, is not only more accurate than the state-of-the-art reference-free methods for capturing known, replicable biological variability, but it is also considerably more robust to dataset-specific technical variability than previous approaches. CONFINED is available as an R package as detailed at https://github.com/cozygene/CONFINED.


Subjects
Artifacts , DNA Methylation , Genetic Variation , Software , Datasets as Topic