Results 1 - 20 of 207
1.
Nature ; 2024 Oct 09.
Article in English | MEDLINE | ID: mdl-39385032

ABSTRACT

The human hippocampus and prefrontal cortex play critical roles in learning and cognition [1,2], yet the dynamic molecular characteristics of their development remain enigmatic. Here we investigated the epigenomic and three-dimensional chromatin conformational reorganization during the development of the hippocampus and prefrontal cortex, using more than 53,000 joint single-nucleus profiles of chromatin conformation and DNA methylation generated by single-nucleus methyl-3C sequencing (snm3C-seq3) [3]. The remodelling of DNA methylation is temporally separated from chromatin conformation dynamics. Using single-cell profiling and multimodal single-molecule imaging approaches, we have found that short-range chromatin interactions are enriched in neurons, whereas long-range interactions are enriched in glial cells and non-brain tissues. We reconstructed the regulatory programs of cell-type development and differentiation, finding putatively causal common variants for schizophrenia strongly overlapping with chromatin loop-connected, cell-type-specific regulatory regions. Our data provide multimodal resources for studying gene regulatory dynamics in brain development and demonstrate that single-cell three-dimensional multi-omics is a powerful approach for dissecting neuropsychiatric risk loci.

2.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39297879

ABSTRACT

Structural variation (SV) refers to insertions, deletions, inversions, and duplications in human genomes. SVs are present in approximately 1.5% of the human genome. Still, this small subset of genetic variation has been implicated in the pathogenesis of psoriasis, Crohn's disease and other autoimmune disorders, autism spectrum and other neurodevelopmental disorders, and schizophrenia. Since identifying structural variants is an important problem in genetics, several specialized computational techniques have been developed to detect structural variants directly from sequencing data. With advances in whole-genome sequencing (WGS) technologies, a plethora of SV detection methods have been developed. However, dissecting SVs from WGS data remains a challenge, with the majority of SV detection methods prone to a high false-positive rate, and no existing method able to precisely detect a full range of SVs present in a sample. Previous studies have shown that none of the existing SV callers can maintain high accuracy across various SV lengths and genomic coverages. Here, we report an integrated structural variant calling framework, Variant Identification and Structural Variant Analysis (VISTA), that leverages the results of individual callers using a novel and robust filtering and merging algorithm. In contrast to existing consensus-based tools which ignore the length and coverage, VISTA overcomes this limitation by executing various combinations of top-performing callers based on variant length and genomic coverage to generate SV events with high accuracy. We evaluated the performance of VISTA on comprehensive gold-standard datasets across varying organisms and coverage. We benchmarked VISTA using the Genome-in-a-Bottle gold standard SV set, haplotype-resolved de novo assemblies from the Human Pangenome Reference Consortium, along with an in-house polymerase chain reaction (PCR)-validated mouse gold standard set. 
VISTA maintained the highest F1 score among top consensus-based tools measured using a comprehensive gold standard across both mouse and human genomes. VISTA also has an optimized mode, where the calls can be optimized for precision or recall. VISTA-optimized can attain 100% precision and the highest sensitivity among other variant callers. In conclusion, VISTA represents a significant advancement in structural variant calling, offering a robust and accurate framework that outperforms existing consensus-based tools and sets a new standard for SV detection in genomic research.


Subject(s)
Genome, Human; Genomic Structural Variation; Software; Humans; Whole Genome Sequencing/methods; Algorithms; Genomics/methods; Computational Biology/methods; Genetic Variation
3.
Nat Protoc ; 19(9): 2529-2539, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38565959

ABSTRACT

Methods for analyzing the full complement of a biomolecule type, e.g., proteomics or metabolomics, generate large amounts of complex data. The software tools used to analyze omics data have reshaped the landscape of modern biology and become an essential component of biomedical research. These tools are themselves quite complex and often require the installation of other supporting software, libraries and/or databases. A researcher may also be using multiple different tools that require different versions of the same supporting materials. The increasing dependence of biomedical scientists on these powerful tools creates a need for easier installation and greater usability. Packaging and containerization are different approaches to satisfy this need by delivering omics tools already wrapped in additional software that makes the tools easier to install and use. In this systematic review, we describe and compare the features of prominent packaging and containerization platforms. We outline the challenges, advantages and limitations of each approach and some of the most widely used platforms from the perspectives of users, software developers and system administrators. We also propose principles to make the distribution of omics software more sustainable and robust to increase the reproducibility of biomedical and life science research.


Subject(s)
Computational Biology; Software; Computational Biology/methods; Humans; Proteomics/methods
4.
ESC Heart Fail ; 11(5): 2490-2498, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38637959

ABSTRACT

Existing risk prediction models for hospitalized heart failure patients are limited. We identified patients hospitalized with a diagnosis of heart failure between 7 May 2013 and 26 April 2022 from a large academic, quaternary care medical centre (training cohort). Demographics, medical comorbidities, vitals, and labs were collected and were used to construct random forest machine learning models to predict in-hospital mortality. Models were compared with logistic regression, and to commonly used heart failure risk scores. The models were subsequently validated in patients hospitalized with a diagnosis of heart failure from a second academic, community medical centre (validation cohort). The entire cohort comprised 21 802 patients, of which 14 539 were in the training cohort and 7263 were in the validation cohort. The median age (25th-75th percentile) was 70 (58-82) for the entire cohort, 43.2% were female, and 6.7% experienced inpatient mortality. In the overall cohort, 7621 (35.0%) patients had heart failure with reduced ejection fraction (EF ≤ 40%), 1271 (5.8%) had heart failure with mildly reduced EF (EF 41-49%), and 12 910 (59.2%) had heart failure with preserved EF (EF ≥ 50%). Random forest models in the validation cohort demonstrated a c-statistic (95% confidence interval) of 0.96 (0.95-0.97), sensitivity (SN) of 87.3%, and specificity (SP) of 90.6% for the prediction of in-hospital mortality. Models for those with HFrEF demonstrated a c-statistic of 0.96 (0.94-0.98), SN 88.2%, and SP 91.0%, and those for patients with HFpEF showed a c-statistic of 0.95 (0.93-0.97), SN 87.4%, and SP 89.5% for predicting in-hospital mortality. 
The random forest model significantly outperformed logistic regression (c-statistic 0.87, SN 75.9%, and SP 86.9%), and current existing risk scores including the Acute Decompensated Heart Failure National Registry risk score (c-statistic of 0.70, SN 69%, and SP 62%), and the Get With the Guidelines-Heart Failure risk score (c-statistic 0.69, SN 67%, and SP 63%); P < 0.001 for comparison. Machine learning models built from commonly recorded patient information can accurately predict in-hospital mortality among patients hospitalized with a diagnosis of heart failure.
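The abstract above reports three kinds of performance figures: sensitivity, specificity, and the c-statistic. As a minimal sketch of how these quantities are defined (not the authors' code), the following Python computes them from binary outcomes and predicted risks. The function names, the toy data, and the 0.5 decision threshold are all hypothetical and chosen only for illustration; none of them come from the study.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are called positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)  # deaths flagged
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)   # deaths missed
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)   # survivors cleared
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)  # survivors flagged
    return tp / (tp + fn), tn / (tn + fp)


def c_statistic(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive case has the
    higher score; tied scores count as half a concordant pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return concordant / (len(pos) * len(neg))


# Hypothetical toy data: 1 = in-hospital death, 0 = survived.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]

sens, spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
auc = c_statistic(toy_labels, toy_scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} c-statistic={auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the c-statistic summarizes ranking quality across all thresholds, which is why the abstract reports both.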


Subject(s)
Heart Failure; Hospital Mortality; Machine Learning; Humans; Heart Failure/mortality; Heart Failure/diagnosis; Female; Male; Hospital Mortality/trends; Aged; Middle Aged; Risk Assessment/methods; Aged, 80 and over; Retrospective Studies; Prognosis; Stroke Volume/physiology; Hospitalization/statistics & numerical data; Risk Factors; Survival Rate/trends
5.
medRxiv ; 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38370677

ABSTRACT

Background: Previous studies have established a strong link between late-onset epilepsy (LOE) and Alzheimer's disease (AD). However, their shared genetic risk beyond the APOE gene remains unclear. Our study sought to examine the shared genetic factors of AD and LOE, interpret the biological pathways involved, and evaluate how AD onset may be mediated by LOE and shared genetic risks. Methods: We defined phenotypes using phecodes mapped from diagnosis codes, with patients' records aged 60-90. A two-step Least Absolute Shrinkage and Selection Operator (LASSO) workflow was used to identify shared genetic variants based on prior AD GWAS integrated with functional genomic data. We calculated an AD-LOE shared risk score and used it as a proxy in a causal mediation analysis. We used electronic health records from an academic health center (UCLA Health) for discovery analyses and validated our findings in a multi-institutional EHR database (All of Us). Results: The two-step LASSO method identified 34 shared genetic loci between AD and LOE, including the APOE region. These loci were mapped to 65 genes, which showed enrichment in molecular functions and pathways such as tau protein binding and lipoprotein metabolism. Individuals with high predicted shared risk scores have a higher risk of developing AD, LOE, or both in their later life compared to those with low-risk scores. LOE partially mediates the effect of AD-LOE shared genetic risk on AD (15% proportion mediated on average). Validation results from All of Us were consistent with findings from the UCLA sample. Conclusions: We employed a machine learning approach to identify shared genetic risks of AD and LOE. In addition to providing substantial evidence for the significant contribution of the APOE-TOMM40-APOC1 gene cluster to shared risk, we uncovered novel genes that may contribute. 
Our study is one of the first to utilize All of Us genetic data to investigate AD, and provides valuable insights into the potential common and disease-specific mechanisms underlying AD and LOE, which could have profound implications for the future of disease prevention and the development of targeted treatment strategies to combat the co-occurrence of these two diseases.

6.
Cell Rep ; 42(8): 112856, 2023 08 29.
Article in English | MEDLINE | ID: mdl-37481717

ABSTRACT

To identify addiction genes, we evaluate intravenous self-administration of cocaine or saline in 84 inbred and recombinant inbred mouse strains over 10 days. We integrate the behavior data with brain RNA-seq data from 41 strains. The self-administration of cocaine and that of saline are genetically distinct. We maximize power to map loci for cocaine intake by using a linear mixed model to account for this longitudinal phenotype while correcting for population structure. A total of 15 unique significant loci are identified in the genome-wide association study. A transcriptome-wide association study highlights the Trpv2 ion channel as a key locus for cocaine self-administration as well as identifying 17 additional genes, including Arhgef26, Slc18b1, and Slco5a1. We find numerous instances where alternate splice site selection or RNA editing altered transcript abundance. Our work emphasizes the importance of Trpv2, an ionotropic cannabinoid receptor, for the response to cocaine.


Subject(s)
Cocaine-Related Disorders; Cocaine; Mice; Animals; Cocaine/pharmacology; Genome-Wide Association Study; Brain; Administration, Intravenous; Mice, Inbred C57BL
7.
Genome Res ; 33(7): 1032-1041, 2023 07.
Article in English | MEDLINE | ID: mdl-37197991

ABSTRACT

Mendelian randomization (MR) has emerged as a powerful approach to leverage genetic instruments to infer causality between pairs of traits in observational studies. However, the results of such studies are susceptible to biases owing to weak instruments, as well as the confounding effects of population stratification and horizontal pleiotropy. Here, we show that family data can be leveraged to design MR tests that are provably robust to confounding from population stratification, assortative mating, and dynastic effects. We show in simulations that our approach, MR-Twin, is robust to confounding from population stratification and is not affected by weak instrument bias, whereas standard MR methods yield inflated false positive rates. We then conduct an exploratory analysis of MR-Twin and other MR methods applied to 121 trait pairs in the UK Biobank data set. Our results suggest that confounding from population stratification can lead to false positives for existing MR methods, whereas MR-Twin is immune to this type of confounding, and that MR-Twin can help assess whether traditional approaches may be inflated owing to confounding from population stratification.


Subject(s)
Mendelian Randomization Analysis; Reproduction; Bias; Genome-Wide Association Study; Mendelian Randomization Analysis/methods; Phenotype; Humans
8.
Front Genet ; 14: 997383, 2023.
Article in English | MEDLINE | ID: mdl-36999049

ABSTRACT

RNA sequencing (RNA-seq) has become an exemplary technology in modern biology and clinical science. Its immense popularity is due in large part to the continuous efforts of the bioinformatics community to develop accurate and scalable computational tools to analyze the enormous amounts of transcriptomic data that it produces. RNA-seq analysis enables genes and their corresponding transcripts to be probed for a variety of purposes, such as detecting novel exons or whole transcripts, assessing expression of genes and alternative transcripts, and studying alternative splicing structure. It can be a challenge, however, to obtain meaningful biological signals from raw RNA-seq data because of the enormous scale of the data as well as the inherent limitations of different sequencing technologies, such as amplification bias or biases of library preparation. The need to overcome these technical challenges has pushed the rapid development of novel computational tools, which have evolved and diversified in accordance with technological advancements, leading to the current myriad of RNA-seq tools. These tools, combined with the diverse computational skill sets of biomedical researchers, help to unlock the full potential of RNA-seq. The purpose of this review is to explain basic concepts in the computational analysis of RNA-seq data and define discipline-specific jargon.

9.
Lab Med ; 54(5): 512-518, 2023 Sep 05.
Article in English | MEDLINE | ID: mdl-36810591

ABSTRACT

Massive-scale SARS-CoV-2 testing using the SwabSeq diagnostic platform came with quality assurance challenges due to the novelty and scale of sequencing-based testing. The SwabSeq platform relies on accurate mapping between specimen identifiers and molecular barcodes to match a result back to a patient specimen. To identify and mitigate mapping errors, we instituted quality control using placement of negative controls within a rack of patient samples. We designed 2-dimensional paper templates to fit over a 96-position rack of specimens with holes to show the control tube placements. We designed and 3-dimensionally printed plastic templates that fit onto 4 racks of patient specimens and provide accurate indications of the correct control tube placements. The final plastic templates dramatically reduced plate mapping errors from 22.55% in January 2021 to less than 1% after implementation and training in January 2021. We show how 3D printing can be a cost-effective quality assurance tool to mitigate human error in the clinical laboratory.


Subject(s)
COVID-19; SARS-CoV-2; Humans; SARS-CoV-2/genetics; COVID-19/diagnosis; COVID-19 Testing; High-Throughput Nucleotide Sequencing; Printing, Three-Dimensional; Plastics
10.
bioRxiv ; 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36711635

ABSTRACT

Mendelian Randomization (MR) has emerged as a powerful approach to leverage genetic instruments to infer causality between pairs of traits in observational studies. However, the results of such studies are susceptible to biases due to weak instruments as well as the confounding effects of population stratification and horizontal pleiotropy. Here, we show that family data can be leveraged to design MR tests that are provably robust to confounding from population stratification, assortative mating, and dynastic effects. We demonstrate in simulations that our approach, MR-Twin, is robust to confounding from population stratification and is not affected by weak instrument bias, while standard MR methods yield inflated false positive rates. We applied MR-Twin to 121 trait pairs in the UK Biobank dataset and found that MR-Twin identifies likely causal trait pairs and does not identify trait pairs that are unlikely to be causal. Our results suggest that confounding from population stratification can lead to false positives for existing MR methods, while MR-Twin is immune to this type of confounding.

11.
Br J Ophthalmol ; 107(11): 1722-1729, 2023 Nov.
Article in English | MEDLINE | ID: mdl-36126104

ABSTRACT

PURPOSE: To describe an artificial intelligence platform that detects thyroid eye disease (TED). DESIGN: Development of a deep learning model. METHODS: 1944 photographs from a clinical database were used to train a deep learning model. 344 additional images ('test set') were used to calculate performance metrics. Receiver operating characteristic, precision-recall curves and heatmaps were generated. From the test set, 50 images were randomly selected ('survey set') and used to compare model performance with ophthalmologist performance. 222 images obtained from a separate clinical database were used to assess model recall and to quantitate model performance with respect to disease stage and grade. RESULTS: The model achieved test set accuracy of 89.2%, specificity 86.9%, recall 93.4%, precision 79.7% and an F1 score of 86.0%. Heatmaps demonstrated that the model identified pixels corresponding to clinical features of TED. On the survey set, the ensemble model achieved accuracy, specificity, recall, precision and F1 score of 86%, 84%, 89%, 77% and 82%, respectively. 27 ophthalmologists achieved mean performance of 75%, 82%, 63%, 72% and 66%, respectively. On the second test set, the model achieved recall of 91.9%, with higher recall for moderate to severe (98.2%, n=55) and active disease (98.3%, n=60), as compared with mild (86.8%, n=68) or stable disease (85.7%, n=63). CONCLUSIONS: The deep learning classifier is a novel approach to identify TED and is a first step in the development of tools to improve diagnostic accuracy and lower barriers to specialist evaluation.
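The F1 score quoted in the abstract above is the harmonic mean of precision and recall, so the reported test-set figures can be cross-checked with a few lines of Python. This is a sketch of the standard formula only; the function name `f1_score` is our own and nothing here reproduces the authors' evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Test-set precision (79.7%) and recall (93.4%) as quoted in the abstract.
test_set_f1 = f1_score(precision=0.797, recall=0.934)
print(f"{test_set_f1:.3f}")  # prints 0.860
```

The result agrees with the 86.0% F1 score reported for the test set. Averages over several raters, such as the ophthalmologist figures, need not satisfy the same identity, because a mean of per-rater F1 scores is generally not the F1 of the mean precision and recall.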

12.
PLoS Genet ; 18(11): e1010447, 2022 11.
Article in English | MEDLINE | ID: mdl-36342933

ABSTRACT

We introduce pleiotropic association test (PAT) for joint analysis of multiple traits using genome-wide association study (GWAS) summary statistics. The method utilizes the decomposition of phenotypic covariation into genetic and environmental components to create a likelihood ratio test statistic for each genetic variant. Though PAT does not directly interpret which trait(s) drive the association, a per trait interpretation of the omnibus p-value is provided through an extension to the meta-analysis framework, m-values. In simulations, we show PAT controls the false positive rate, increases statistical power, and is robust to model misspecifications of genetic effect. Additionally, simulations comparing PAT to three multi-trait methods, HIPO, MTAG, and ASSET, show PAT identified 15.3% more omnibus associations over the next best method. When these associations were interpreted on a per trait level using m-values, PAT had 37.5% more true per trait interpretations with a 0.92% false positive assignment rate. When analyzing four traits from the UK Biobank, PAT discovered 22,095 novel variants. Through the m-values interpretation framework, the number of per trait associations for two traits were almost tripled and were nearly doubled for another trait relative to the original single trait GWAS.


Subject(s)
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Genetic Pleiotropy; Genome-Wide Association Study/methods; Phenotype; Polymorphism, Single Nucleotide/genetics; Meta-Analysis as Topic
13.
G3 (Bethesda) ; 12(12)2022 12 01.
Article in English | MEDLINE | ID: mdl-36250793

ABSTRACT

Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex human traits, but only a fraction of variants identified in discovery studies achieve significance in replication studies. Replication in genome-wide association studies has been well-studied in the context of Winner's Curse, which is the inflation of effect size estimates for significant variants due to statistical chance. However, Winner's Curse is often not sufficient to explain lack of replication. Another reason why studies fail to replicate is that there are fundamental differences between the discovery and replication studies. A confounding factor can create the appearance of a significant finding while actually being an artifact that will not replicate in future studies. We propose a statistical framework that utilizes genome-wide association studies and replication studies to jointly model Winner's Curse and study-specific heterogeneity due to confounding factors. We apply this framework to 100 genome-wide association studies from the Human Genome-Wide Association Studies Catalog and observe that there is a large range in the level of estimated confounding. We demonstrate how this framework can be used to distinguish when studies fail to replicate due to statistical noise and when they fail due to confounding.
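Winner's Curse, as described in the abstract above, can be illustrated with a small simulation: when only estimates that pass a significance threshold are kept, their average overstates the true effect. The Python sketch below is a toy illustration under stated assumptions and is not the statistical framework proposed in the paper. The true effect, standard error, threshold, and number of simulated studies are hypothetical values picked for illustration.

```python
import random
import statistics


def simulate_winners_curse(true_effect=0.1, std_error=0.05, z_threshold=1.96,
                           n_studies=20000, seed=0):
    """Draw noisy effect estimates and compare all of them with the 'significant' ones.

    Each simulated discovery study reports an estimate drawn from
    Normal(true_effect, std_error). An estimate counts as significant when
    |estimate / std_error| exceeds z_threshold.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_studies)]
    significant = [b for b in estimates if abs(b) / std_error > z_threshold]
    return statistics.mean(estimates), statistics.mean(significant), len(significant)


mean_all, mean_significant, n_significant = simulate_winners_curse()
print(f"mean of all estimates:         {mean_all:.3f}")
print(f"mean of significant estimates: {mean_significant:.3f}")
print(f"significant studies:           {n_significant}")
```

Across all simulated studies the estimates average close to the true effect, but the subset that reaches significance averages noticeably higher. A replication study drawn from the same distribution would therefore tend to report a smaller effect even with no confounding present, which is the baseline the abstract says is often insufficient to explain failed replications.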


Subject(s)
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Humans; Multifactorial Inheritance; Genetic Predisposition to Disease
14.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35753701

ABSTRACT

Advances in whole-genome sequencing (WGS) promise to enable the accurate and comprehensive structural variant (SV) discovery. Dissecting SVs from WGS data presents a substantial number of challenges and a plethora of SV detection methods have been developed. Currently, evidence that investigators can use to select appropriate SV detection tools is lacking. In this article, we have evaluated the performance of SV detection tools on mouse and human WGS data using a comprehensive polymerase chain reaction-confirmed gold standard set of SVs and the genome-in-a-bottle variant set, respectively. In contrast to the previous benchmarking studies, our gold standard dataset included a complete set of SVs allowing us to report both precision and sensitivity rates of the SV detection methods. Our study investigates the ability of the methods to detect deletions, thus providing an optimistic estimate of SV detection performance as the SV detection methods that fail to detect deletions are likely to miss more complex SVs. We found that SV detection tools varied widely in their performance, with several methods providing a good balance between sensitivity and precision. Additionally, we have determined the SV callers best suited for low- and ultralow-pass sequencing data as well as for different deletion length categories.


Subject(s)
Benchmarking; Genome, Human; Animals; High-Throughput Nucleotide Sequencing/methods; Humans; Mice; Whole Genome Sequencing/methods
15.
PLoS One ; 17(5): e0268861, 2022.
Article in English | MEDLINE | ID: mdl-35622842

ABSTRACT

Recruiting, training and retaining scientists in computational biology is necessary to develop a workforce that can lead the quantitative biology revolution. Yet, African-American/Black, Hispanic/Latinx, Native Americans, and women are severely underrepresented in computational biosciences. We established the UCLA Bruins-in-Genomics Summer Research Program to provide training and research experiences in quantitative biology and bioinformatics to undergraduate students with an emphasis on students from backgrounds underrepresented in computational biology. Program assessment was based on number of applicants, alumni surveys and comparison of post-graduate educational choices for participants and a control group of students who were accepted but declined to participate. We hypothesized that participation in the Bruins-in-Genomics program would increase the likelihood that students would pursue post-graduate education in a related field. Our surveys revealed that 75% of Bruins-in-Genomics Summer participants were enrolled in graduate school. Logistic regression analysis revealed that women who participated in the program were significantly more likely to pursue a Ph.D. than a matched control group (group x woman interaction term of p = 0.005). The Bruins-in-Genomics Summer program represents an example of how a combined didactic-research program structure can make computational biology accessible to a wide range of undergraduates and increase participation in quantitative biosciences.


Subject(s)
Computational Biology; Students; Female; Genomics; Humans; Program Evaluation; Workforce
16.
BMC Genomics ; 23(1): 260, 2022 Apr 04.
Article in English | MEDLINE | ID: mdl-35379194

ABSTRACT

BACKGROUND: The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused global disruption of human health and activity. Being able to trace the early outbreak of SARS-CoV-2 within a locality can inform public health measures and provide insights to contain or prevent viral transmission. Investigation of the transmission history requires efficient sequencing methods and analytic strategies, which can be generally useful in the study of viral outbreaks. METHODS: The County of Los Angeles (hereafter, LA County) sustained a large outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). To learn about the transmission history, we carried out surveillance viral genome sequencing to determine 142 viral genomes from unique patients seeking care at the University of California, Los Angeles (UCLA) Health System. 86 of these genomes were from samples collected before April 19, 2020. RESULTS: We found that the early outbreak in LA County, as in other international air travel hubs, was seeded by multiple introductions of strains from Asia and Europe. We identified a USA-specific strain, B.1.43, which was found predominantly in California and Washington State. While samples from LA County carried the ancestral B.1.43 genome, viral genomes from neighboring counties in California and from counties in Washington State carried additional mutations, suggesting a potential origin of B.1.43 in Southern California. We quantified the transmission rate of SARS-CoV-2 over time, and found evidence that the public health measures put in place in LA County to control the virus were effective at preventing transmission, but might have been undermined by the many introductions of SARS-CoV-2 into the region. CONCLUSION: Our work demonstrates that genome sequencing can be a powerful tool for investigating outbreaks and informing the public health response. Our results reinforce the critical need for the USA to have coordinated inter-state responses to the pandemic.


Subject(s)
COVID-19; COVID-19/epidemiology; Disease Outbreaks; Genomics; Humans; Los Angeles/epidemiology; SARS-CoV-2/genetics
17.
Nat Methods ; 19(4): 429-440, 2022 04.
Article in English | MEDLINE | ID: mdl-35396482

ABSTRACT

Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains still were challenging for assembly and genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers with other metrics. The results identify challenges and guide researchers in selecting methods for analyses.


Subject(s)
Metagenome; Metagenomics; Archaea/genetics; Metagenomics/methods; Reproducibility of Results; Sequence Analysis, DNA; Software
18.
Nat Commun ; 13(1): 1093, 2022 03 01.
Article in English | MEDLINE | ID: mdl-35232963

ABSTRACT

Mendelian Randomization (MR) studies are threatened by population stratification, batch effects, and horizontal pleiotropy. Although a variety of methods have been proposed to mitigate those problems, residual biases may still remain, leading to highly statistically significant false positives in large databases. Here we describe a suite of sensitivity analysis tools that enables investigators to quantify the robustness of their findings against such validity threats. Specifically, we propose the routine reporting of sensitivity statistics that reveal the minimal strength of violations necessary to explain away the MR results. We further provide intuitive displays of the robustness of the MR estimate to any degree of violation, and formal bounds on the worst-case bias caused by violations multiple times stronger than observed variables. We demonstrate how these tools can aid researchers in distinguishing robust from fragile findings by examining the effect of body mass index on diastolic blood pressure and Townsend deprivation index.


Subject(s)
Genetic Pleiotropy; Mendelian Randomization Analysis; Bias; Blood Pressure/genetics; Body Mass Index; Disease Progression; Genome-Wide Association Study; Humans; Mendelian Randomization Analysis/methods
19.
PLoS Genet ; 17(9): e1009733, 2021 09.
Article in English | MEDLINE | ID: mdl-34543273

ABSTRACT

Increasingly large Genome-Wide Association Studies (GWAS) have yielded numerous variants associated with many complex traits, motivating the development of "fine mapping" methods to identify which of the associated variants are causal. Additionally, GWAS of the same trait for different populations are increasingly available, raising the possibility of refining fine mapping results further by leveraging different linkage disequilibrium (LD) structures across studies. Here, we introduce multiple study causal variants identification in associated regions (MsCAVIAR), a method that extends the popular CAVIAR fine mapping framework to a multiple study setting using a random effects model. MsCAVIAR only requires summary statistics and LD as input, accounts for uncertainty in association statistics using a multivariate normal model, allows for multiple causal variants at a locus, and explicitly models the possibility of different SNP effect sizes in different populations. We demonstrate the efficacy of MsCAVIAR in both a simulation study and a trans-ethnic, trans-biobank fine mapping analysis of High Density Lipoprotein (HDL).


Subject(s)
Genome-Wide Association Study; Causality; Chromosome Mapping/methods; Humans; Linkage Disequilibrium; Lipoproteins, HDL/genetics; Polymorphism, Single Nucleotide
20.
Nat Biomed Eng ; 5(7): 657-665, 2021 07.
Article in English | MEDLINE | ID: mdl-34211145

ABSTRACT

Frequent and widespread testing of members of the population who are asymptomatic for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is essential for the mitigation of the transmission of the virus. Despite the recent increases in testing capacity, tests based on quantitative polymerase chain reaction (qPCR) assays cannot be easily deployed at the scale required for population-wide screening. Here, we show that next-generation sequencing of pooled samples tagged with sample-specific molecular barcodes enables the testing of thousands of nasal or saliva samples for SARS-CoV-2 RNA in a single run without the need for RNA extraction. The assay, which we named SwabSeq, incorporates a synthetic RNA standard that facilitates end-point quantification and the calling of true negatives, and that reduces the requirements for automation, purification and sample-to-sample normalization. We used SwabSeq to perform 80,000 tests, with an analytical sensitivity and specificity comparable to or better than traditional qPCR tests, in less than two months with turnaround times of less than 24 h. SwabSeq could be rapidly adapted for the detection of other pathogens.


Subject(s)
RNA, Viral/genetics; SARS-CoV-2/pathogenicity; Saliva/virology; High-Throughput Nucleotide Sequencing; Humans; SARS-CoV-2/genetics; Sensitivity and Specificity
...