Results 1-20 of 89,319

1.
Cell ; 184(23): 5699-5714.e11, 2021 11 11.
Article in English | MEDLINE | ID: mdl-34735795

ABSTRACT

Extension of the interval between vaccine doses for the BNT162b2 mRNA vaccine was introduced in the United Kingdom to accelerate population coverage with a single dose. At this time, trial data were lacking, and we addressed this in a study of United Kingdom healthcare workers. The first vaccine dose induced protection against infection with the circulating alpha (B.1.1.7) variant over several weeks. In a substudy of 589 individuals, we show that this single dose induces severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) neutralizing antibody (NAb) responses and a sustained B and T cell response to the spike protein. NAb levels were higher after the extended dosing interval (6-14 weeks) compared with the conventional 3- to 4-week regimen, accompanied by enrichment of CD4+ T cells expressing interleukin-2 (IL-2). Prior SARS-CoV-2 infection amplified and accelerated the response. These data on dynamic cellular and humoral responses indicate that extension of the dosing interval is an effective immunogenic protocol.


Subject(s)
COVID-19 Vaccines/immunology , Vaccines, Synthetic/immunology , Adult , Aged , Antibodies, Neutralizing/immunology , Antibodies, Viral/immunology , BNT162 Vaccine , COVID-19/blood , COVID-19/immunology , COVID-19/virology , Cross-Priming/immunology , Dose-Response Relationship, Immunologic , Ethnicity , Female , Humans , Immunity , Immunoglobulin G/immunology , Linear Models , Male , Middle Aged , Reference Standards , SARS-CoV-2/immunology , T-Lymphocytes/immunology , Treatment Outcome , Young Adult , mRNA Vaccines
2.
Cell ; 183(7): 1986-2002.e26, 2020 12 23.
Article in English | MEDLINE | ID: mdl-33333022

ABSTRACT

Serotonin plays a central role in cognition and is the target of most pharmaceuticals for psychiatric disorders. Existing drugs have limited efficacy; creation of improved versions will require better understanding of serotonergic circuitry, which has been hampered by our inability to monitor serotonin release and transport with high spatial and temporal resolution. We developed and applied a binding-pocket redesign strategy, guided by machine learning, to create a high-performance, soluble, fluorescent serotonin sensor (iSeroSnFR), enabling optical detection of millisecond-scale serotonin transients. We demonstrate that iSeroSnFR can be used to detect serotonin release in freely behaving mice during fear conditioning, social interaction, and sleep/wake transitions. We also developed a robust assay of serotonin transporter function and modulation by drugs. We expect that both machine-learning-guided binding-pocket redesign and iSeroSnFR will have broad utility for the development of other sensors and in vitro and in vivo serotonin detection, respectively.


Subject(s)
Directed Molecular Evolution , Machine Learning , Serotonin/metabolism , Algorithms , Amino Acid Sequence , Amygdala/physiology , Animals , Behavior, Animal , Binding Sites , Brain/metabolism , HEK293 Cells , Humans , Kinetics , Linear Models , Mice , Mice, Inbred C57BL , Photons , Protein Binding , Serotonin Plasma Membrane Transport Proteins/metabolism , Sleep/physiology , Wakefulness/physiology
3.
Cell ; 164(1-2): 293-309, 2016 Jan 14.
Article in English | MEDLINE | ID: mdl-26771497

ABSTRACT

Large-scale genomic studies have identified multiple somatic aberrations in breast cancer, including copy number alterations and point mutations. Still, identifying causal variants and the emergent vulnerabilities that arise as a consequence of genetic alterations remains a major challenge. We performed whole-genome small hairpin RNA (shRNA) "dropout screens" on 77 breast cancer cell lines. Using a hierarchical linear regression algorithm to score our screen results and integrate them with accompanying detailed genetic and proteomic information, we identify vulnerabilities in breast cancer, including candidate "drivers," and reveal general functional genomic properties of cancer cells. Comparisons of gene essentiality with drug sensitivity data suggest potential resistance mechanisms, effects of existing anti-cancer drugs, and opportunities for combination therapy. Finally, we demonstrate the utility of this large dataset by identifying BRD4 as a potential target in luminal breast cancer and PIK3CA mutations as a resistance determinant for BET-inhibitors.


Subject(s)
Algorithms , Breast Neoplasms/genetics , Breast Neoplasms/drug therapy , Breast Neoplasms/pathology , Cell Cycle Proteins , Cell Line, Tumor , Class I Phosphatidylinositol 3-Kinases , Cluster Analysis , Drug Resistance, Neoplasm , Gene Dosage , Gene Expression Profiling , Genome-Wide Association Study , Humans , Linear Models , Nuclear Proteins/genetics , Phosphatidylinositol 3-Kinases , Transcription Factors/genetics
4.
Mol Cell ; 80(2): 359-373.e8, 2020 10 15.
Article in English | MEDLINE | ID: mdl-32991830

ABSTRACT

Eukaryotic gene expression regulation involves thousands of distal regulatory elements. Understanding the quantitative contribution of individual enhancers to gene expression is critical for assessing the role of disease-associated genetic risk variants. Yet, we lack the ability to accurately link genes with their distal regulatory elements. To address this, we used 3D enhancer-promoter (E-P) associations identified using split-pool recognition of interactions by tag extension (SPRITE) to build a predictive model of gene expression. Our model dramatically outperforms models using genomic proximity and can be used to determine the quantitative impact of enhancer loss on gene expression in different genetic backgrounds. We show that genes that form stable E-P hubs have less cell-to-cell variability in gene expression. Finally, we identified transcription factors that regulate stimulation-dependent E-P interactions. Together, our results provide a framework for understanding quantitative contributions of E-P interactions and associated genetic variants to gene expression.


Subject(s)
Bacteria/isolation & purification , Enhancer Elements, Genetic , Promoter Regions, Genetic , Animals , Dendritic Cells/metabolism , Female , Gene Expression Regulation , Linear Models , Mice, Inbred C57BL , Models, Biological , Stochastic Processes , Transcription Factors/metabolism
5.
Nature ; 592(7855): 571-576, 2021 04.
Article in English | MEDLINE | ID: mdl-33790468

ABSTRACT

Biological invasions are responsible for substantial biodiversity declines as well as high economic losses to society and monetary expenditures associated with the management of these invasions [1,2]. The InvaCost database has enabled the generation of a reliable, comprehensive, standardized and easily updatable synthesis of the monetary costs of biological invasions worldwide [3]. Here we found that the total reported costs of invasions reached a minimum of US$1.288 trillion (2017 US dollars) over the past few decades (1970-2017), with an annual mean cost of US$26.8 billion. Moreover, we estimate that the annual mean cost could reach US$162.7 billion in 2017. These costs remain strongly underestimated and do not show any sign of slowing down, exhibiting a consistent threefold increase per decade. We show that the documented costs are widely distributed and have strong gaps at regional and taxonomic scales, with damage costs being an order of magnitude higher than management expenditures. Research approaches that document the costs of biological invasions need to be further improved. Nonetheless, our findings call for the implementation of consistent management actions and international policy agreements that aim to reduce the burden of invasive alien species.


Subject(s)
Biodiversity , Ecology/economics , Environmental Science/economics , Internationality , Introduced Species/economics , Introduced Species/trends , Animals , Geographic Mapping , Invertebrates , Linear Models , Plants , Vertebrates
6.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38436558

ABSTRACT

Recently, there has been a growing interest in variable selection for causal inference within the context of high-dimensional data. However, when the outcome exhibits a skewed distribution, ensuring the accuracy of variable selection and causal effect estimation can be challenging. Here, we introduce the generalized median adaptive lasso (GMAL) for covariate selection to achieve accurate estimation of causal effects even when the outcome follows a skewed distribution. A distinctive feature of our proposed method is that we utilize a linear median regression model for constructing the penalty weights, thereby maintaining the accuracy of variable selection and causal effect estimation even when the outcome presents extremely skewed distributions. Simulation results showed that our proposed method performs comparably to existing methods in variable selection when the outcome follows a symmetric distribution, and clearly outperforms them when the outcome follows a skewed distribution. Meanwhile, our proposed method consistently outperformed the existing methods in causal estimation, as indicated by a smaller root-mean-square error. We also applied GMAL to a DNA methylation dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database to investigate the association between cerebrospinal fluid tau protein levels and the severity of Alzheimer's disease (AD).
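
For illustration, a minimal sketch of the penalty-weight idea behind GMAL on toy data (the weighting exponent, tuning values and data are assumptions for the example, not the authors' implementation):

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor, Lasso

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
# Skewed outcome; only the first two covariates matter.
y = 2 * X[:, 0] - 1.5 * X[:, 1] + rng.exponential(scale=1.0, size=n)

# Step 1: median regression (quantile = 0.5) gives outlier-robust pilot estimates.
pilot = QuantileRegressor(quantile=0.5, alpha=0.0, solver="highs").fit(X, y)
w = 1.0 / (np.abs(pilot.coef_) + 1e-8)      # adaptive penalty weights

# Step 2: weighted (adaptive) lasso via column rescaling X_j / w_j.
lasso = Lasso(alpha=0.1).fit(X / w, y)
beta = lasso.coef_ / w                      # back-transform to the original scale
print("selected covariates:", np.flatnonzero(beta))
```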


Subject(s)
Alzheimer Disease , Humans , Alzheimer Disease/genetics , Computer Simulation , Databases, Factual , Linear Models , Protein Processing, Post-Translational
7.
PLoS Genet ; 19(11): e1011022, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37934796

ABSTRACT

Epigenetic researchers often evaluate DNA methylation as a potential mediator of the effect of social/environmental exposures on a health outcome. Modern statistical methods for jointly evaluating many mediators have not been widely adopted. We compare seven methods for high-dimensional mediation analysis with continuous outcomes through both diverse simulations and analysis of DNAm data from a large multi-ethnic cohort in the United States, while providing an R package for their seamless implementation and adoption. Among the considered choices, the best-performing methods for detecting active mediators in simulations are the Bayesian sparse linear mixed model (BSLMM) and high-dimensional mediation analysis (HDMA); while the preferred methods for estimating the global mediation effect are high-dimensional linear mediation analysis (HILMA) and principal component mediation analysis (PCMA). We provide guidelines for epigenetic researchers on choosing the best method in practice and offer suggestions for future methodological development.
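
As a reference point, the quantity these high-dimensional methods generalize is the classic single-mediator product of coefficients; a hedged toy sketch (simulated data):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
exposure = rng.normal(size=n)
mediator = 0.5 * exposure + rng.normal(size=n)        # e.g., DNAm at one CpG
outcome = 0.3 * mediator + 0.2 * exposure + rng.normal(size=n)

# a: exposure -> mediator; b: mediator -> outcome, adjusted for exposure.
a = sm.OLS(mediator, sm.add_constant(exposure)).fit().params[1]
Xm = sm.add_constant(np.column_stack([mediator, exposure]))
b = sm.OLS(outcome, Xm).fit().params[1]
print(f"indirect (mediated) effect a*b = {a * b:.3f}")  # simulated truth: 0.15
```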


Subject(s)
DNA Methylation , Mediation Analysis , Humans , DNA Methylation/genetics , Bayes Theorem , Linear Models , Environmental Exposure
8.
J Neurosci ; 44(14)2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38316565

ABSTRACT

Although we must prioritize the processing of task-relevant information to navigate life, our ability to do so fluctuates across time. Previous work has identified fMRI functional connectivity (FC) networks that predict an individual's ability to sustain attention and vary with attentional state from 1 min to the next. However, traditional dynamic FC approaches typically lack the temporal precision to capture moment-to-moment network fluctuations. Recently, researchers have "unfurled" traditional FC matrices into "edge cofluctuation time series," which measure timepoint-by-timepoint cofluctuations between regions. Here we apply event-based and parametric fMRI analyses to edge time series to capture moment-to-moment fluctuations in networks related to attention. In two independent fMRI datasets examining young adults of both sexes in which participants performed a sustained attention task, we identified a reliable set of edges that rapidly deflects in response to rare task events. Another set of edges varies with continuous fluctuations in attention and overlaps with a previously defined set of edges associated with individual differences in sustained attention. Demonstrating that edge-based analyses are not simply redundant with traditional regions-of-interest-based approaches, up to one-third of reliably deflected edges were not predicted from univariate activity patterns alone. These results reveal the large potential in combining traditional fMRI analyses with edge time series to identify rapid reconfigurations in networks across the brain.
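
The edge time series construction itself is simple; a minimal sketch on toy data (toy dimensions assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
T, R = 300, 5                      # timepoints x regions (toy sizes)
bold = rng.normal(size=(T, R))

# Z-score each region, then take elementwise products of every region pair:
# each edge gets a timepoint-by-timepoint cofluctuation value.
z = (bold - bold.mean(axis=0)) / bold.std(axis=0)
i, j = np.triu_indices(R, k=1)     # unique region pairs (edges)
edge_ts = z[:, i] * z[:, j]        # shape (T, n_edges)

# Averaging an edge's time series over time recovers the Pearson correlation,
# i.e., the corresponding entry of the traditional FC matrix.
print(np.allclose(edge_ts.mean(axis=0), np.corrcoef(bold.T)[i, j]))
```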


Subject(s)
Attention , Brain , Male , Female , Young Adult , Humans , Linear Models , Brain/diagnostic imaging , Brain/physiology , Attention/physiology , Brain Mapping/methods , Magnetic Resonance Imaging/methods
9.
Genet Epidemiol ; 48(4): 164-189, 2024 06.
Article in English | MEDLINE | ID: mdl-38420714

ABSTRACT

Gene-environment (GxE) interactions play a crucial role in understanding the complex etiology of various traits, but assessing them using observational data can be challenging due to unmeasured confounders for lifestyle and environmental risk factors. Mendelian randomization (MR) has emerged as a valuable method for assessing causal relationships based on observational data. This approach utilizes genetic variants as instrumental variables (IVs) with the aim of providing a valid statistical test and estimation of causal effects in the presence of unmeasured confounders. MR has gained substantial popularity in recent years, largely due to the success of genome-wide association studies. Many methods have been developed for MR; however, limited work has been done on evaluating GxE interaction. In this paper, we focus on two primary IV approaches, two-stage predictor substitution and two-stage residual inclusion, and extend them to accommodate GxE interaction under the linear and logistic regression models for continuous and binary outcomes, respectively. A comprehensive simulation study and analytical derivations reveal that the extension is relatively straightforward for the linear regression model; in contrast, the logistic regression model presents a considerably more intricate challenge that demands additional effort.
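
A hedged sketch of the two IV strategies in the linear (continuous-outcome) case, with a GxE term, on simulated data (the data-generating values are assumptions for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000
G = rng.binomial(2, 0.3, size=n)          # instrument: genotype
U = rng.normal(size=n)                    # unmeasured confounder
E = rng.normal(size=n)                    # environment
X = 0.5 * G + U + rng.normal(size=n)      # exposure
y = 0.4 * X + 0.3 * X * E + U + rng.normal(size=n)

# Stage 1: regress the exposure on the instrument.
s1 = sm.OLS(X, sm.add_constant(G)).fit()
Xhat, resid = s1.fittedvalues, s1.resid

# 2SPS: substitute the predicted exposure (and its interaction with E).
m_2sps = sm.OLS(y, sm.add_constant(np.column_stack([Xhat, E, Xhat * E]))).fit()
# 2SRI: keep the observed exposure but include the stage-1 residual.
m_2sri = sm.OLS(y, sm.add_constant(np.column_stack([X, E, X * E, resid]))).fit()
print(m_2sps.params[[1, 3]], m_2sri.params[[1, 3]])  # main and interaction terms
```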


Subject(s)
Gene-Environment Interaction , Genome-Wide Association Study , Mendelian Randomization Analysis , Humans , Logistic Models , Linear Models , Polymorphism, Single Nucleotide , Models, Genetic , Genetic Variation , Computer Simulation
10.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36545787

ABSTRACT

Genotype-by-environment interaction (GEI or GxE) plays an important role in understanding complex human traits. However, it is usually challenging to detect GEI signals efficiently and accurately while adjusting for population stratification and sample relatedness in large-scale genome-wide association studies (GWAS). Here we propose a fast and powerful linear mixed model-based approach, fastGWA-GE, to test for GEI effect and G + GxE joint effect. Our extensive simulations show that fastGWA-GE outperforms other existing GEI test methods by controlling genomic inflation better, providing larger power and running hundreds to thousands of times faster. We performed a fastGWA-GE analysis of ~7.27 million variants on 452,249 individuals of European ancestry for 13 quantitative traits and five environment variables in the UK Biobank GWAS data and identified 96 significant signals (72 variants across 57 loci) with GEI test P-values < 1 × 10⁻⁹, including 27 novel GEI associations, which highlights the effectiveness of fastGWA-GE in GEI signal discovery in large-scale GWAS.
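
Not fastGWA-GE itself (which uses a fast linear mixed model to handle relatedness); a plain OLS sketch of the per-variant tests it accelerates, on simulated data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 5000
g = rng.binomial(2, 0.4, size=n)          # genotype dosage at one variant
e = rng.normal(size=n)                    # environment variable
y = 0.05 * g + 0.1 * e + 0.08 * g * e + rng.normal(size=n)

# Fit y ~ g + e + g*e; the GEI test targets the interaction coefficient.
X = sm.add_constant(np.column_stack([g, e, g * e]))
fit = sm.OLS(y, X).fit()
print("GEI p-value:", fit.pvalues[3])
# Joint G + GxE (2-df) test, analogous to the paper's joint-effect test:
print(fit.f_test("x1 = 0, x3 = 0"))
```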


Subject(s)
Gene-Environment Interaction , Genome-Wide Association Study , Humans , Phenotype , Genotype , Linear Models , Polymorphism, Single Nucleotide
11.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36617187

ABSTRACT

Differential abundance analysis (DAA) is one central statistical task in microbiome data analysis. A robust and powerful DAA tool can help identify highly confident microbial candidates for further biological validation. Current microbiome studies frequently generate correlated samples from different microbiome sampling schemes such as spatial and temporal sampling. In the past decade, a number of DAA tools for correlated microbiome data (DAA-c) have been proposed. Disturbingly, different DAA-c tools could sometimes produce quite discordant results. To recommend the best practice to the field, we performed the first comprehensive evaluation of existing DAA-c tools using real data-based simulations. Overall, the linear model-based methods LinDA, MaAsLin2 and LDM are more robust than methods based on generalized linear models. The LinDA method is the only method that maintains reasonable performance in the presence of strong compositional effects.
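
A hedged baseline in the spirit of the linear-model DAA tools named above (not LinDA's or MaAsLin2's actual implementation; toy counts assumed):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n, taxa = 60, 8
counts = rng.poisson(lam=50, size=(n, taxa)) + 1       # pseudocount avoids log(0)
group = np.repeat([0, 1], n // 2)
counts[group == 1, 0] = rng.poisson(lam=120, size=n // 2) + 1  # one true signal

# Centered log-ratio transform, then a linear model per taxon.
logc = np.log(counts)
clr = logc - logc.mean(axis=1, keepdims=True)

X = sm.add_constant(group.astype(float))
pvals = [sm.OLS(clr[:, t], X).fit().pvalues[1] for t in range(taxa)]
print(np.round(pvals, 3))
# Real DAA-c tools additionally model within-subject correlation (random
# effects for repeated samples) and correct for compositional bias.
```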


Subject(s)
Benchmarking , Microbiota , Microbiota/genetics , Linear Models , Databases, Factual , Metagenomics/methods
12.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37141142

ABSTRACT

In genome assembly, scaffolding yields more complete and contiguous scaffolds. Current scaffolding methods usually adopt one type of read to construct a scaffold graph and then orient and order contigs. However, scaffolding with the combined strengths of two or more types of reads is a better solution to some tricky problems; combining the advantages of different types of data is therefore significant for scaffolding. Here, a hybrid scaffolding method (SLHSD) is presented that simultaneously leverages the precision of short reads and the length advantage of long reads. Building an optimal scaffold graph is an important foundation for obtaining scaffolds. SLHSD uses a new algorithm that combines long- and short-read alignment information to determine whether to add an edge, and how to calculate its weight, in the scaffold graph. In addition, SLHSD develops a strategy to ensure that high-confidence edges are added to the graph with priority. Then, a linear programming model is used to detect and remove remaining false edges in the graph. We compared SLHSD with other scaffolding methods on five datasets. Experimental results show that SLHSD outperforms the other methods. The open-source code of SLHSD is available at https://github.com/luojunwei/SLHSD.
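
A toy illustration of the scaffold-graph data structure (the link counts, blend weights, and thresholds below are hypothetical, not SLHSD's actual scheme):

```python
from collections import defaultdict

# Evidence from two read types for candidate contig adjacencies.
short_links = {("ctg1", "ctg2"): 40, ("ctg2", "ctg3"): 3}   # read-pair counts
long_links = {("ctg1", "ctg2"): 5, ("ctg2", "ctg3"): 6}     # spanning long reads

graph = defaultdict(dict)
for pair in set(short_links) | set(long_links):
    s = short_links.get(pair, 0)
    l = long_links.get(pair, 0)
    weight = 0.5 * s + 2.0 * l        # hypothetical blend of the two evidences
    if s >= 2 or l >= 2:              # add only sufficiently supported edges
        a, b = pair
        graph[a][b] = weight

print({a: edges for a, edges in graph.items()})
```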


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Software , Linear Models
13.
PLoS Biol ; 20(2): e3001562, 2022 02.
Article in English | MEDLINE | ID: mdl-35180228

ABSTRACT

The power of language to modify the reader's perception of biomedical results cannot be overestimated. Misreporting and misinterpretation are pressing problems in randomized controlled trial (RCT) reports. This may be partially related to the statistical significance paradigm used in clinical trials, centered on a P value cutoff of 0.05. Strict use of this cutoff may lead clinical researchers to describe results with P values approaching but not reaching the threshold as "almost significant." The question is how phrases expressing nonsignificant results have been reported in RCTs over the past 30 years. To this end, we conducted a quantitative analysis of the English full texts of 567,758 RCTs recorded in PubMed between 1990 and 2020 (81.5% of all published RCTs in PubMed). We determined the exact presence of 505 predefined phrases denoting results that approach but do not cross the line of formal statistical significance (P < 0.05). We modeled temporal trends in phrase data with Bayesian linear regression. Evidence for temporal change was obtained through Bayes factor (BF) analysis. In a randomly sampled subset, the associated P values were manually extracted. We identified 61,741 phrases in 49,134 RCTs indicating almost significant results (8.65%; 95% confidence interval (CI): 8.58% to 8.73%). The overall prevalence of these phrases remained stable over time, with the most prevalent phrases being "marginally significant" (in 7,735 RCTs), "all but significant" (7,015), "a nonsignificant trend" (3,442), "failed to reach statistical significance" (2,578), and "a strong trend" (1,700). The strongest evidence for an increased temporal prevalence was found for "a numerical trend," "a positive trend," "an increasing trend," and "nominally significant." In contrast, the phrases "all but significant," "approaches statistical significance," "did not quite reach statistical significance," "difference was apparent," "failed to reach statistical significance," and "not quite significant" decreased over time. In a randomly sampled subset of 29,000 phrases, 68.1% of the 11,926 manually extracted corresponding P values ranged between 0.05 and 0.15 (CI: 67. to 69.0; median 0.06). Our results show that RCT reports regularly contain specific phrases describing marginally nonsignificant results to report P values close to but above the dominant 0.05 cutoff. The fact that the prevalence of the phrases remained stable over time indicates that this practice of broadly interpreting P values close to a predefined threshold remains prevalent. To enhance responsible and transparent interpretation of RCT results, researchers, clinicians, reviewers, and editors may reduce the focus on formal statistical significance thresholds, stimulate reporting of P values with corresponding effect sizes and CIs, and focus on the clinical relevance of the statistical difference found in RCTs.
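
A hedged mini-version of the phrase-detection step (the phrase list and corpus below are assumed; the study used 505 predefined phrases over full texts):

```python
import re

phrases = ["marginally significant", "a nonsignificant trend",
           "failed to reach statistical significance", "a strong trend"]
# Case-insensitive alternation over the literal phrases.
pattern = re.compile("|".join(map(re.escape, phrases)), flags=re.IGNORECASE)

corpus = [
    "The difference was Marginally Significant (P = 0.06).",
    "Treatment improved survival (P = 0.01).",
    "We observed a strong trend toward benefit (P = 0.08).",
]
hits = [bool(pattern.search(doc)) for doc in corpus]
print(f"prevalence: {sum(hits)}/{len(corpus)} documents")
```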


Subject(s)
PubMed/standards , Publications/standards , Randomized Controlled Trials as Topic/standards , Research Design/standards , Research Report/standards , Bayes Theorem , Bias , Humans , Linear Models , Outcome Assessment, Health Care/methods , Outcome Assessment, Health Care/standards , Outcome Assessment, Health Care/statistics & numerical data , PubMed/statistics & numerical data , Publications/statistics & numerical data , Randomized Controlled Trials as Topic/statistics & numerical data , Reproducibility of Results
14.
PLoS Comput Biol ; 20(4): e1011975, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38669271

ABSTRACT

The brain produces diverse functions, from perceiving sounds to producing arm reaches, through the collective activity of populations of many neurons. Determining if and how the features of these exogenous variables (e.g., sound frequency, reach angle) are reflected in population neural activity is important for understanding how the brain operates. Often, high-dimensional neural population activity is confined to low-dimensional latent spaces. However, many current methods fail to extract latent spaces that are clearly structured by exogenous variables. This has contributed to a debate about whether brains should be thought of as dynamical systems or representational systems. Here, we developed a new latent process Bayesian regression framework, the orthogonal stochastic linear mixing model (OSLMM), which introduces an orthogonality constraint amongst time-varying mixture coefficients, and we provide Markov chain Monte Carlo inference procedures. We demonstrate superior performance of OSLMM on latent trajectory recovery in synthetic experiments and show superior computational efficiency and prediction performance on several real-world benchmark data sets. We primarily focus on demonstrating the utility of OSLMM in two neural data sets: µECoG recordings from rat auditory cortex during presentation of pure tones and multi-single-unit recordings from monkey motor cortex during complex arm reaching. We show that OSLMM achieves superior or comparable predictive accuracy of neural data and decoding of external variables (e.g., reach velocity). Most importantly, in both experimental contexts, we demonstrate that OSLMM latent trajectories directly reflect features of the sounds and reaches, demonstrating that neural dynamics are structured by neural representations. Together, these results demonstrate that OSLMM will be useful for the analysis of diverse, large-scale biological time-series datasets.
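
A generative sketch of the mixing idea (not OSLMM's MCMC inference; OSLMM's coefficients are time-varying, whereas this toy uses one fixed orthonormal mixing matrix):

```python
import numpy as np

rng = np.random.default_rng(11)
T, L, C = 200, 3, 12                    # timepoints, latents, recorded channels
t = np.linspace(0, 1, T)
# Smooth latent trajectories standing in for latent stochastic processes.
latents = np.column_stack([np.sin(2 * np.pi * (k + 1) * t) for k in range(L)])

# Orthonormal mixing (W' W = I) obtained via QR, mimicking the constraint.
W, _ = np.linalg.qr(rng.normal(size=(C, L)))
Y = latents @ W.T + 0.1 * rng.normal(size=(T, C))   # observed channels

print(np.allclose(W.T @ W, np.eye(L)))  # orthogonality holds by construction
```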


Subject(s)
Auditory Cortex , Bayes Theorem , Markov Chains , Models, Neurological , Neurons , Stochastic Processes , Animals , Rats , Auditory Cortex/physiology , Neurons/physiology , Computational Biology , Linear Models , Monte Carlo Method , Computer Simulation
15.
PLoS Comput Biol ; 20(7): e1012142, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39047024

ABSTRACT

Increasing genetic and phenotypic data size is critical for understanding the genetic determinants of diseases. However, the lack of practical means for collaboration and data sharing among institutions is a fundamental methodological barrier to performing high-powered studies. As sample sizes become more heterogeneous, complex statistical approaches, such as generalized linear mixed-effects models, must be used to correct for the confounders that may bias results. On another front, owing to privacy concerns around Protected Health Information (PHI), genetic information is restrictively protected from sharing by regulations such as the Health Insurance Portability and Accountability Act (HIPAA). This limits data sharing among institutions and hampers efforts to execute high-powered collaborative studies. Federated approaches are promising for alleviating the issues around privacy and performance, since sensitive data never leave the local sites. Motivated by these considerations, we developed FedGMMAT, a federated genetic association testing tool that utilizes a federated statistical testing approach for efficient association tests that can correct for confounding fixed and additive polygenic random effects among different collaborating sites. Genetic data are never shared among collaborating sites, and the intermediate statistics are protected by encryption. Using simulated and real datasets, we demonstrate that FedGMMAT can achieve virtually the same results as pooled analysis under a privacy-preserving framework with practical resource requirements.
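
A conceptual sketch of why federation can match pooled analysis (toy, unencrypted OLS; FedGMMAT additionally handles polygenic random effects and encrypts intermediates): for a linear model, sites need only share the sufficient statistics X'X and X'y, never row-level data.

```python
import numpy as np

rng = np.random.default_rng(6)

def site_stats(n):
    """Simulate one site's data and return its local sufficient statistics."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)
    return X.T @ X, X.T @ y, X, y

stats = [site_stats(n) for n in (300, 500, 200)]
XtX = sum(s[0] for s in stats)          # aggregated across sites
Xty = sum(s[1] for s in stats)
beta_fed = np.linalg.solve(XtX, Xty)

# Pooled analysis on the stacked raw data gives the identical estimate.
X_all = np.vstack([s[2] for s in stats])
y_all = np.concatenate([s[3] for s in stats])
beta_pooled = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
print(np.allclose(beta_fed, beta_pooled))   # True
```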


Subject(s)
Information Dissemination , Humans , Linear Models , Information Dissemination/methods , Computational Biology/methods , Software , Genome-Wide Association Study/methods , Genetic Association Studies
16.
Nature ; 568(7751): 221-225, 2019 04.
Article in English | MEDLINE | ID: mdl-30944480

ABSTRACT

The global land and ocean carbon sinks have increased proportionally with increasing carbon dioxide emissions during the past decades [1]. It is thought that Northern Hemisphere lands make a dominant contribution to the global land carbon sink [2-7]; however, the long-term trend of the northern land sink remains uncertain. Here, using measurements of the interhemispheric gradient of atmospheric carbon dioxide from 1958 to 2016, we show that the northern land sink remained stable between the 1960s and the late 1980s, then increased by 0.5 ± 0.4 petagrams of carbon per year during the 1990s and by 0.6 ± 0.5 petagrams of carbon per year during the 2000s. The increase of the northern land sink in the 1990s accounts for 65% of the increase in the global land carbon flux during that period. The subsequent increase in the 2000s is larger than the increase in the global land carbon flux, suggesting a coincident decrease of carbon uptake in the Southern Hemisphere. Comparison of our findings with the simulations of an ensemble of terrestrial carbon models [5,8] over the same period suggests that the decadal change in the northern land sink between the 1960s and the 1990s can be explained by a combination of increasing concentrations of atmospheric carbon dioxide, climate variability and changes in land cover. However, the increase during the 2000s is underestimated by all models, which suggests the need for improved consideration of changes in drivers such as nitrogen deposition, diffuse light and land-use change. Overall, our findings underscore the importance of Northern Hemispheric land as a carbon sink.


Subject(s)
Carbon Dioxide/analysis , Carbon Dioxide/history , Carbon Sequestration , Geographic Mapping , Geologic Sediments/chemistry , Atmosphere/chemistry , Carbon/chemistry , Carbon Dioxide/chemistry , China , Construction Materials/analysis , Forests , Fossil Fuels/analysis , History, 20th Century , History, 21st Century , Linear Models , Models, Theoretical , Nitrogen/chemistry , Siberia , Uncertainty
17.
Nucleic Acids Res ; 51(8): 3501-3512, 2023 05 08.
Article in English | MEDLINE | ID: mdl-36809800

ABSTRACT

Human diseases and agricultural traits can be predicted by modeling a genetic random polygenic effect in linear mixed models. Estimating variance components and predicting the random effects of the model efficiently with limited computational resources has always been a primary concern, especially as the scale of genotype data keeps growing in the current genomic era. Here, we thoroughly reviewed the development history of statistical algorithms used in genetic evaluation and theoretically compared their computational complexity and applicability for different data scenarios. Most importantly, we present a computationally efficient, functionally enriched, multi-platform and user-friendly software package named 'HIBLUP' to address the challenges currently faced when using big genomic data. Powered by advanced algorithms, elaborate design and efficient programming, HIBLUP was the fastest and most memory-efficient tool in our analyses, and the greater the number of genotyped individuals, the greater the computational benefit from HIBLUP. We also demonstrated that HIBLUP is the only tool that can accomplish the analyses for a UK Biobank-scale dataset within 1 h using the proposed efficient 'HE + PCG' strategy. It is foreseeable that HIBLUP will facilitate genetic research for humans, plants and animals. The HIBLUP software and user manual can be accessed freely at https://www.hiblup.com.


Both human diseases and agricultural traits can be predicted by incorporating phenotypic observations and a relationship matrix among individuals into a linear mixed model. Owing to the great demand for processing massive data from genotyped individuals, existing algorithms that require repeated inversion of increasingly large dense matrices (e.g. the relationship matrix and the coefficient matrix of the mixed model equations) have hit a bottleneck. Here, we present a software tool named 'HIBLUP' to address these challenges. Powered by our advanced algorithms (e.g. HE + PCG), elaborate design and efficient programming, HIBLUP avoids inverting any large matrix and computes fastest with the lowest memory footprint, which makes it very promising for genetic evaluation using big genomic data.
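
The 'PCG' in 'HE + PCG' stands for preconditioned conjugate gradient; a minimal, generic sketch of solving a symmetric positive-definite system without forming any inverse (standard scipy, not HIBLUP's code), touching the matrix only through matrix-vector products:

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(7)
n = 500
R = rng.normal(size=(n, n))
A = R @ R.T + n * np.eye(n)          # SPD stand-in for mixed-model equations
b = rng.normal(size=n)

d = np.diag(A)                       # Jacobi (diagonal) preconditioner
precond = LinearOperator((n, n), matvec=lambda v: v / d)
x, info = cg(A, b, M=precond)        # iterative solve; no matrix inversion
print(info == 0, np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```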


Subject(s)
Genomics , Models, Genetic , Animals , Humans , Algorithms , Genome , Genotype , Linear Models
18.
PLoS Genet ; 18(4): e1010151, 2022 04.
Article in English | MEDLINE | ID: mdl-35442943

ABSTRACT

With the advent of high-throughput genetic data, there have been attempts to estimate heritability from genome-wide SNP data on cohorts of distantly related individuals using linear mixed models (LMMs). Fitting such an LMM in a large-scale cohort study, however, is tremendously challenging due to the high-dimensional linear-algebra operations involved. In this paper, we propose PredLMM, a new method that approximates the aforementioned LMM, motivated by the concepts of genetic coalescence and the Gaussian predictive process. PredLMM has substantially better computational complexity than most existing LMM-based methods and thus provides a fast alternative for estimating heritability in large-scale cohort studies. Theoretically, we show that under a model of genetic coalescence, the limiting form of our approximation is the celebrated predictive process approximation of large Gaussian process likelihoods, which has well-established accuracy standards. We illustrate our approach with extensive simulation studies and use it to estimate the heritability of multiple quantitative traits from the UK Biobank cohort.
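
Not PredLMM itself; as context, a hedged sketch of the classical Haseman-Elston moment estimator for the same target quantity (SNP heritability), on simulated genotypes:

```python
import numpy as np

rng = np.random.default_rng(8)
n, m, h2_true = 1000, 2000, 0.5
geno = rng.binomial(2, 0.3, size=(n, m)).astype(float)
Z = (geno - geno.mean(0)) / geno.std(0)            # standardized genotypes
g = Z @ rng.normal(scale=np.sqrt(h2_true / m), size=m)
y = g + rng.normal(scale=np.sqrt(1 - h2_true), size=n)
y = (y - y.mean()) / y.std()

# Regress phenotype cross-products on GRM entries (off-diagonal pairs):
# E[y_i * y_j] = h2 * K_ij, so the through-origin slope estimates h2.
K = Z @ Z.T / m                                    # genetic relationship matrix
iu = np.triu_indices(n, k=1)
h2_hat = (K[iu] @ (y[iu[0]] * y[iu[1]])) / (K[iu] @ K[iu])
print(f"HE-regression h2 estimate: {h2_hat:.2f} (simulated truth {h2_true})")
```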


Subject(s)
Genome-Wide Association Study , Models, Genetic , Cohort Studies , Genome-Wide Association Study/methods , Humans , Linear Models , Normal Distribution , Phenotype , Polymorphism, Single Nucleotide/genetics
19.
Proc Natl Acad Sci U S A ; 119(39): e2212959119, 2022 09 27.
Article in English | MEDLINE | ID: mdl-36122202

ABSTRACT

Detecting genetic variants associated with the variance of complex traits, that is, variance quantitative trait loci (vQTLs), can provide crucial insights into the interplay between genes and environments and how they jointly shape human phenotypes in the population. We propose a quantile integral linear model (QUAIL) to estimate genetic effects on trait variability. Through extensive simulations and analyses of real data, we demonstrate that QUAIL provides computationally efficient and statistically powerful vQTL mapping that is robust to non-Gaussian phenotypes and confounding effects on phenotypic variability. Applied to UK Biobank (n = 375,791), QUAIL identified 11 vQTLs for body mass index (BMI) that have not been previously reported. Top vQTL findings showed substantial enrichment for interactions with physical activities and sedentary behavior. Furthermore, variance polygenic scores (vPGSs) based on QUAIL effect estimates showed superior predictive performance on both population-level and within-individual BMI variability compared to existing approaches. Overall, QUAIL is a unified framework to quantify genetic effects on the phenotypic variability at both single-variant and vPGS levels. It addresses critical limitations in existing approaches and may have broad applications in future gene-environment interaction studies.
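
QUAIL itself integrates quantile regressions across quantile levels; as a hedged, simpler baseline for the same question (does genotype affect trait variance rather than mean?), a Levene-type test comparing phenotypic spread across genotype groups:

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(9)
n = 3000
g = rng.binomial(2, 0.4, size=n)
# Simulate a variance effect: higher dosage -> noisier phenotype, equal means.
y = rng.normal(scale=1.0 + 0.2 * g)

# Brown-Forsythe variant (median-centered) is robust to non-Gaussian traits.
stat, p = levene(y[g == 0], y[g == 1], y[g == 2], center="median")
print(f"vQTL test: W = {stat:.2f}, p = {p:.2e}")
```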


Subject(s)
Biological Variation, Population , Models, Biological , Phenotype , Biological Variation, Population/genetics , Computer Simulation , Gene-Environment Interaction , Humans , Linear Models , Quantitative Trait Loci
20.
J Infect Dis ; 229(Supplement_1): S25-S33, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-37249267

ABSTRACT

BACKGROUND: Previous studies reported inconsistent findings regarding the association between respiratory syncytial virus (RSV) subgroup distribution and the timing of the RSV season. We aimed to further understand the association by conducting a global-level systematic analysis. METHODS: We compiled published data on RSV seasonality through a systematic literature review, together with unpublished data shared by international collaborators. Using the annual cumulative proportion (ACP) of RSV-positive cases, we defined RSV season onset and offset as the ACP reaching 10% and 90%, respectively. Linear regression models accounting for meteorological factors were constructed to analyze the association of the proportion of RSV-A with the corresponding RSV season onset and offset. RESULTS: We included 36 study sites from 20 countries, providing data for 179 study-years in 1995-2019. RSV subgroup distribution was not significantly associated with RSV season onset or offset globally, except for RSV season offset in the tropics in one model, possibly by chance. Models that included RSV subgroup distribution and meteorological factors explained only 2%-4% of the variation in the timing of the RSV season. CONCLUSIONS: Year-on-year variations in RSV season onset and offset are not well explained by RSV subgroup distribution or meteorological factors. Factors including population susceptibility, mobility, and viral interference should be examined in future studies.
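
A sketch of the ACP-based season definition on assumed toy weekly counts: accumulate cases over the year and take onset/offset as the weeks where the cumulative proportion crosses 10% and 90%.

```python
import numpy as np

rng = np.random.default_rng(10)
weeks = np.arange(1, 53)
# Toy epidemic curve peaking around week 30.
cases = rng.poisson(lam=np.exp(-0.5 * ((weeks - 30) / 4.0) ** 2) * 100)

acp = np.cumsum(cases) / cases.sum()               # annual cumulative proportion
onset = weeks[np.searchsorted(acp, 0.10)]          # first week with ACP >= 10%
offset = weeks[np.searchsorted(acp, 0.90)]         # first week with ACP >= 90%
print(f"season onset: week {onset}, offset: week {offset}")
```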


Subject(s)
Respiratory Syncytial Virus, Human , Humans , Linear Models , Seasons , Viral Interference