1.
Cell ; 184(23): 5699-5714.e11, 2021 11 11.
Article in English | MEDLINE | ID: mdl-34735795

ABSTRACT

Extension of the interval between vaccine doses for the BNT162b2 mRNA vaccine was introduced in the United Kingdom to accelerate population coverage with a single dose. At this time, trial data were lacking, and we addressed this in a study of United Kingdom healthcare workers. The first vaccine dose induced protection from infection from the circulating alpha (B.1.1.7) variant over several weeks. In a substudy of 589 individuals, we show that this single dose induces severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) neutralizing antibody (NAb) responses and a sustained B and T cell response to the spike protein. NAb levels were higher after the extended dosing interval (6-14 weeks) compared with the conventional 3- to 4-week regimen, accompanied by enrichment of CD4+ T cells expressing interleukin-2 (IL-2). Prior SARS-CoV-2 infection amplified and accelerated the response. These data on dynamic cellular and humoral responses indicate that extension of the dosing interval is an effective immunogenic protocol.


Subjects
COVID-19 Vaccines/immunology , Vaccines, Synthetic/immunology , Adult , Aged , Antibodies, Neutralizing/immunology , Antibodies, Viral/immunology , BNT162 Vaccine , COVID-19/blood , COVID-19/immunology , COVID-19/virology , Cross-Priming/immunology , Dose-Response Relationship, Immunologic , Ethnicity , Female , Humans , Immunity , Immunoglobulin G/immunology , Linear Models , Male , Middle Aged , Reference Standards , SARS-CoV-2/immunology , T-Lymphocytes/immunology , Treatment Outcome , Young Adult , mRNA Vaccines
2.
Cell ; 183(7): 1986-2002.e26, 2020 12 23.
Article in English | MEDLINE | ID: mdl-33333022

ABSTRACT

Serotonin plays a central role in cognition and is the target of most pharmaceuticals for psychiatric disorders. Existing drugs have limited efficacy; creation of improved versions will require better understanding of serotonergic circuitry, which has been hampered by our inability to monitor serotonin release and transport with high spatial and temporal resolution. We developed and applied a binding-pocket redesign strategy, guided by machine learning, to create a high-performance, soluble, fluorescent serotonin sensor (iSeroSnFR), enabling optical detection of millisecond-scale serotonin transients. We demonstrate that iSeroSnFR can be used to detect serotonin release in freely behaving mice during fear conditioning, social interaction, and sleep/wake transitions. We also developed a robust assay of serotonin transporter function and modulation by drugs. We expect that both machine-learning-guided binding-pocket redesign and iSeroSnFR will have broad utility for the development of other sensors and in vitro and in vivo serotonin detection, respectively.


Subjects
Directed Molecular Evolution , Machine Learning , Serotonin/metabolism , Algorithms , Amino Acid Sequence , Amygdala/physiology , Animals , Behavior, Animal , Binding Sites , Brain/metabolism , HEK293 Cells , Humans , Kinetics , Linear Models , Mice , Mice, Inbred C57BL , Photons , Protein Binding , Serotonin Plasma Membrane Transport Proteins/metabolism , Sleep/physiology , Wakefulness/physiology
3.
Cell ; 164(1-2): 293-309, 2016 Jan 14.
Article in English | MEDLINE | ID: mdl-26771497

ABSTRACT

Large-scale genomic studies have identified multiple somatic aberrations in breast cancer, including copy number alterations and point mutations. Still, identifying causal variants and emergent vulnerabilities that arise as a consequence of genetic alterations remain major challenges. We performed whole-genome small hairpin RNA (shRNA) "dropout screens" on 77 breast cancer cell lines. Using a hierarchical linear regression algorithm to score our screen results and integrate them with accompanying detailed genetic and proteomic information, we identify vulnerabilities in breast cancer, including candidate "drivers," and reveal general functional genomic properties of cancer cells. Comparisons of gene essentiality with drug sensitivity data suggest potential resistance mechanisms, effects of existing anti-cancer drugs, and opportunities for combination therapy. Finally, we demonstrate the utility of this large dataset by identifying BRD4 as a potential target in luminal breast cancer and PIK3CA mutations as a resistance determinant for BET-inhibitors.


Subjects
Algorithms , Breast Neoplasms/genetics , Breast Neoplasms/drug therapy , Breast Neoplasms/pathology , Cell Cycle Proteins , Cell Line, Tumor , Class I Phosphatidylinositol 3-Kinases , Cluster Analysis , Drug Resistance, Neoplasm , Gene Dosage , Gene Expression Profiling , Genome-Wide Association Study , Humans , Linear Models , Nuclear Proteins/genetics , Phosphatidylinositol 3-Kinases , Transcription Factors/genetics
4.
Mol Cell ; 80(2): 359-373.e8, 2020 10 15.
Article in English | MEDLINE | ID: mdl-32991830

ABSTRACT

Eukaryotic gene expression regulation involves thousands of distal regulatory elements. Understanding the quantitative contribution of individual enhancers to gene expression is critical for assessing the role of disease-associated genetic risk variants. Yet, we lack the ability to accurately link genes with their distal regulatory elements. To address this, we used 3D enhancer-promoter (E-P) associations identified using split-pool recognition of interactions by tag extension (SPRITE) to build a predictive model of gene expression. Our model dramatically outperforms models using genomic proximity and can be used to determine the quantitative impact of enhancer loss on gene expression in different genetic backgrounds. We show that genes that form stable E-P hubs have less cell-to-cell variability in gene expression. Finally, we identified transcription factors that regulate stimulation-dependent E-P interactions. Together, our results provide a framework for understanding quantitative contributions of E-P interactions and associated genetic variants to gene expression.


Subjects
Bacteria/isolation & purification , Enhancer Elements, Genetic , Promoter Regions, Genetic , Animals , Dendritic Cells/metabolism , Female , Gene Expression Regulation , Linear Models , Mice, Inbred C57BL , Models, Biological , Stochastic Processes , Transcription Factors/metabolism
5.
Nature ; 592(7855): 571-576, 2021 04.
Article in English | MEDLINE | ID: mdl-33790468

ABSTRACT

Biological invasions are responsible for substantial biodiversity declines as well as high economic losses to society and monetary expenditures associated with the management of these invasions1,2. The InvaCost database has enabled the generation of a reliable, comprehensive, standardized and easily updatable synthesis of the monetary costs of biological invasions worldwide3. Here we found that the total reported costs of invasions reached a minimum of US$1.288 trillion (2017 US dollars) over the past few decades (1970-2017), with an annual mean cost of US$26.8 billion. Moreover, we estimate that the annual mean cost could reach US$162.7 billion in 2017. These costs remain strongly underestimated and do not show any sign of slowing down, exhibiting a consistent threefold increase per decade. We show that the documented costs are widely distributed and have strong gaps at regional and taxonomic scales, with damage costs being an order of magnitude higher than management expenditures. Research approaches that document the costs of biological invasions need to be further improved. Nonetheless, our findings call for the implementation of consistent management actions and international policy agreements that aim to reduce the burden of invasive alien species.


Subjects
Biodiversity , Ecology/economics , Environmental Science/economics , Internationality , Introduced Species/economics , Introduced Species/trends , Animals , Geographic Mapping , Invertebrates , Linear Models , Plants , Vertebrates
6.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38436558

ABSTRACT

Recently, there has been a growing interest in variable selection for causal inference within the context of high-dimensional data. However, when the outcome exhibits a skewed distribution, ensuring the accuracy of variable selection and causal effect estimation might be challenging. Here, we introduce the generalized median adaptive lasso (GMAL) for covariate selection to achieve an accurate estimation of causal effect even when the outcome follows skewed distributions. A distinctive feature of our proposed method is that we utilize a linear median regression model for constructing penalty weights, thereby maintaining the accuracy of variable selection and causal effect estimation even when the outcome presents extremely skewed distributions. Simulation results showed that our proposed method performs comparably to existing methods in variable selection when the outcome follows a symmetric distribution. Besides, the proposed method exhibited obvious superiority over the existing methods when the outcome follows a skewed distribution. Meanwhile, our proposed method consistently outperformed the existing methods in causal estimation, as indicated by smaller root-mean-square error. We also utilized the GMAL method on a deoxyribonucleic acid methylation dataset from the Alzheimer's disease (AD) neuroimaging initiative database to investigate the association between cerebrospinal fluid tau protein levels and the severity of AD.


Subjects
Alzheimer Disease , Humans , Alzheimer Disease/genetics , Computer Simulation , Databases, Factual , Linear Models , Protein Processing, Post-Translational
7.
PLoS Genet ; 19(11): e1011022, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37934796

ABSTRACT

Epigenetic researchers often evaluate DNA methylation as a potential mediator of the effect of social/environmental exposures on a health outcome. Modern statistical methods for jointly evaluating many mediators have not been widely adopted. We compare seven methods for high-dimensional mediation analysis with continuous outcomes through both diverse simulations and analysis of DNAm data from a large multi-ethnic cohort in the United States, while providing an R package for their seamless implementation and adoption. Among the considered choices, the best-performing methods for detecting active mediators in simulations are the Bayesian sparse linear mixed model (BSLMM) and high-dimensional mediation analysis (HDMA); while the preferred methods for estimating the global mediation effect are high-dimensional linear mediation analysis (HILMA) and principal component mediation analysis (PCMA). We provide guidelines for epigenetic researchers on choosing the best method in practice and offer suggestions for future methodological development.


Subjects
DNA Methylation , Mediation Analysis , Humans , DNA Methylation/genetics , Bayes Theorem , Linear Models , Environmental Exposure
8.
J Neurosci ; 44(14)2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38316565

ABSTRACT

Although we must prioritize the processing of task-relevant information to navigate life, our ability to do so fluctuates across time. Previous work has identified fMRI functional connectivity (FC) networks that predict an individual's ability to sustain attention and vary with attentional state from 1 min to the next. However, traditional dynamic FC approaches typically lack the temporal precision to capture moment-to-moment network fluctuations. Recently, researchers have "unfurled" traditional FC matrices in "edge cofluctuation time series" which measure timepoint-by-timepoint cofluctuations between regions. Here we apply event-based and parametric fMRI analyses to edge time series to capture moment-to-moment fluctuations in networks related to attention. In two independent fMRI datasets examining young adults of both sexes in which participants performed a sustained attention task, we identified a reliable set of edges that rapidly deflects in response to rare task events. Another set of edges varies with continuous fluctuations in attention and overlaps with a previously defined set of edges associated with individual differences in sustained attention. Demonstrating that edge-based analyses are not simply redundant with traditional regions-of-interest-based approaches, up to one-third of reliably deflected edges were not predicted from univariate activity patterns alone. These results reveal the large potential in combining traditional fMRI analyses with edge time series to identify rapid reconfigurations in networks across the brain.


Subjects
Attention , Brain , Male , Female , Young Adult , Humans , Linear Models , Brain/diagnostic imaging , Brain/physiology , Attention/physiology , Brain Mapping/methods , Magnetic Resonance Imaging/methods
9.
Genet Epidemiol ; 48(4): 164-189, 2024 06.
Article in English | MEDLINE | ID: mdl-38420714

ABSTRACT

Gene-environment (GxE) interactions play a crucial role in understanding the complex etiology of various traits, but assessing them using observational data can be challenging due to unmeasured confounders for lifestyle and environmental risk factors. Mendelian randomization (MR) has emerged as a valuable method for assessing causal relationships based on observational data. This approach utilizes genetic variants as instrumental variables (IVs) with the aim of providing a valid statistical test and estimation of causal effects in the presence of unmeasured confounders. MR has gained substantial popularity in recent years largely due to the success of genome-wide association studies. Many methods have been developed for MR; however, limited work has been done on evaluating GxE interaction. In this paper, we focus on two primary IV approaches: the two-stage predictor substitution and the two-stage residual inclusion, and extend them to accommodate GxE interaction under both the linear and logistic regression models for continuous and binary outcomes, respectively. Comprehensive simulation study and analytical derivations reveal that resolving the linear regression model is relatively straightforward. In contrast, the logistic regression model presents a considerably more intricate challenge, which demands additional effort.


Subjects
Gene-Environment Interaction , Genome-Wide Association Study , Mendelian Randomization Analysis , Humans , Logistic Models , Linear Models , Polymorphism, Single Nucleotide , Models, Genetic , Genetic Variation , Computer Simulation
10.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36545787

ABSTRACT

Genotype-by-environment interaction (GEI or GxE) plays an important role in understanding complex human traits. However, it is usually challenging to detect GEI signals efficiently and accurately while adjusting for population stratification and sample relatedness in large-scale genome-wide association studies (GWAS). Here we propose a fast and powerful linear mixed model-based approach, fastGWA-GE, to test for GEI effect and G + GxE joint effect. Our extensive simulations show that fastGWA-GE outperforms other existing GEI test methods by controlling genomic inflation better, providing larger power and running hundreds to thousands of times faster. We performed a fastGWA-GE analysis of ~7.27 million variants on 452 249 individuals of European ancestry for 13 quantitative traits and five environment variables in the UK Biobank GWAS data and identified 96 significant signals (72 variants across 57 loci) with GEI test P-values < 1 × 10⁻⁹, including 27 novel GEI associations, which highlights the effectiveness of fastGWA-GE in GEI signal discovery in large-scale GWAS.


Subjects
Gene-Environment Interaction , Genome-Wide Association Study , Humans , Phenotype , Genotype , Linear Models , Polymorphism, Single Nucleotide
11.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36617187

ABSTRACT

Differential abundance analysis (DAA) is one central statistical task in microbiome data analysis. A robust and powerful DAA tool can help identify highly confident microbial candidates for further biological validation. Current microbiome studies frequently generate correlated samples from different microbiome sampling schemes such as spatial and temporal sampling. In the past decade, a number of DAA tools for correlated microbiome data (DAA-c) have been proposed. Disturbingly, different DAA-c tools could sometimes produce quite discordant results. To recommend the best practice to the field, we performed the first comprehensive evaluation of existing DAA-c tools using real data-based simulations. Overall, the linear model-based methods LinDA, MaAsLin2 and LDM are more robust than methods based on generalized linear models. The LinDA method is the only method that maintains reasonable performance in the presence of strong compositional effects.


Subjects
Benchmarking , Microbiota , Microbiota/genetics , Linear Models , Databases, Factual , Metagenomics/methods
12.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37141142

ABSTRACT

In genome assembly, scaffolding can obtain more complete and continuous scaffolds. Current scaffolding methods usually adopt one type of read to construct a scaffold graph and then orient and order contigs. However, scaffolding with the strengths of two or more types of reads seems to be a better solution to some tricky problems. Combining the advantages of different types of data is significant for scaffolding. Here, a hybrid scaffolding method (SLHSD) is presented that simultaneously leverages the precision of short reads and the length advantage of long reads. Building an optimal scaffold graph is an important foundation for getting scaffolds. SLHSD uses a new algorithm that combines long and short read alignment information to determine whether to add an edge and how to calculate the edge weight in a scaffold graph. In addition, SLHSD develops a strategy to ensure that edges with high confidence can be added to the graph with priority. Then, a linear programming model is used to detect and remove remaining false edges in the graph. We compared SLHSD with other scaffolding methods on five datasets. Experimental results show that SLHSD outperforms other methods. The open-source code of SLHSD is available at https://github.com/luojunwei/SLHSD.


Subjects
Algorithms , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Software , Linear Models
13.
PLoS Biol ; 20(2): e3001562, 2022 02.
Article in English | MEDLINE | ID: mdl-35180228

ABSTRACT

The power of language to modify the reader's perception of interpreting biomedical results cannot be underestimated. Misreporting and misinterpretation are pressing problems in randomized controlled trials (RCT) output. This may be partially related to the statistical significance paradigm used in clinical trials centered around a P value below 0.05 cutoff. Strict use of this P value may lead to strategies of clinical researchers to describe their clinical results with P values approaching but not reaching the threshold to be "almost significant." The question is how phrases expressing nonsignificant results have been reported in RCTs over the past 30 years. To this end, we conducted a quantitative analysis of English full texts containing 567,758 RCTs recorded in PubMed between 1990 and 2020 (81.5% of all published RCTs in PubMed). We determined the exact presence of 505 predefined phrases denoting results that approach but do not cross the line of formal statistical significance (P < 0.05). We modeled temporal trends in phrase data with Bayesian linear regression. Evidence for temporal change was obtained through Bayes factor (BF) analysis. In a randomly sampled subset, the associated P values were manually extracted. We identified 61,741 phrases in 49,134 RCTs indicating almost significant results (8.65%; 95% confidence interval (CI): 8.58% to 8.73%). The overall prevalence of these phrases remained stable over time, with the most prevalent phrases being "marginally significant" (in 7,735 RCTs), "all but significant" (7,015), "a nonsignificant trend" (3,442), "failed to reach statistical significance" (2,578), and "a strong trend" (1,700). The strongest evidence for an increased temporal prevalence was found for "a numerical trend," "a positive trend," "an increasing trend," and "nominally significant." 
In contrast, the phrases "all but significant," "approaches statistical significance," "did not quite reach statistical significance," "difference was apparent," "failed to reach statistical significance," and "not quite significant" decreased over time. In a randomly sampled subset of 29,000 phrases, of the 11,926 manually identified corresponding P values, 68.1% ranged between 0.05 and 0.15 (CI: 67. to 69.0; median 0.06). Our results show that RCT reports regularly contain specific phrases describing marginally nonsignificant results to report P values close to but above the dominant 0.05 cutoff. The fact that the prevalence of the phrases remained stable over time indicates that this practice of broadly interpreting P values close to a predefined threshold remains prevalent. To enhance responsible and transparent interpretation of RCT results, researchers, clinicians, reviewers, and editors may reduce the focus on formal statistical significance thresholds and stimulate reporting of P values with corresponding effect sizes and CIs and focus on the clinical relevance of the statistical difference found in RCTs.
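
The prevalence reported in this abstract (49,134 of 567,758 RCTs; 8.65%, 95% CI 8.58% to 8.73%) can be checked with a few lines of arithmetic. The sketch below assumes a normal-approximation (Wald) interval for a proportion; the abstract does not say which interval method the authors used, and the function name `wald_ci` is a hypothetical helper, not something taken from the paper.

```python
import math


def wald_ci(successes, n, z=1.959964):
    """Proportion and normal-approximation (Wald) interval.

    Assumption: a Wald interval is used here for illustration only;
    the abstract does not state the authors' interval method.
    """
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of a proportion
    return p, p - z * se, p + z * se


# Counts taken from the abstract: RCTs containing at least one phrase / all RCTs analyzed.
p, low, high = wald_ci(49_134, 567_758)
print(f"{100 * p:.2f}% ({100 * low:.2f}% to {100 * high:.2f}%)")
# prints "8.65% (8.58% to 8.73%)"
```

Under that assumption the interval agrees with the reported values to two decimal places.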


Subjects
PubMed/standards , Publications/standards , Randomized Controlled Trials as Topic/standards , Research Design/standards , Research Report/standards , Bayes Theorem , Bias , Humans , Linear Models , Outcome Assessment, Health Care/methods , Outcome Assessment, Health Care/standards , Outcome Assessment, Health Care/statistics & numerical data , PubMed/statistics & numerical data , Publications/statistics & numerical data , Randomized Controlled Trials as Topic/statistics & numerical data , Reproducibility of Results
14.
PLoS Comput Biol ; 20(4): e1011975, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38669271

ABSTRACT

The brain produces diverse functions, from perceiving sounds to producing arm reaches, through the collective activity of populations of many neurons. Determining if and how the features of these exogenous variables (e.g., sound frequency, reach angle) are reflected in population neural activity is important for understanding how the brain operates. Often, high-dimensional neural population activity is confined to low-dimensional latent spaces. However, many current methods fail to extract latent spaces that are clearly structured by exogenous variables. This has contributed to a debate about whether or not brains should be thought of as dynamical systems or representational systems. Here, we developed a new latent process Bayesian regression framework, the orthogonal stochastic linear mixing model (OSLMM) which introduces an orthogonality constraint amongst time-varying mixture coefficients, and provide Markov chain Monte Carlo inference procedures. We demonstrate superior performance of OSLMM on latent trajectory recovery in synthetic experiments and show superior computational efficiency and prediction performance on several real-world benchmark data sets. We primarily focus on demonstrating the utility of OSLMM in two neural data sets: µECoG recordings from rat auditory cortex during presentation of pure tones and multi-single unit recordings from monkey motor cortex during complex arm reaching. We show that OSLMM achieves superior or comparable predictive accuracy of neural data and decoding of external variables (e.g., reach velocity). Most importantly, in both experimental contexts, we demonstrate that OSLMM latent trajectories directly reflect features of the sounds and reaches, demonstrating that neural dynamics are structured by neural representations. Together, these results demonstrate that OSLMM will be useful for the analysis of diverse, large-scale biological time-series datasets.


Subjects
Auditory Cortex , Bayes Theorem , Markov Chains , Models, Neurological , Neurons , Stochastic Processes , Animals , Rats , Auditory Cortex/physiology , Neurons/physiology , Computational Biology , Linear Models , Monte Carlo Method , Computer Simulation
15.
Nature ; 568(7751): 221-225, 2019 04.
Article in English | MEDLINE | ID: mdl-30944480

ABSTRACT

The global land and ocean carbon sinks have increased proportionally with increasing carbon dioxide emissions during the past decades1. It is thought that Northern Hemisphere lands make a dominant contribution to the global land carbon sink2-7; however, the long-term trend of the northern land sink remains uncertain. Here, using measurements of the interhemispheric gradient of atmospheric carbon dioxide from 1958 to 2016, we show that the northern land sink remained stable between the 1960s and the late 1980s, then increased by 0.5 ± 0.4 petagrams of carbon per year during the 1990s and by 0.6 ± 0.5 petagrams of carbon per year during the 2000s. The increase of the northern land sink in the 1990s accounts for 65% of the increase in the global land carbon flux during that period. The subsequent increase in the 2000s is larger than the increase in the global land carbon flux, suggesting a coincident decrease of carbon uptake in the Southern Hemisphere. Comparison of our findings with the simulations of an ensemble of terrestrial carbon models5,8 over the same period suggests that the decadal change in the northern land sink between the 1960s and the 1990s can be explained by a combination of increasing concentrations of atmospheric carbon dioxide, climate variability and changes in land cover. However, the increase during the 2000s is underestimated by all models, which suggests the need for improved consideration of changes in drivers such as nitrogen deposition, diffuse light and land-use change. Overall, our findings underscore the importance of Northern Hemispheric land as a carbon sink.


Subjects
Carbon Dioxide/analysis , Carbon Dioxide/history , Carbon Sequestration , Geographic Mapping , Geologic Sediments/chemistry , Atmosphere/chemistry , Carbon/chemistry , Carbon Dioxide/chemistry , China , Construction Materials/analysis , Forests , Fossil Fuels/analysis , History, 20th Century , History, 21st Century , Linear Models , Models, Theoretical , Nitrogen/chemistry , Siberia , Uncertainty
16.
Nucleic Acids Res ; 51(8): 3501-3512, 2023 05 08.
Article in English | MEDLINE | ID: mdl-36809800

ABSTRACT

Human diseases and agricultural traits can be predicted by modeling a genetic random polygenic effect in linear mixed models. To estimate variance components and predict random effects of the model efficiently with limited computational resources has always been of primary concern, especially when it involves increasing the genotype data scale in the current genomic era. Here, we thoroughly reviewed the development history of statistical algorithms used in genetic evaluation and theoretically compared their computational complexity and applicability for different data scenarios. Most importantly, we presented a computationally efficient, functionally enriched, multi-platform and user-friendly software package named 'HIBLUP' to address the challenges that are faced currently using big genomic data. Powered by advanced algorithms, elaborate design and efficient programming, HIBLUP computed fastest while using the lowest memory in analyses, and the greater the number of individuals that are genotyped, the greater the computational benefits from HIBLUP. We also demonstrated that HIBLUP is the only tool which can accomplish the analyses for a UK Biobank-scale dataset within 1 h using the proposed efficient 'HE + PCG' strategy. It is foreseeable that HIBLUP will facilitate genetic research for human, plants and animals. The HIBLUP software and user manual can be accessed freely at https://www.hiblup.com.


Both human diseases and agricultural traits can be predicted by incorporating phenotypic observations and a relationship matrix among individuals in a linear mixed model. Due to the great demand for processing massive data of genotyped individuals, the existing algorithms that require several repetitions of inverse computing on increasingly big dense matrices (e.g. the relationship matrix and the coefficient matrix of mixed model equations) have encountered a bottleneck. Here, we presented a software tool named 'HIBLUP' to address the challenges. Powered by our advanced algorithms (e.g. HE + PCG), elaborate design and efficient programming, HIBLUP can successfully avoid the inverse computing for any big matrix and compute fastest under the lowest memory, which makes it very promising for genetic evaluation using big genomic data.


Subjects
Genomics , Models, Genetic , Animals , Humans , Algorithms , Genome , Genotype , Linear Models
17.
PLoS Genet ; 18(4): e1010151, 2022 04.
Article in English | MEDLINE | ID: mdl-35442943

ABSTRACT

With the advent of high throughput genetic data, there have been attempts to estimate heritability from genome-wide SNP data on a cohort of distantly related individuals using linear mixed model (LMM). Fitting such an LMM in a large scale cohort study, however, is tremendously challenging due to its high dimensional linear algebraic operations. In this paper, we propose a new method named PredLMM approximating the aforementioned LMM motivated by the concepts of genetic coalescence and Gaussian predictive process. PredLMM has substantially better computational complexity than most of the existing LMM based methods and thus, provides a fast alternative for estimating heritability in large scale cohort studies. Theoretically, we show that under a model of genetic coalescence, the limiting form of our approximation is the celebrated predictive process approximation of large Gaussian process likelihoods that has well-established accuracy standards. We illustrate our approach with extensive simulation studies and use it to estimate the heritability of multiple quantitative traits from the UK Biobank cohort.


Subjects
Genome-Wide Association Study , Models, Genetic , Cohort Studies , Genome-Wide Association Study/methods , Humans , Linear Models , Normal Distribution , Phenotype , Polymorphism, Single Nucleotide/genetics
18.
Proc Natl Acad Sci U S A ; 119(39): e2212959119, 2022 09 27.
Article in English | MEDLINE | ID: mdl-36122202

ABSTRACT

Detecting genetic variants associated with the variance of complex traits, that is, variance quantitative trait loci (vQTLs), can provide crucial insights into the interplay between genes and environments and how they jointly shape human phenotypes in the population. We propose a quantile integral linear model (QUAIL) to estimate genetic effects on trait variability. Through extensive simulations and analyses of real data, we demonstrate that QUAIL provides computationally efficient and statistically powerful vQTL mapping that is robust to non-Gaussian phenotypes and confounding effects on phenotypic variability. Applied to UK Biobank (n = 375,791), QUAIL identified 11 vQTLs for body mass index (BMI) that have not been previously reported. Top vQTL findings showed substantial enrichment for interactions with physical activities and sedentary behavior. Furthermore, variance polygenic scores (vPGSs) based on QUAIL effect estimates showed superior predictive performance on both population-level and within-individual BMI variability compared to existing approaches. Overall, QUAIL is a unified framework to quantify genetic effects on the phenotypic variability at both single-variant and vPGS levels. It addresses critical limitations in existing approaches and may have broad applications in future gene-environment interaction studies.


Subjects
Biological Variation, Population; Models, Biological; Phenotype; Biological Variation, Population/genetics; Computer Simulation; Gene-Environment Interaction; Humans; Linear Models; Quantitative Trait Loci
19.
J Infect Dis ; 229(Supplement_1): S25-S33, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-37249267

ABSTRACT

BACKGROUND: Previous studies reported inconsistent findings regarding the association between respiratory syncytial virus (RSV) subgroup distribution and the timing of the RSV season. We aimed to further understand the association by conducting a global-level systematic analysis. METHODS: We compiled published data on RSV seasonality through a systematic literature review, and unpublished data shared by international collaborators. Using the annual cumulative proportion (ACP) of RSV-positive cases, we defined RSV season onset and offset as ACP reaching 10% and 90%, respectively. Linear regression models accounting for meteorological factors were constructed to analyze the association of the proportion of RSV-A with the corresponding RSV season onset and offset. RESULTS: We included 36 study sites from 20 countries, providing data for 179 study-years in 1995-2019. Globally, RSV subgroup distribution was not significantly associated with RSV season onset or offset, except for RSV season offset in the tropics in 1 model, possibly by chance. Models that included RSV subgroup distribution and meteorological factors explained only 2%-4% of the variations in timing of RSV season. CONCLUSIONS: Year-on-year variations in RSV season onset and offset are not well explained by RSV subgroup distribution or meteorological factors. Factors including population susceptibility, mobility, and viral interference should be examined in future studies.


Subjects
Respiratory Syncytial Virus, Human; Humans; Linear Models; Seasons; Viral Interference
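
The onset and offset definition in the abstract above (ACP reaching 10% and 90%) can be illustrated with a minimal sketch. It assumes weekly case counts for one study-year and reads "reaching" as the first week in which the cumulative proportion is at or above the threshold; the function name, the weekly granularity, and the sample counts are hypothetical and not taken from the study.

```python
def season_onset_offset(weekly_cases, onset=0.10, offset=0.90):
    """Return (onset_week, offset_week), 1-indexed, for one study-year.

    Assumption: a threshold is "reached" in the first week whose annual
    cumulative proportion (ACP) of RSV-positive cases is >= the threshold.
    """
    total = sum(weekly_cases)
    if total <= 0:
        raise ValueError("no RSV-positive cases in this study-year")

    cumulative = 0
    onset_week = None
    offset_week = None
    for week, cases in enumerate(weekly_cases, start=1):
        cumulative += cases
        # Compare counts rather than ratios to avoid floating-point division.
        if onset_week is None and cumulative >= onset * total:
            onset_week = week
        if offset_week is None and cumulative >= offset * total:
            offset_week = week
            break
    return onset_week, offset_week


# Hypothetical counts for an 8-week window (100 cases in total).
example = [0, 5, 15, 30, 30, 15, 5, 0]
print(season_onset_offset(example))
```

With these illustrative counts the ACP is 5% after week 2 and 20% after week 3, so onset falls in week 3; it is 80% after week 5 and 95% after week 6, so offset falls in week 6.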
20.
BMC Bioinformatics ; 25(1): 43, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38273228

ABSTRACT

The computation of a similarity measure for genomic data is a standard tool in computational genetics. The principal components of such matrices are routinely used to correct for biases due to confounding by population stratification, for instance in linear regressions. However, calculating both a similarity matrix and its singular value decomposition (SVD) is computationally intensive. The contribution of this article is threefold. First, we demonstrate that the calculation of three matrices (called the covariance matrix, the weighted Jaccard matrix, and the genomic relationship matrix) can be reformulated in a unified way which allows for the application of a randomized SVD algorithm, which is faster than the traditional computation. The fast SVD algorithm we present is adapted from an existing randomized SVD algorithm and ensures that all computations are carried out in sparse matrix algebra. The algorithm only assumes that row-wise and column-wise subtraction and multiplication of a vector with a sparse matrix are available, operations that are efficiently implemented in common sparse matrix packages. An exception is the so-called Jaccard matrix, which does not have a structure applicable for the fast SVD algorithm. Second, an approximate Jaccard matrix is introduced to which the fast SVD computation is applicable. Third, we establish guaranteed theoretical bounds on the accuracy (in [Formula: see text] norm and angle) between the principal components of the Jaccard matrix and the ones of our proposed approximation, thus putting the proposed Jaccard approximation on a solid mathematical foundation, and derive the theoretical runtime of our algorithm. We illustrate that the approximation error is low in practice and empirically verify the theoretical runtime scalings on both simulated data and data of the 1000 Genomes Project.


Subjects
Genome; Genomics; Algorithms; Linear Models
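
The abstract above notes that the fast SVD needs only vector subtraction and sparse matrix-vector multiplication. One way to see why that can suffice is the identity (X - 1 m^T) v = X v - (m . v) 1, which lets a column-centered matrix be applied to a vector without ever forming the dense centered matrix. The sketch below illustrates only this identity; it is not the authors' algorithm, and the dict-per-row storage and the name `centered_matvec` are assumptions made for illustration.

```python
def centered_matvec(rows, col_means, v):
    """Compute (X - 1 * col_means^T) @ v without densifying X.

    rows      : list of dicts, one per row of X, mapping column index -> nonzero value
    col_means : list of column means of X (hypothetical precomputed input)
    v         : dense vector with one entry per column
    """
    # The centering term contributes the same scalar to every row.
    shift = sum(m * x for m, x in zip(col_means, v))
    # Sparse product X @ v touches only stored nonzeros, then subtracts the shift.
    return [sum(val * v[j] for j, val in row.items()) - shift for row in rows]


# X = [[3, 0], [0, 6], [0, 0]] stored sparsely; its column means are [1, 2].
X_rows = [{0: 3.0}, {1: 6.0}, {}]
print(centered_matvec(X_rows, [1.0, 2.0], [1.0, 1.0]))
```

A randomized SVD repeatedly applies a matrix and its transpose to a small block of vectors, so a product of this form keeps the cost proportional to the number of nonzeros rather than to the full matrix size.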