Results 1-20 of 81
1.
Mol Ecol ; 33(6): e17299, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38380534

ABSTRACT

Additive and dominance genetic variances underlying the expression of quantitative traits are important quantities for predicting short-term responses to selection, but they are notoriously challenging to estimate in most non-model wild populations. Specifically, large or panmictic populations may be characterized by low variance in genetic relatedness among individuals, which, in turn, can prevent accurate estimation of quantitative genetic parameters. We used estimates of genome-wide identity-by-descent (IBD) sharing from autosomal SNP loci to estimate quantitative genetic parameters for ecologically important traits in nine-spined sticklebacks (Pungitius pungitius) from a large, outbred population. Using empirical and simulated datasets with varying sample sizes and pedigree complexity, we assessed the performance of different crossing schemes in estimating additive genetic variance and heritability for all traits. We found that the low variance in relatedness characteristic of wild outbred populations with high migration rates can impair the estimation of quantitative genetic parameters and bias heritability estimates downwards. On the other hand, a half-sib/full-sib design allowed precise estimation of genetic variance components and revealed significant additive variance and heritability for all measured traits, with negligible dominance contributions. Genome-partitioning and QTL mapping analyses revealed that most traits had a polygenic basis and were controlled by genes on multiple chromosomes. Furthermore, different QTL contributed to variation in the same traits in different populations, suggesting heterogeneous underpinnings of parallel evolution at the phenotypic level. Our results provide important guidelines for future studies aimed at estimating adaptive potential in the wild, particularly those conducted in large, outbred populations.
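For orientation, the variance components discussed above combine into heritability in the standard way: narrow-sense h2 = V_A / V_P and broad-sense H2 = (V_A + V_D) / V_P, with V_P = V_A + V_D + V_E. The sketch below assumes this simple partition with no epistatic or genotype-by-environment terms; the function name and the numbers are illustrative, not estimates from the study.

```python
def heritabilities(v_a: float, v_d: float, v_e: float) -> tuple:
    """Return (narrow-sense h2, broad-sense H2) from additive (v_a),
    dominance (v_d) and residual (v_e) variance components.

    Assumes V_P = V_A + V_D + V_E, i.e. no epistatic or G x E terms.
    """
    v_p = v_a + v_d + v_e                  # total phenotypic variance
    if v_p <= 0:
        raise ValueError("phenotypic variance must be positive")
    h2 = v_a / v_p                          # additive share: drives response to selection
    big_h2 = (v_a + v_d) / v_p              # total genetic share
    return h2, big_h2


# Hypothetical components with a small dominance term.
h2, big_h2 = heritabilities(v_a=0.4, v_d=0.02, v_e=0.58)
print(f"h2 = {h2:.2f}, H2 = {big_h2:.2f}")
```

With a negligible dominance term, as reported in this study, h2 and H2 nearly coincide.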


Subjects
Genome, Multifactorial Inheritance, Humans, Genome/genetics, Chromosome Mapping, Phenotype, Genetic Models, Single Nucleotide Polymorphism/genetics
2.
Bioinformatics ; 39(11)2023 11 01.
Article in English | MEDLINE | ID: mdl-37963057

ABSTRACT

MOTIVATION: Owing to advances in measurement technology, many new phenotype, gene expression, and other omics time-course datasets are now commonly available. Cluster analysis may provide useful information about the structure of such data. RESULTS: In this work, we propose BELMM (Bayesian Estimation of Latent Mixture Models): a flexible framework for analysing, clustering, and modelling time-series data in a Bayesian setting. The framework is built on mixture modelling. First, the mean curves of the mixture components are assumed to follow random walk smoothing priors. Second, we choose the most plausible model and the number of mixture components using reversible-jump Markov chain Monte Carlo. Last, we assign the individual time series to clusters based on their similarity to the cluster-specific trend curves determined by the latent random walk processes. We demonstrate fast and slow implementations of our approach on both simulated and real time-series data using the widely available software R and Stan, as well as CU-MSDSp. AVAILABILITY AND IMPLEMENTATION: The French mortality dataset is available at http://www.mortality.org, the Drosophila melanogaster embryogenesis gene expression data at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE121160. Details on our simulated datasets are available in the Supplementary Material, and R scripts and a detailed tutorial on GitHub at https://github.com/ollisa/BELMM. The software CU-MSDSp is available on GitHub at https://github.com/jtchavisIII/CU-MSDSp.
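Two ingredients named in this abstract, a random-walk smoothing prior on cluster mean curves and the assignment of series to clusters by similarity to those curves, can be illustrated in a few lines. This is a simplified sketch, not BELMM itself: it assumes a second-order random walk (the abstract does not state the order), a fixed variance, and hard assignment by the smallest squared distance, whereas BELMM works with posterior uncertainty via Stan and reversible-jump MCMC. The curves and function names are made up.

```python
def rw2_log_prior(mu, sigma):
    """Log-density of a second-order random-walk prior on a mean curve, up to
    a constant for fixed sigma: second differences are i.i.d. N(0, sigma^2).
    Straight lines are not penalized; wiggly curves are."""
    d2 = [mu[t] - 2 * mu[t - 1] + mu[t - 2] for t in range(2, len(mu))]
    return -sum(d * d for d in d2) / (2 * sigma ** 2)


def assign_to_clusters(series, trends):
    """Hard-assign each series to the trend curve with the smallest sum of
    squared deviations (the Gaussian-likelihood choice under a common
    noise variance). All curves must share the same time grid."""
    labels = []
    for y in series:
        sse = [sum((yt - mt) ** 2 for yt, mt in zip(y, mu)) for mu in trends]
        labels.append(min(range(len(trends)), key=sse.__getitem__))
    return labels


# Two made-up trend curves (rising and flat) and three observed series.
trends = [[0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0]]
series = [[0.1, 0.9, 2.2, 2.8], [1.2, 0.8, 1.1, 0.9], [0.0, 1.1, 1.9, 3.2]]
print(assign_to_clusters(series, trends))
print(rw2_log_prior([0.0, 1.0, 0.0, 1.0], sigma=1.0))  # a wiggly curve is penalized
```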


Subjects
Drosophila melanogaster, Software, Animals, Bayes Theorem, Drosophila melanogaster/genetics, Time Factors, Cluster Analysis
3.
Bioinformatics ; 39(7)2023 07 01.
Article in English | MEDLINE | ID: mdl-37348543

ABSTRACT

MOTIVATION: Genome-wide association studies (GWAS) have been successful in identifying genomic loci associated with complex traits. Genetic fine-mapping aims to detect independent causal variants from the GWAS-identified loci, adjusting for linkage disequilibrium patterns. RESULTS: We present "FiniMOM" (fine-mapping using a product inverse-moment prior), a novel Bayesian fine-mapping method for summarized genetic associations. For causal effects, the method uses a nonlocal inverse-moment prior, which is a natural prior distribution to model non-null effects in finite samples. A beta-binomial prior is set for the number of causal variants, with a parameterization that can be used to control for potential misspecifications in the linkage disequilibrium reference. Simulation studies designed to mimic a typical GWAS on circulating protein levels show improved credible set coverage and power for the proposed method over the current state-of-the-art fine-mapping method SuSiE, especially in the case of multiple causal variants within a locus. AVAILABILITY AND IMPLEMENTATION: https://vkarhune.github.io/finimom/.
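To make the "nonlocal" property concrete, the sketch below evaluates an inverse-moment (iMOM) density in the parameterization of Johnson and Rossell, pi(theta) = tau^(r/2) / Gamma(r/2) * |theta|^-(r+1) * exp(-tau / theta^2), together with a product prior over several effects. The exact parameterization and hyperparameters used in FiniMOM are not given in the abstract, so tau, r and the function names here are assumptions.

```python
import math


def imom_log_density(theta: float, tau: float = 1.0, r: float = 1.0) -> float:
    """Log of the inverse-moment (iMOM) prior density
        pi(theta) = tau^(r/2) / Gamma(r/2) * |theta|^-(r+1) * exp(-tau / theta^2).
    The density is nonlocal: it tends to 0 (log to -inf) as theta -> 0.
    """
    if theta == 0.0:
        return -math.inf
    a = abs(theta)
    return (0.5 * r * math.log(tau) - math.lgamma(0.5 * r)
            - (r + 1) * math.log(a) - tau / a / a)  # divide twice to avoid a*a underflow


def product_imom_log_prior(effects, tau=1.0, r=1.0):
    """Product prior over causal effects = sum of per-effect log-densities."""
    return sum(imom_log_density(b, tau, r) for b in effects)


# Near-zero effects get essentially no prior mass; moderate effects do.
for b in (0.05, 0.5, 1.0, 3.0):
    print(b, round(math.exp(imom_log_density(b, tau=0.25)), 4))
```

For these settings (r = 1, tau = 0.25) the density peaks at |theta| = sqrt(tau) = 0.5, which is how the prior separates null from non-null effects.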


Subjects
Genome-Wide Association Study, Genomics, Genome-Wide Association Study/methods, Bayes Theorem, Chromosome Mapping/methods, Linkage Disequilibrium, Single Nucleotide Polymorphism
4.
Sci Rep ; 12(1): 20358, 2022 11 27.
Article in English | MEDLINE | ID: mdl-36437268

ABSTRACT

Attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy coupled with machine learning-based partial least squares discriminant analysis (PLS-DA) was applied to study whether severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) could be detected from nasopharyngeal swab samples originally collected for polymerase chain reaction (PCR) analysis. Our retrospective study included 558 positive and 558 negative samples collected from Northern Finland. Overall, we found moderate diagnostic performance for ATR-FTIR when PCR analysis was used as the gold standard: the average area under the receiver operating characteristic curve (AUROC) was 0.67-0.68 (min. 0.65, max. 0.69) with 20-, 10- and 5-fold cross-validation. Mean accuracy, sensitivity and specificity were 0.62-0.63 (min. 0.60, max. 0.65), 0.61 (min. 0.58, max. 0.65) and 0.64 (min. 0.59, max. 0.67), respectively, with 20-, 10- and 5-fold cross-validation. In conclusion, our study with a relatively large sample set clearly indicates that the measured ATR-FTIR spectrum contains specific information on SARS-CoV-2 infection (P < 0.001 for AUROC in a label permutation test). However, the diagnostic performance of ATR-FTIR remained only moderate, potentially due to the low concentration of viral particles in the transport medium. Further studies are needed before ATR-FTIR can be recommended for fast screening of SARS-CoV-2 from nasopharyngeal swab samples.
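The performance measures quoted above follow standard definitions. The sketch below computes accuracy, sensitivity and specificity from binary calls, AUROC as the Mann-Whitney probability that a positive sample outscores a negative one, and a label-permutation p-value for the AUROC of the kind reported here. The data are made-up toy values, and the sketch omits the PLS-DA model and the k-fold cross-validation loop used in the study.

```python
import random


def diagnostic_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity against a reference test
    (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # detected share of reference positives
        "specificity": tn / (tn + fp),   # cleared share of reference negatives
    }


def auroc(y_true, scores):
    """P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(y_true, scores, n_perm=1000, seed=0):
    """One-sided label-permutation p-value for the observed AUROC."""
    rng = random.Random(seed)
    observed = auroc(y_true, scores)
    labels = list(y_true)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)              # break any label-spectrum association
        if auroc(labels, scores) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)     # add-one correction avoids p = 0


# Toy example: 1 = PCR-positive, 0 = PCR-negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(diagnostic_metrics(y_true, y_pred))
print(auroc(y_true, scores), permutation_p_value(y_true, scores, n_perm=500))
```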


Subjects
COVID-19, Humans, Fourier Transform Infrared Spectroscopy/methods, COVID-19/diagnosis, SARS-CoV-2, Retrospective Studies, Nasopharynx
5.
Front Plant Sci ; 13: 800161, 2022.
Article in English | MEDLINE | ID: mdl-35574107

ABSTRACT

Prediction of complex traits based on genome-wide marker information is of central importance for both animal and plant breeding. Numerous models have been proposed for the prediction of complex traits, and considerable effort is still devoted to improving their prediction accuracy, because various genetic factors, such as additive, dominance and epistatic effects, can influence the prediction accuracy of such models. Recently, machine learning (ML) methods have been widely applied for prediction in both animal and plant breeding programs. In this study, we propose a new algorithm for genomic prediction which is based on neural networks but incorporates classical elements of LASSO. Our new method is able to account for local epistasis (higher-order interactions between neighboring markers) in the prediction. We compare the prediction accuracy of our new method with the most commonly used prediction methods, such as BayesA, BayesB, Bayesian Lasso (BL), genomic BLUP and Elastic Net (EN), using the heterogeneous stock mouse and rice field data sets.
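The "classical elements of LASSO" mentioned above center on the soft-thresholding operator, which shrinks coefficients and sets small ones exactly to zero. The sketch below shows plain LASSO by coordinate descent, plus a hypothetical encoding of local epistasis as products of neighbouring markers. It does not reproduce the authors' neural network; the feature encoding, function names and toy data are all assumptions for illustration.

```python
def local_epistasis_features(x):
    """Append products of neighbouring markers (x_j * x_{j+1}) to the marker
    vector. A hypothetical encoding of 'local epistasis', for illustration only."""
    return list(x) + [a * b for a, b in zip(x, x[1:])]


def soft_threshold(z, lam):
    """LASSO shrinkage operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.
    Assumes y and the columns of X are already centered (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)                          # y - X beta, with beta = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue                     # constant column: leave at zero
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[j] = new
    return beta


# Toy centered design: only the first column carries signal.
print(lasso_cd([[1, 0], [-1, 0], [0, 1], [0, -1]], [2, -2, 0, 0], lam=0.1))
print(local_epistasis_features([0, 1, 2, 1]))
```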

6.
Proc Biol Sci ; 289(1975): 20220352, 2022 05 25.
Article in English | MEDLINE | ID: mdl-35582807

ABSTRACT

Heritable variation in traits under natural selection is a prerequisite for evolutionary response. While it is recognized that trait heritability may vary spatially and temporally depending on the environmental conditions under which traits are expressed, less is known about the possibility that genetic variance contributing to the expected selection response in a given trait may vary at different stages of ontogeny. Specifically, it is unclear whether different loci underlie the expression of a trait throughout development and thus provide an additional source of variation for selection to act on in the wild. Here we show that body size, an important life-history trait, is heritable throughout ontogeny in the nine-spined stickleback (Pungitius pungitius). Nevertheless, both analyses of quantitative trait loci and genetic correlations across ages show that different chromosomes/loci contribute to this heritability at different ontogenetic time points. This suggests that body size can respond to selection at different stages of ontogeny, but that this response is determined by different loci at different points of development. Hence, our study provides important results for understanding the genetics of ontogeny and opens an interesting avenue of research for studying age-specific genetic architecture as a source of non-parallel evolution.


Subjects
Smegmamorpha, Animals, Body Size/genetics, Genetic Variation, Phenotype, Quantitative Trait Loci, Genetic Selection, Smegmamorpha/physiology
7.
Article in English | MEDLINE | ID: mdl-35324438

ABSTRACT

Many interventional surgical procedures rely on medical imaging to visualize and track instruments. Such imaging methods need not only to be real-time capable but also to provide accurate and robust positional information. In ultrasound (US) applications, typically only 2-D data from a linear array are available, so obtaining accurate positional estimates in three dimensions is nontrivial. In this work, we first train a neural network, using realistic synthetic training data, to estimate the out-of-plane offset of an object with the associated axial aberration in the reconstructed US image. The obtained estimate is then combined with a Kalman filtering approach that utilizes positioning estimates obtained in previous time frames to improve localization robustness and reduce the impact of measurement noise. The accuracy of the proposed method is evaluated using simulations, and its practical applicability is demonstrated on experimental data obtained using a novel optical US imaging setup. Accurate and robust positional information is provided in real time. Axial and lateral coordinates for out-of-plane objects are estimated with a mean error of 0.1 mm for simulated data and 0.2 mm for experimental data. The 3-D localization is most accurate for elevational distances larger than 1 mm, with a maximum distance of 6 mm considered for a 25-mm aperture.
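The Kalman filtering step described above can be illustrated with a scalar filter that smooths a sequence of noisy per-frame estimates of a single coordinate, such as the out-of-plane offset. This is a minimal sketch under assumed settings: a random-walk motion model and hand-picked noise variances q and r. The study's actual state model, dimensionality and tuning are not described in the abstract, and the frame values are made up.

```python
def kalman_1d(measurements, q=0.01, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk state model.

    q:  process-noise variance (how far the true value may drift per frame)
    r:  measurement-noise variance of each per-frame estimate
    x0, p0: initial state estimate and its variance
    Returns the filtered estimate after each measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q                 # predict: state carried over, uncertainty grows
        k = p / (p + r)           # Kalman gain: weight given to the new measurement
        x = x + k * (z - x)       # update: move toward the measurement
        p = (1 - k) * p           # updated uncertainty shrinks
        estimates.append(x)
    return estimates


# Hypothetical noisy per-frame offsets (mm) around a true value of 2.0.
frames = [2.3, 1.8, 2.1, 1.6, 2.4, 2.0, 1.9, 2.2]
print([round(e, 2) for e in kalman_1d(frames, x0=frames[0])])
```

Raising r relative to q makes the filter trust each new frame less and smooth more; lowering it makes the filter react faster to genuine motion.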


Subjects
Neural Networks (Computer), Optical Imaging, Ultrasonography/methods
8.
G3 (Bethesda) ; 12(2)2022 02 04.
Article in English | MEDLINE | ID: mdl-35100338

ABSTRACT

We introduce a new model selection criterion for sparse complex gene network modeling, where gene co-expression relationships are estimated from data. This is a novel formulation of the gap statistic, and it can be used for the optimal choice of a regularization parameter in graphical models. Our criterion favors gene network structures that differ from a trivial gene interaction structure obtained completely at random. We call the criterion the gap-com statistic (gap community statistic). The idea of the gap-com statistic is to examine the difference between the observed and the expected counts of communities (clusters), where the expected counts are evaluated using either data permutations or reference graph (Erdős-Rényi graph) resampling. The latter represents a trivial gene network structure determined by chance. We emphasize complex network inference because the structure of gene networks is usually nontrivial: for example, some genes may be clustered together and some may be hub genes. We evaluate the performance of the gap-com statistic in graphical model selection and compare it with existing methods using simulated and real biological data examples.
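The core comparison behind the gap-com statistic, observed versus expected community counts, can be sketched as follows. Several choices here are assumptions rather than the paper's method: connected components stand in for the community-detection step, expected counts come from Erdős-Rényi G(n, m) graphs with the same numbers of nodes and edges (one plausible reading of "reference graph resampling"), and the sign convention (observed minus expected) is chosen for illustration. The data-permutation variant is not shown.

```python
import random
from itertools import combinations


def count_components(n, edges):
    """Number of connected components (a crude stand-in for communities)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return sum(1 for i in range(n) if find(i) == i)


def gap_com(n, edges, n_ref=100, seed=0):
    """Observed minus expected component count, the expectation taken over
    random G(n, m) reference graphs with the same number of edges m."""
    rng = random.Random(seed)
    all_pairs = list(combinations(range(n), 2))
    m = len(edges)
    expected = sum(count_components(n, rng.sample(all_pairs, m))
                   for _ in range(n_ref)) / n_ref
    return count_components(n, edges) - expected


# Two disjoint triangles: more separate groups than a random graph with 6 edges.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
print(gap_com(6, two_triangles, n_ref=200))
```

In a graphical-model setting, this quantity would be computed for the network estimated at each candidate regularization value; how the paper turns the gap into a final choice is not specified in the abstract.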


Subjects
Gene Regulatory Networks
9.
Mol Ecol ; 30(19): 4740-4756, 2021 10.
Article in English | MEDLINE | ID: mdl-34270821

ABSTRACT

Dispersal plays a crucial role in determining eco-evolutionary dynamics through both gene flow and population size regulation. However, to study dispersal and its consequences, one must distinguish immigrants from residents. Dispersers can be identified using telemetry, capture-mark-recapture (CMR) methods, or genetic assignment methods. All of these methods have disadvantages, such as the high costs and substantial field effort needed for telemetry and CMR surveys, and the adequate genetic distance required for genetic assignment. In this study, we used genome-wide data on 200K single nucleotide polymorphisms and two different genetic assignment approaches (GSI_SIM, a Bayesian framework; BONE, network-based estimation) to identify the dispersers in a house sparrow (Passer domesticus) metapopulation sampled over 16 years. Our results showed higher assignment accuracy with BONE. Hence, we proceeded to diagnose potential sources of error in the BONE assignment results due to variation in levels of interpopulation genetic differentiation, intrapopulation genetic variation and sample size. We show that assignment accuracy is high even at low levels of genetic differentiation and that it increases with the proportion of a population that has been sampled. Finally, we highlight that dispersal studies integrating both ecological and genetic data provide robust assessments of dispersal patterns in natural populations.
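The basic idea of genetic assignment can be shown with plain maximum-likelihood assignment: an individual is assigned to the population whose allele frequencies make its genotype most likely. This sketch is neither GSI_SIM nor BONE; it assumes Hardy-Weinberg proportions, independence between loci and known population allele frequencies, and the population names and frequencies are made up.

```python
import math


def genotype_log_likelihood(genotype, freqs, eps=1e-3):
    """Log-likelihood of a multilocus SNP genotype (0, 1 or 2 copies of the
    counted allele per locus) in one population, assuming Hardy-Weinberg
    proportions and independence between loci."""
    ll = 0.0
    for g, p in zip(genotype, freqs):
        p = min(max(p, eps), 1 - eps)   # avoid log(0) for alleles fixed in the sample
        q = 1 - p
        ll += math.log((q * q, 2 * p * q, p * p)[g])
    return ll


def assign(genotype, pop_freqs):
    """Return the name of the population under which the genotype is most likely."""
    return max(pop_freqs, key=lambda pop: genotype_log_likelihood(genotype, pop_freqs[pop]))


# Made-up allele frequencies for two island populations at three SNPs.
pop_freqs = {"A": [0.9, 0.8, 0.1], "B": [0.2, 0.3, 0.7]}
print(assign([2, 2, 0], pop_freqs), assign([0, 0, 2], pop_freqs))
```

With weak differentiation the per-population likelihoods converge, which is why the abstract links assignment accuracy to levels of genetic differentiation and to sampling.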


Subjects
Sparrows, Animals, Bayes Theorem, Genetic Drift, Pedigree, Population Density, Sparrows/genetics
10.
PLoS Comput Biol ; 17(5): e1008960, 2021 05.
Article in English | MEDLINE | ID: mdl-33939702

ABSTRACT

A wide variety of 1) parametric regression models and 2) co-expression networks have been developed for finding gene-by-gene interactions underlying complex traits from expression data. While both methodological schemes have their own well-known benefits, little is known about their synergistic potential. Our study introduces a methodological fusion of the two that cross-exploits the strengths of the individual approaches via a built-in information-sharing mechanism. This fusion is theoretically based on certain trait-conditioned dependency patterns between two genes, depending on their role in the underlying parametric model. The resulting trait-specific co-expression network estimation method 1) serves to enhance the interpretation of biological networks in a parametric sense, and 2) exploits the underlying parametric model itself in the estimation process. To also account for the substantial intrinsic noise and collinearity often present in expression data, a tailored co-expression measure is introduced within this framework to alleviate related computational problems. A marked improvement over the reference methods in simulated scenarios substantiates the method's high efficiency. As a proof of concept, this synergistic approach is successfully applied in survival analysis with acute myeloid leukemia data, further highlighting the framework's versatility and broad practical relevance.


Subjects
Gene Expression Regulation, Algorithms, Humans, Acute Myeloid Leukemia/genetics, Proof of Concept Study, Systems Biology
11.
Sensors (Basel) ; 21(7)2021 Mar 24.
Article in English | MEDLINE | ID: mdl-33805187

ABSTRACT

Spatio-temporal interpolation provides estimates of observations at unobserved locations and time slots. In smart cities, interpolation helps to provide a fine-grained contextual and situational understanding of the urban environment, in terms of both short-term (e.g., weather, air quality, traffic) and long-term (e.g., crime, demographics) spatio-temporal phenomena. Various initiatives improve spatio-temporal interpolation results by including additional data sources such as vehicle-fitted sensors, mobile phones, or micro weather stations of, for example, smart homes. However, the underlying computing paradigm in such initiatives is predominantly centralized, with all data collected and analyzed in the cloud. This solution does not scale: as the spatial and temporal density of sensor data grows, the required transmission bandwidth and computational capacity become infeasible. To address the scaling problem, we propose EDISON: algorithms for distributed learning and inference, and an edge-native architecture for distributing spatio-temporal interpolation models, their computations, and the observed data vertically and horizontally between device, edge and cloud layers. We demonstrate EDISON functionality in a controlled, simulated spatio-temporal setup with 1M artificial data points. While the main motivation of EDISON is the distribution of heavy computations, the results show that EDISON also improves on alternative approaches, reaching at best a 10% smaller RMSE than a global interpolation and a 6% smaller RMSE than a baseline distributed approach.
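The abstract does not say which interpolation models EDISON distributes, so the sketch below swaps in inverse distance weighting (IDW), one of the simplest spatio-temporal interpolators, purely to make "estimate at an unobserved location and time" and the reported RMSE metric concrete. Treating time as an extra coordinate scaled by a factor (`time_scale`) is an assumption, and the readings are made up.

```python
import math


def idw(query, points, values, power=2.0):
    """Inverse-distance-weighted estimate at `query` from observed sites.
    Coordinates may include a scaled time axis, e.g. (x, y, c * t)."""
    num = den = 0.0
    for pt, v in zip(points, values):
        d = math.dist(query, pt)
        if d == 0.0:
            return v                  # query coincides with an observation
        w = d ** -power               # nearer observations weigh more
        num += w * v
        den += w
    return num / den


def rmse(pred, truth):
    """Root-mean-square error, the accuracy measure reported for EDISON."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pred, truth)) / len(truth))


# Hypothetical sensor readings at (x, y, time_scale * t); time_scale is an
# assumed factor trading one time step against one unit of distance.
time_scale = 0.5
sites = [(0.0, 0.0, time_scale * 0), (1.0, 0.0, time_scale * 0), (0.0, 1.0, time_scale * 1)]
temps = [10.0, 12.0, 11.0]
print(round(idw((0.5, 0.5, time_scale * 1), sites, temps), 3))
```

In a distributed setting, an edge node could run such an interpolator over its local sensors only; how EDISON actually partitions models and data is beyond what the abstract states.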

12.
G3 (Bethesda) ; 11(6)2021 06 17.
Article in English | MEDLINE | ID: mdl-33822941

ABSTRACT

Advanced backcross (AB) populations have been widely used to identify and utilize beneficial alleles in various crops such as rice, tomato, wheat, and barley. An AB population is developed using a controlled crossing scheme, and this controlled crossing, along with the selection (both natural and artificial) of agronomically adapted alleles during population development, may lead to unbalanced allele frequencies in the population. However, it is commonly believed that interval mapping of traits in experimental crosses such as AB populations is immune to deviations from the frequencies expected under Mendelian segregation. Using two AB populations and simulated data sets as examples, we describe the severity of the problem caused by unbalanced allele frequencies in quantitative trait loci mapping and demonstrate how it can be corrected using a linear mixed model with a polygenic effect whose covariance structure (the genomic relationship matrix) is calculated from molecular markers.
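A genomic relationship matrix of the kind mentioned above is commonly computed with VanRaden's first method, G = Z Z' / (2 * sum_j p_j (1 - p_j)), where Z is the 0/1/2 marker matrix centered by twice the allele frequencies. Whether the authors used exactly this estimator is not stated in the abstract; the sketch and the toy genotypes are illustrative. In the mixed model, the polygenic effect would then have covariance G * sigma2_g.

```python
def genomic_relationship_matrix(M):
    """VanRaden-style genomic relationship matrix.

    M: one row per individual, one 0/1/2 marker code per column.
    Allele frequencies p_j are estimated from the same sample.
    """
    n, m = len(M), len(M[0])
    p = [sum(row[j] for row in M) / (2 * n) for j in range(m)]
    Z = [[row[j] - 2 * p[j] for j in range(m)] for row in M]   # center each marker
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    return [[sum(Z[a][k] * Z[b][k] for k in range(m)) / scale for b in range(n)]
            for a in range(n)]


# Three hypothetical lines genotyped at three markers.
G = genomic_relationship_matrix([[0, 1, 2], [2, 1, 0], [1, 1, 1]])
for row in G:
    print([round(v, 3) for v in row])
```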


Subjects
Quantitative Trait Loci, Triticum, Genetic Crosses, Chromosome Mapping, Triticum/genetics, Phenotype, Genomics
13.
Bioinformatics ; 37(5): 726-727, 2021 05 05.
Article in English | MEDLINE | ID: mdl-32805018

ABSTRACT

MOTIVATION: Graphical lasso (Glasso) is a widely used tool for identifying gene regulatory networks in systems biology. However, its computational efficiency depends on the choice of the regularization (tuning) parameter, and selecting this parameter can be highly time-consuming. Although fully Bayesian implementations of Glasso alleviate this problem somewhat by specifying a prior distribution for the parameter, these approaches lack the scalability of their frequentist counterparts. RESULTS: Here, we present a new Monte Carlo Penalty Selection method (MCPeSe), a computationally efficient approach to regularization parameter selection for Glasso. MCPeSe combines the scalability and low computational cost of the frequentist Glasso with the ability of Bayesian Glasso modeling to choose the regularization automatically. MCPeSe provides a state-of-the-art 'tuning-free' model selection criterion for Glasso and allows exploration of the posterior probability distribution of the tuning parameter. AVAILABILITY AND IMPLEMENTATION: R source code of MCPeSe, a step-by-step example showing how to apply MCPeSe and a collection of scripts used to prepare the material in this article are publicly available on GitHub under GPL (https://github.com/markkukuismin/MCPeSe/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software, Systems Biology, Bayes Theorem, Monte Carlo Method, Probability
14.
Comput Biol Med ; 124: 103935, 2020 09.
Article in English | MEDLINE | ID: mdl-32771674

ABSTRACT

Stress has become a major health concern, and there is a need to study and develop new digital means for real-time stress detection. Currently, most stress detection research uses population-based approaches that lack the capability to adapt to individual differences. These approaches also use supervised learning methods, requiring extensive labeling of training data, and they are typically tested on data collected in a laboratory and thus do not generalize to field conditions. To address these issues, we present multiple personalized models based on an unsupervised algorithm, the Self-Organizing Map (SOM), and we propose an algorithmic pipeline to apply the method to both laboratory and field data. The performance is evaluated on a dataset of physiological measurements from a laboratory test and on a field dataset consisting of four weeks of physiological and smartphone usage data. In these tests, the performance on the field data was steady across the different personalization levels (accuracy around 60%), and a fully personalized model performed best on the laboratory data, achieving an accuracy of 92%, which is comparable to state-of-the-art supervised classifiers. These results demonstrate the feasibility of SOM for personalized mental stress detection in both constrained and free-living environments.
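The unsupervised algorithm named above, the Self-Organizing Map, can be sketched in its basic online form: find the best-matching unit for each sample, then pull it and its grid neighbours toward the sample. The abstract does not describe the map size, schedules, how nodes are linked to stress labels, or how personalization is done, so this sketch covers only training and best-matching-unit lookup with a 1-D map, assumed learning-rate and neighbourhood schedules, and made-up 2-D feature vectors.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(weights[i], x)))


def train_som(data, n_nodes=4, n_epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a 1-D self-organizing map; returns the node weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [list(rng.choice(data)) for _ in range(n_nodes)]   # init from samples
    total = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1 - frac)                    # linearly decaying learning rate
            sigma = max(sigma0 * (1 - frac), 0.1)    # shrinking neighbourhood radius
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))  # grid-distance kernel
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])   # move node toward the sample
            step += 1
    return weights


# Hypothetical 2-D feature vectors forming two well-separated groups.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
weights = train_som(data, n_nodes=2)
print([best_matching_unit(weights, x) for x in data])
```

A personalized variant would presumably train one such map per person on that person's own data.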


Subjects
Algorithms, Laboratories, Psychological Stress, Humans, Smartphone, Psychological Stress/diagnosis
15.
Genetics ; 215(3): 597-607, 2020 07.
Article in English | MEDLINE | ID: mdl-32414870

ABSTRACT

Although nonlinear relationships between genes are widely acknowledged, only a few methods exist for estimating nonlinear gene coexpression networks or gene regulatory networks (GCNs/GRNs), and these share common deficiencies. These methods often consider only pairwise associations between genes and are therefore poorly suited to identifying higher-order regulatory patterns in which multiple genes must be considered simultaneously. Another critical issue in current nonlinear GCN/GRN estimation approaches is that they model linear and nonlinear dependencies simultaneously and nonparametrically, in confounded form. This severely limits the detection of nonlinear associations, since the power to detect nonlinear dependencies is lower than that for linear dependencies, and sparsity-inducing procedures might favor linear relationships over nonlinear ones merely because of small sample sizes. In this paper, we propose a method to estimate undirected nonlinear GCNs independently of the linear associations between genes, based on a novel semiparametric neighborhood selection procedure capable of identifying complex nonlinear associations between genes. Simulation studies using the common DREAM3 and DREAM9 datasets show that the proposed method outperforms current nonlinear GCN/GRN estimation methods.


Subjects
Gene Regulatory Networks, Genomics/methods, Transcriptome, Algorithms, Animals, Gene Expression Profiling/methods, Humans, Genetic Models
16.
BMC Health Serv Res ; 20(1): 337, 2020 Apr 21.
Article in English | MEDLINE | ID: mdl-32316970

ABSTRACT

BACKGROUND: In the past two decades, the number of maternity hospitals in Finland has been reduced from 42 to 22. Notwithstanding the safety benefits of centralization into larger units, the closures will inevitably impair the geographical accessibility of services. METHODS: This study aimed to employ a set of location-allocation methods to assess the potential impact on accessibility should the number of maternity hospitals be reduced from 22 to 16. Accurate population grid data combined with road network and hospital facility data are analyzed with three different location-allocation methods: straight, sequential and capacitated p-median. RESULTS: Depending on the method used to assess the impact of a further reduction in the number of maternity hospitals, 0.6 to 2.7% of mothers would have more than a two-hour travel time to the nearest maternity hospital, compared with 0.5% in the current situation. The analyses highlight the areas where the number of births is low but a maternity hospital is still important in terms of accessibility, and the areas where even one unit would be enough to take care of a considerable volume of births. CONCLUSIONS: Even if the reduction in the number of hospitals might not drastically harm accessibility at the level of the entire population, considerable changes in accessibility can occur for clients living close to a maternity hospital facing closure. As different location-allocation analyses can result in different configurations of hospitals, decision-makers should be aware of their differences to ensure adequate accessibility for clients, especially in remote, sparsely populated areas.
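The straight p-median problem named above chooses p sites so that the demand-weighted total travel time to the nearest open site is minimized. The sketch below solves it by brute-force enumeration, which is only practical for small candidate sets (real studies use dedicated solvers), and also computes the share of demand beyond a travel-time limit, the accessibility measure reported in the results. The sequential and capacitated variants are not shown, and the travel times and birth counts are made up.

```python
from itertools import combinations


def p_median(travel_time, demand, p):
    """Exact p-median by enumeration (feasible only for small candidate sets).

    travel_time[i][j]: travel time (minutes) from demand area i to candidate site j
    demand[i]: weight of area i, e.g. number of births
    Returns (chosen site indices, demand-weighted total travel time).
    """
    n_sites = len(travel_time[0])
    best = None
    for sites in combinations(range(n_sites), p):
        cost = sum(w * min(row[j] for j in sites) for row, w in zip(travel_time, demand))
        if best is None or cost < best[1]:
            best = (sites, cost)
    return best


def share_beyond(travel_time, demand, sites, limit=120):
    """Share of total demand whose nearest open site is more than `limit` minutes away."""
    far = sum(w for row, w in zip(travel_time, demand) if min(row[j] for j in sites) > limit)
    return far / sum(demand)


# Four hypothetical areas, three candidate hospitals, keep two open.
tt = [[10, 90, 200], [80, 15, 150], [190, 100, 20], [130, 60, 140]]
births = [100, 80, 10, 50]
sites, cost = p_median(tt, births, 2)
print(sites, cost, share_beyond(tt, births, sites))
```

Because the objective weights travel time by demand, a site serving few births can be dropped even when its closure pushes that small population past the two-hour limit, which is the trade-off the abstract highlights.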


Subjects
Centralized Hospital Services, Health Services Accessibility, Maternity Hospitals, Child, Preschool Child, Female, Finland, Health Care Reform, Health Facility Closure, Humans, Infant, Information Systems, Pregnancy, Travel
17.
Bioinformatics ; 36(12): 3795-3802, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32186692

ABSTRACT

MOTIVATION: Improved DNA technology has made it practical to estimate single-nucleotide polymorphism (SNP)-heritability among distantly related individuals with unknown relationships. For growth- and development-related traits, it is meaningful to base SNP-heritability estimation on longitudinal data due to the time-dependency of the process. However, only a few statistical methods have been developed so far for estimating dynamic SNP-heritability and quantifying its full uncertainty. RESULTS: We introduce a completely tuning-free Bayesian Gaussian process (GP)-based approach for estimating dynamic variance components and heritability as their function. For parameter estimation, we use a modern Markov chain Monte Carlo method that allows full uncertainty quantification. Several datasets are analysed, and our results clearly illustrate that the 95% credible intervals of the proposed joint estimation method (which 'borrows strength' from adjacent time points) are significantly narrower than those of a two-stage baseline method that first estimates the variance components at each time point independently and then performs smoothing. We compare the method with a random regression model using MTG2 and BLUPF90 software, and quantitative measures indicate superior performance of our method. Results are presented for simulated and real data with up to 1000 time points. Finally, we demonstrate the scalability of the proposed method on simulated data with tens of thousands of individuals. AVAILABILITY AND IMPLEMENTATION: The C++ implementation dynBGP and simulated data are available on GitHub: https://github.com/aarjas/dynBGP. The programmes can be run in R. Real datasets are available in the QTL archive: https://phenome.jax.org/centers/QTLA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Single Nucleotide Polymorphism, Software, Bayes Theorem, Humans, Monte Carlo Method, Normal Distribution
18.
Int J Biostat ; 2020 Feb 15.
Article in English | MEDLINE | ID: mdl-32061165

ABSTRACT

We introduce a Bayesian framework for simultaneous feature selection and outlier detection in sparse high-dimensional regression models, with a focus on quantitative trait locus (QTL) mapping in experimental crosses. More specifically, we incorporate the robust mean-shift outlier handling mechanism into the multiple QTL mapping regression model and apply LASSO regularization concurrently to the genetic effects and the mean-shift terms through the flexible extended Bayesian LASSO (EBL) prior structure, thereby combining QTL mapping and outlier detection into a single sparse model representation problem. The EBL priors on the mean-shift terms prevent outlying phenotypic values from distorting the genotype-phenotype association and allow their detection as cases with outstanding mean-shift values following the LASSO shrinkage. Simulation results demonstrate the effectiveness of our new methodology at mapping QTLs in the presence of outlying phenotypic values and simultaneously identifying the potential outliers, while maintaining performance comparable to the standard EBL on outlier-free data.
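The mean-shift idea can be illustrated with a deliberately stripped-down, frequentist analogue: each observation gets its own shift term gamma_i, an L1 penalty keeps most shifts at exactly zero, and observations with nonzero shifts are flagged. This is not the authors' method. It drops the QTL regression (intercept only) and replaces the Bayesian EBL priors with a fixed penalty lam, so lam, the function names and the data are assumptions.

```python
def soft(z, lam):
    """Soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return z - lam if z > lam else (z + lam if z < -lam else 0.0)


def mean_shift_outliers(y, lam, n_iter=200):
    """Fit y_i = mu + gamma_i + e_i with an L1 penalty on the shifts gamma_i.

    Alternates a least-squares update of mu with soft-thresholding of the
    residuals; a nonzero gamma_i flags observation i as a potential outlier.
    """
    gamma = [0.0] * len(y)
    mu = 0.0
    for _ in range(n_iter):
        mu = sum(yi - gi for yi, gi in zip(y, gamma)) / len(y)   # mean of shift-corrected data
        gamma = [soft(yi - mu, lam) for yi in y]                 # shrink residuals to shifts
    flagged = [i for i, g in enumerate(gamma) if g != 0.0]
    return mu, gamma, flagged


# Made-up phenotypes with one gross outlier.
mu, gamma, flagged = mean_shift_outliers([1.0, 1.1, 0.9, 1.0, 10.0], lam=1.0)
print(round(mu, 3), flagged)
```

The outlier is absorbed by its own shift term, so the location estimate stays near the bulk of the data instead of being dragged toward 10.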

19.
Genetics ; 213(4): 1209-1224, 2019 12.
Article in English | MEDLINE | ID: mdl-31585953

ABSTRACT

Gaussian process (GP)-based automatic relevance determination (ARD) is known to be an efficient technique for identifying determinants of gene-by-gene interactions important to trait variation. However, the estimation of GP models is feasible only for low-dimensional datasets (∼200 variables), which severely limits application of the GP-based ARD method for high-throughput sequencing data. In this paper, we provide a nonparametric prescreening method that preserves virtually all the major benefits of the GP-based ARD method and extends its scalability to the typical high-dimensional datasets used in practice. In several simulated test scenarios, the proposed method compared favorably with existing nonparametric dimension reduction/prescreening methods suitable for higher-order interaction searches. As a real-data example, the proposed method was applied to a high-throughput dataset downloaded from The Cancer Genome Atlas (TCGA) with measured expression levels of 16,976 genes (after preprocessing) from patients diagnosed with acute myeloid leukemia.
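ARD works through a covariance function with one lengthscale per input variable: a variable whose fitted lengthscale is very long barely changes the kernel and is therefore judged irrelevant. The sketch below shows the common squared-exponential ARD kernel and an inverse-lengthscale relevance score. Fitting the lengthscales (by marginal likelihood or MCMC) and the paper's prescreening method are not shown, and the specific kernel used in the paper is an assumption here.

```python
import math


def ard_se_kernel(x, z, lengthscales, signal_var=1.0):
    """Squared-exponential kernel with one lengthscale per input variable (ARD).

    A very long lengthscale makes the kernel almost insensitive to that
    variable. Because the kernel is not additive across variables, GP
    functions drawn from it can include interactions between inputs.
    """
    s = sum(((a - b) / l) ** 2 for a, b, l in zip(x, z, lengthscales))
    return signal_var * math.exp(-0.5 * s)


def relevance(lengthscales):
    """Inverse lengthscales as a simple relevance score (larger = more relevant)."""
    return [1.0 / l for l in lengthscales]


# Hypothetical fitted lengthscales for three genes: the third looks irrelevant.
ls = [0.8, 1.5, 1e4]
print(relevance(ls))
print(ard_se_kernel([0.0, 0.0, 0.0], [0.0, 0.0, 3.0], ls))  # close to 1
```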


Subjects
Genetic Epistasis, Genetic Models, Heritable Quantitative Trait, Computer Simulation, ROC Curve
20.
Plant J ; 100(1): 83-100, 2019 10.
Article in English | MEDLINE | ID: mdl-31166032

ABSTRACT

Norway spruce is a boreal forest tree species of significant ecological and economic importance. Hence, there is a strong imperative to dissect the genetics underlying important wood quality traits in the species. We performed a functional genome-wide association study (GWAS) of 17 wood traits in Norway spruce using 178 101 single nucleotide polymorphisms (SNPs) generated from exome genotyping of 517 mother trees. The wood traits were defined using functional modelling of wood properties across annual growth rings. We applied a Least Absolute Shrinkage and Selection Operator (LASSO)-based association mapping method using a functional multilocus mapping approach that utilizes latent traits, with stability selection probabilities as the hypothesis-testing approach to determine significant quantitative trait loci. The analysis provided 52 significant SNPs from 39 candidate genes, including genes previously implicated in wood formation and tree growth in spruce and other species. Our study represents a multilocus GWAS for complex wood traits in Norway spruce. The results advance our understanding of the genetics influencing wood traits and identify candidate genes for future functional studies.


Subjects
Plant Genes/genetics, Genome-Wide Association Study/methods, Picea/genetics, Quantitative Trait Loci/genetics, Wood/genetics, Algorithms, Genomics/methods, Genotype, Linkage Disequilibrium, Norway, Phenotype, Picea/classification, Single Nucleotide Polymorphism, Wood/classification