Results 1 - 20 of 35
1.
BMC Bioinformatics ; 21(1): 177, 2020 May 04.
Article in English | MEDLINE | ID: mdl-32366216

ABSTRACT

BACKGROUND: Feature screening plays a critical role in handling ultrahigh dimensional data analyses when the number of features exponentially exceeds the number of observations. It is increasingly common in biomedical research to have a case-control (binary) response and extremely large-scale categorical features. However, approaches that consider such data types are limited in the extant literature. In this article, we propose a new feature screening approach based on the iterative trend correlation (ITC-SIS, for short) to detect important susceptibility loci that are associated with polycystic ovary syndrome (PCOS) affection status by screening 731,442 SNP features that were collected from genome-wide association studies. RESULTS: We prove that the trend correlation based screening approach satisfies the theoretical strong screening consistency property under a set of reasonable conditions, which provides appealing theoretical support for its performance. We demonstrate through various simulation designs that the finite sample performance of ITC-SIS is accurate and fast. CONCLUSION: ITC-SIS serves as a good alternative method to detect disease susceptibility loci for clinical genomic data.


Subjects
Genetic Predisposition to Disease , Polycystic Ovary Syndrome/diagnosis , Polycystic Ovary Syndrome/genetics , Case-Control Studies , Female , Genome , Genome-Wide Association Study/methods , Humans
2.
Brief Bioinform ; 19(3): 461-471, 2018 05 01.
Article in English | MEDLINE | ID: mdl-28062411

ABSTRACT

Detecting how genes regulate biological shape has become a multidisciplinary research interest because of its wide application in many disciplines. Despite its fundamental importance, the challenges of accurately extracting information from an image, statistically modeling the high-dimensional shape and meticulously locating shape quantitative trait loci (QTL) affect the progress of this research. In this article, we propose a novel integrated framework that incorporates shape analysis, statistical curve modeling and genetic mapping to detect significant QTLs regulating variation of biological shape traits. After quantifying morphological shape via a radius centroid contour approach, each shape, as a phenotype, was characterized as a high-dimensional curve, varying as angle θ runs clockwise with the first point starting from angle zero. We then modeled the dynamic trajectories of three mean curves and variation patterns as functions of θ. Our framework led to the detection of a few significant QTLs regulating the variation of leaf shape collected from a natural population of poplar, Populus szechuanica var tibetica. This population, distributed at altitudes 2000-4500 m above sea level, is an evolutionarily important plant species. This is the first work in the quantitative genetic shape mapping area that emphasizes a sense of 'function' instead of decomposing the shape into a few discrete principal components, as the majority of shape studies do.


Subjects
Chromosome Mapping/methods , Plant Leaves/anatomy & histology , Populus/anatomy & histology , Populus/genetics , Quantitative Trait Loci , Chromosomes, Plant , Computer Simulation , Genes, Plant , Models, Statistical , Phenotype , Plant Leaves/genetics
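The radius centroid contour idea in the abstract above can be illustrated with a minimal sketch. It assumes the shape is already available as a list of boundary points, takes the centroid as the plain mean of those points, and measures the angle θ clockwise from zero as the abstract describes. The function name `radius_centroid_contour` and these conventions are illustrative assumptions, not the authors' implementation.

```python
import math


def radius_centroid_contour(points):
    """Return (theta, radius) pairs sorted by clockwise angle from the centroid.

    `points` is a list of (x, y) boundary coordinates. The centroid is
    approximated by the mean of the sampled boundary points (an assumption).
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    curve = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        # Negating atan2 turns the usual counterclockwise angle into a clockwise one.
        theta = (-math.atan2(dy, dx)) % (2 * math.pi)
        curve.append((theta, math.hypot(dx, dy)))
    curve.sort()
    return curve
```

Each shape then becomes a curve of radius against θ, which is the kind of high-dimensional phenotype the abstract models as a function of θ.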
3.
BMC Bioinformatics ; 18(1): 212, 2017 Apr 12.
Article in English | MEDLINE | ID: mdl-28403836

ABSTRACT

BACKGROUND: Although the dimension of the entire genome can be extremely large, only a parsimonious set of influential SNPs are correlated with a particular complex trait and are important to the prediction of the trait. Efficiently and accurately selecting these influential SNPs from millions of candidates is in high demand, but poses challenges. We propose a backward elimination iterative distance correlation (BE-IDC) procedure to select the smallest subset of SNPs that guarantees sufficient prediction accuracy, while also solving the unclear threshold issue for traditional feature screening approaches. RESULTS: Verified through six simulations, the adaptive threshold estimated by the BE-IDC performed uniformly better than fixed threshold methods that have been used in the current literature. We also applied BE-IDC to an Arabidopsis thaliana genome-wide data set. Out of 216,130 SNPs, BE-IDC selected four influential SNPs, and confirmed the same FRIGIDA gene that was reported by two other traditional methods. CONCLUSIONS: BE-IDC accommodates both the prediction accuracy and the computational speed that are highly demanded in genomic selection.


Subjects
Arabidopsis/genetics , Models, Genetic , Polymorphism, Single Nucleotide , Arabidopsis Proteins/genetics , Computer Simulation , Genome, Plant , Genome-Wide Association Study , Genomics , Phenotype , Plant Breeding
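The abstract above builds on distance correlation. As a rough illustration only, the sketch below computes the standard sample distance correlation between two one-dimensional variables from double-centered distance matrices. It does not implement the iterative or backward-elimination steps of BE-IDC, and the function names are hypothetical.

```python
import math


def _centered_distances(values):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(values)
    d = [[abs(values[i] - values[j]) for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in d]  # equal to column means by symmetry
    grand_mean = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation between two equal-length numeric sequences."""
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for row in a for v in row) / n ** 2
    dvar_y = sum(v * v for row in b for v in row) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0  # one variable is constant
    return math.sqrt(max(dcov2, 0.0) / denom)
```

Unlike the Pearson correlation, this statistic is also nonzero for a purely quadratic relationship on a symmetric grid, which is why it is attractive for screening SNPs with nonlinear effects.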
4.
New Phytol ; 213(1): 455-469, 2017 01.
Article in English | MEDLINE | ID: mdl-27650962

ABSTRACT

Leaf shape traits have long been a focus of many disciplines, but the complex genetic and environmental interactive mechanisms regulating leaf shape variation have not yet been investigated in detail. The question of the respective roles of genes and environment and how they interact to modulate leaf shape is a thorny evolutionary problem, and sophisticated methodology is needed to address it. In this study, we investigated a framework-level approach that inputs shape image photographs and genetic and environmental data, and then outputs the relative importance ranks of all variables after integrating shape feature extraction, dimension reduction, and tree-based statistical models. The power of the proposed framework was confirmed by simulation and a Populus szechuanica var. tibetica data set. This new methodology resulted in the detection of novel shape characteristics, and also confirmed some previous findings. The quantitative modeling of a combination of polygenetic, plastic, epistatic, and gene-environment interactive effects, as investigated in this study, will improve the discernment of quantitative leaf shape characteristics, and the methods are ready to be applied to other leaf morphology data sets. Unlike the majority of approaches in the quantitative leaf shape literature, this framework-level approach is data-driven, without assuming any pre-known shape attributes, landmarks, or model structures.


Subjects
Gene-Environment Interaction , Genes, Plant , Models, Genetic , Plant Leaves/anatomy & histology , Plant Leaves/genetics , Trees/anatomy & histology , Trees/genetics , Algorithms , Computer Simulation , Genetic Pleiotropy , Image Processing, Computer-Assisted , Linkage Disequilibrium/genetics , Populus/anatomy & histology , Populus/genetics , Principal Component Analysis , Satellite Communications
5.
Brief Bioinform ; 15(4): 571-81, 2014 Jul.
Article in English | MEDLINE | ID: mdl-23460593

ABSTRACT

Knowledge about biological shape has important implications in biology and biomedicine, but the underlying genetic mechanisms for shape variation have not been well studied. Statistical models play a pivotal role in mapping specific quantitative trait loci (QTLs) that contribute to biological shape and its developmental trajectories. We describe and assess a statistical framework for shape gene identification that incorporates shape and image analysis into a mixture-model framework for QTL mapping. Statistical parameters that define genotype-specific differences in biological shape are estimated by implementing statistical and computational algorithms. A state-of-the-art procedure is described to examine the control patterns of specific QTLs on the origin, properties and functions of biological shape. The statistical framework described will help to address many integrative biological and genetic questions and challenges in shape variation faced by the life sciences community.


Subjects
Models, Statistical , Algorithms , Quantitative Trait Loci
6.
Brief Bioinform ; 15(4): 660-9, 2014 Jul.
Article in English | MEDLINE | ID: mdl-23428353

ABSTRACT

The recent availability of high-throughput genetic and genomic data allows the genetic architecture of complex traits to be systematically mapped. The application of these genetic results to design and breed new crop types can be made possible through systems mapping. Systems mapping is a computational model that dissects a complex phenotype into its underlying components, coordinates different components in terms of biological laws through mathematical equations and maps specific genes that mediate each component and its connection with other components. Here, we present a new direction of systems mapping by integrating this tool with carbon economy. With an optimal spatial distribution of carbon fluxes between sources and sinks, plants tend to maximize whole-plant growth and competitive ability under limited availability of resources. We argue that such an economical strategy for plant growth and development, once integrated with systems mapping, will not only provide mechanistic insights into plant biology, but also help to spark a renaissance of interest in ideotype breeding in crops and trees.


Subjects
Biomass , Chromosome Mapping , Systems Biology , Quantitative Trait Loci
7.
BMC Genet ; 16: 148, 2015 Dec 23.
Article in English | MEDLINE | ID: mdl-26698561

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) interrogate the whole genome on a large scale to characterize the complex genetic architecture of biomedical traits. When the number of SNPs dramatically increases to half a million but the sample size is still limited to thousands, the traditional p-value based statistical approaches suffer from unprecedented limitations. Feature screening has proved to be an effective and powerful approach to handle ultrahigh dimensional data statistically, yet it has not received much attention in GWAS. Feature screening reduces the feature space from millions to hundreds by removing non-informative noise. However, the univariate measures used to rank features are mainly based on individual effects without considering the mutual interactions with other features. In this article, we explore the performance of a random forest (RF) based feature screening procedure to emphasize the SNPs that have complex effects for a continuous phenotype. RESULTS: Both simulation and real data analysis are conducted to examine the power of the forest-based feature screening. We compare it with five other popular feature screening approaches via simulation and conclude that RF can serve as a decent feature screening tool to accommodate complex genetic effects such as nonlinear, interactive, correlative, and joint effects. Unlike the traditional p-value based Manhattan plot, we use the Permutation Variable Importance Measure (PVIM) to display the relative significance and believe that it will provide as much useful information as the traditional plot. CONCLUSION: Most complex traits are found to be regulated by epistatic and polygenic variants. The forest-based feature screening is proven to be an efficient, easily implemented, and accurate approach to cope with whole-genome data with complex structures. Our explorations should add to the growing body of feature screening work that better serves the demands of contemporary genome data.


Subjects
Cholesterol, HDL/genetics , Computer Simulation , Models, Genetic , Animals , Cholesterol, HDL/blood , Epistasis, Genetic , Genome-Wide Association Study , Humans , Hypercholesterolemia , Mice , Multifactorial Inheritance
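The permutation variable importance idea mentioned in the abstract above can be sketched generically: shuffle one feature at a time and record how much the prediction error grows. The sketch below is model-agnostic and takes any `predict` callable as a stand-in for a fitted random forest. It uses the full data rather than out-of-bag samples, so it is a simplified illustration under those assumptions and not the paper's PVIM procedure.

```python
import random


def mean_squared_error(predict, rows, targets):
    """Average squared difference between predictions and targets."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, n_repeats=20, seed=0):
    """Mean increase in MSE after shuffling each feature column.

    `rows` is a list of feature lists; `predict` maps one row to a number.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(predict, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only feature j permuted.
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mean_squared_error(predict, permuted, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A feature the model never uses gets an importance of exactly zero, while an informative feature gets a positive value, which is the ranking signal used for screening.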
8.
BMC Genet ; 15 Suppl 1: S5, 2014.
Article in English | MEDLINE | ID: mdl-25079623

ABSTRACT

BACKGROUND: Linkage Disequilibrium (LD) is a powerful approach for the identification and characterization of morphological shape, which usually involves multiple genetic markers. However, multiple testing corrections substantially reduce the power of the associated tests. In addition, the principal component analysis (PCA), used to quantify the shape variations into several principal phenotypes, further increases the number of tests. As a result, a powerful multiple testing correction for simultaneous large-scale gene-shape association tests is an essential part of determining statistical significance. Bonferroni adjustments and permutation tests are the most popular approaches to correcting for multiple tests within LD based Quantitative Trait Loci (QTL) models. However, permutations are extremely computationally expensive and may mislead in the presence of family structure. The Bonferroni correction, though simple and fast, is conservative and has low power for large-scale testing. RESULTS: We propose a new multiple testing approach, constructed by combining an Intersection Union Test (IUT) with the Holm correction, which strongly controls the family-wise error rate (FWER) without any additional assumptions on the joint distribution of the test statistics or dependence structure of the markers. The power improvement for the Holm correction, as compared to the standard Bonferroni correction, is examined through a simulation study. A consistent and moderate increase in power is found under the majority of simulated circumstances, including various sample sizes, heritabilities, and numbers of markers. The power gains are further demonstrated on real leaf shape data from a natural population of poplar, Populus szechuanica var. tibetica, where more significant QTL associated with morphological shape are detected than under the previously applied Bonferroni adjustment. CONCLUSION: The Holm correction is a valid and powerful method for assessing gene-shape association involving multiple markers, which not only controls the FWER in the strong sense but also improves statistical power.


Subjects
Chromosome Mapping , Linkage Disequilibrium , Models, Genetic , Populus/genetics , Quantitative Trait Loci , Genetic Association Studies
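To make the contrast between the two corrections in the abstract above concrete, here is a minimal sketch of the standard Bonferroni rule next to the standard Holm step-down rule. It does not include the Intersection Union Test that the paper combines with Holm, and the function names are illustrative.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values, alpha=0.05):
    """Holm step-down: compare the k-th smallest p-value with alpha / (m - k)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first non-rejection
    return reject
```

With p-values 0.001, 0.015, 0.02 and 0.04 at alpha = 0.05, Bonferroni rejects only the first hypothesis, whereas Holm rejects all four, because its threshold relaxes after each rejection. Holm never rejects fewer hypotheses than Bonferroni.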
9.
Curr Genomics ; 15(5): 380-9, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25435800

ABSTRACT

Controlling for the multiplicity effect is an essential part of determining statistical significance in large-scale single-locus association genome scans on Single Nucleotide Polymorphisms (SNPs). Bonferroni adjustment is a commonly used approach due to its simplicity, but is conservative and has low power for large-scale tests. The permutation test, which is a powerful and popular tool, is computationally expensive and may mislead in the presence of family structure. We propose a computationally efficient and powerful multiple testing correction approach for Linkage Disequilibrium (LD) based Quantitative Trait Loci (QTL) mapping on the basis of graphical weighted-Bonferroni methods. The proposed multiplicity adjustment method synthesizes weighted Bonferroni-based closed testing procedures into a powerful and versatile graphical approach. By tailoring different priorities for the two hypothesis tests involved in LD based QTL mapping, we are able to increase power and maintain computational efficiency and conceptual simplicity. The proposed approach enables strong control of the familywise error rate (FWER). The performance of the proposed approach as compared to the standard Bonferroni correction is illustrated by simulation and real data. We observe a consistent and moderate increase in power under all simulated circumstances, among different sample sizes, heritabilities, and number of SNPs. We also applied the proposed method to a real outbred mouse HDL cholesterol QTL mapping project where we detected the significant QTLs that were highlighted in the literature, while still ensuring strong control of the FWER.

10.
Stat Med ; 32(3): 509-23, 2013 Feb 10.
Article in English | MEDLINE | ID: mdl-22903809

ABSTRACT

Many phenomena of fundamental importance to biology and biomedicine arise as a dynamic curve, such as organ growth and HIV dynamics. The genetic mapping of these traits is challenged by longitudinal variables measured at irregular and possibly subject-specific time points, in which case nonnegative definiteness of the estimated covariance matrix needs to be guaranteed. We present a semiparametric approach for genetic mapping within the mixture-model setting by jointly modeling mean and covariance structures for irregular longitudinal data. Penalized spline is used to model the mean functions of individual quantitative trait locus (QTL) genotypes as latent variables, whereas an extended generalized linear model is used to approximate the covariance matrix. The parameters for modeling the mean-covariances are estimated by MCMC, using the Gibbs sampler and the Metropolis-Hastings algorithm. We derive the full conditional distributions for the mean and covariance parameters and compute Bayes factors to test the hypothesis about the existence of significant QTLs. We used the model to screen the existence of specific QTLs for age-specific change of body mass index with a sparse longitudinal data set. The new model provides powerful means for broadening the application of genetic mapping to reveal the genetic control of dynamic traits.


Subjects
Bayes Theorem , Cardiovascular Diseases/genetics , Chromosome Mapping/methods , Adult , Aged , Aged, 80 and over , Algorithms , Body Mass Index , Chromosome Mapping/statistics & numerical data , Computer Simulation , Confidence Intervals , Female , Genotyping Techniques/statistics & numerical data , Humans , Longitudinal Studies/statistics & numerical data , Male , Middle Aged , Models, Genetic , Models, Statistical , Quantitative Trait Loci/genetics
11.
Bioinformatics ; 27(4): 516-23, 2011 Feb 15.
Article in English | MEDLINE | ID: mdl-21156729

ABSTRACT

MOTIVATION: Despite their success in identifying genes that affect complex disease or traits, current genome-wide association studies (GWASs) based on a single SNP analysis are too simple to elucidate a comprehensive picture of the genetic architecture of phenotypes. A simultaneous analysis of a large number of SNPs, although statistically challenging, especially with a small number of samples, is crucial for genetic modeling. METHOD: We propose a two-stage procedure for multi-SNP modeling and analysis in GWASs, by first producing a 'preconditioned' response variable using a supervised principal component analysis and then formulating Bayesian lasso to select a subset of significant SNPs. The Bayesian lasso is implemented with a hierarchical model, in which scale mixtures of normals are used as prior distributions for the genetic effects and exponential priors are considered for their variances, and then solved by using the Markov chain Monte Carlo (MCMC) algorithm. Our approach obviates the choice of the lasso parameter by imposing a diffuse hyperprior on it and estimating it along with other parameters and is particularly powerful for selecting the most relevant SNPs for GWASs, where the number of predictors exceeds the number of observations. RESULTS: The new approach was examined through a simulation study. By using the approach to analyze a real dataset from the Framingham Heart Study, we detected several significant genes that are associated with body mass index (BMI). Our findings support the previous results about BMI-related SNPs and, meanwhile, gain new insights into the genetic control of this trait. AVAILABILITY: The computer code for the approach developed is available at the Penn State Center for Statistical Genetics web site, http://statgen.psu.edu.


Subjects
Algorithms , Bayes Theorem , Genome-Wide Association Study , Models, Statistical , Polymorphism, Single Nucleotide , Body Mass Index , Computer Simulation , Female , Humans , Male , Markov Chains , Principal Component Analysis
12.
BMC Genet ; 13: 20, 2012 Mar 23.
Article in English | MEDLINE | ID: mdl-22443496

ABSTRACT

BACKGROUND: Genetic mapping has been used as a tool to study the genetic architecture of complex traits by localizing their underlying quantitative trait loci (QTLs). Statistical methods for genetic mapping rely on a key assumption, that is, traits obey a parametric distribution. However, in practice real data may not perfectly follow the specified distribution. RESULTS: Here, we derive a robust statistical approach for QTL mapping that accommodates a certain degree of misspecification of the true model by incorporating integrated square errors into the genetic mapping framework. A hypothesis test is formulated by defining a new test statistic, the energy difference. CONCLUSIONS: Simulation studies were performed to investigate the statistical properties of this approach and compare these properties with those from traditional maximum likelihood and non-parametric QTL mapping approaches. Lastly, analyses of real examples were conducted to demonstrate the usefulness and utilization of the new approach in a practical genetic setting.


Subjects
Chromosome Mapping/methods , Models, Genetic , Quantitative Trait Loci , Statistics as Topic
13.
Hum Hered ; 72(2): 110-20, 2011.
Article in English | MEDLINE | ID: mdl-21996601

ABSTRACT

OBJECTIVE: Longitudinal measurements with bivariate response have been analyzed by several authors using two separate models for each response. However, for most of the biological or medical experiments, the two responses are highly correlated and hence a separate model for each response might not be a desirable way to analyze such data. A single model considering a bivariate response provides a more powerful inference as the correlation between the responses is modeled appropriately. In this article, we propose a dynamic statistical model to detect the genes controlling human blood pressure (systolic and diastolic). METHODS: By modeling the mean function with orthogonal Legendre polynomials and the covariance matrix with a stationary parametric structure, we incorporate the statistical ideas in functional genome-wide association studies to detect SNPs which have significant control on human blood pressure. The traditional false discovery rate is used for multiple comparisons. RESULTS: We analyze the data from the Framingham Heart Study to detect such SNPs by appropriately considering gender-gene interaction. We detect 8 SNPs for males and 7 for females which are most significant in controlling blood pressure. The genotype-specific mean curves and additive and dominant effects over time are shown for each significant SNP for both genders. Simulation studies are performed to examine the statistical properties of our model. The current model will be extremely useful in detecting genes controlling different traits and diseases for humans or non-human subjects.


Subjects
Blood Pressure/genetics , Cardiovascular Diseases/genetics , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Adult , Chromosomes, Human/genetics , Computer Simulation , Female , Gene Frequency , Genetic Association Studies , Genome, Human , Humans , Longitudinal Studies , Male , Middle Aged , Models, Genetic
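The abstract above models the mean function with orthogonal Legendre polynomials. A minimal sketch of that ingredient follows: it evaluates the Legendre basis with Bonnet's recursion and forms a mean curve as a weighted sum of basis functions after rescaling time to [-1, 1]. The function names, the rescaling convention, and the example coefficients are assumptions for illustration, not the fitted model from the paper.

```python
def legendre_basis(x, max_order):
    """Values P_0(x), ..., P_max_order(x) via Bonnet's recursion."""
    values = [1.0]
    if max_order >= 1:
        values.append(x)
    for n in range(1, max_order):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values


def mean_curve(times, coefficients, t_min, t_max):
    """Mean trajectory as a Legendre expansion over times rescaled to [-1, 1]."""
    curve = []
    for t in times:
        x = 2 * (t - t_min) / (t_max - t_min) - 1
        basis = legendre_basis(x, len(coefficients) - 1)
        curve.append(sum(c * b for c, b in zip(coefficients, basis)))
    return curve
```

In a functional GWAS setting, each genotype would get its own coefficient vector, and the genotype-specific curves are compared over time.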
14.
Front Psychol ; 13: 898107, 2022.
Article in English | MEDLINE | ID: mdl-35645929

ABSTRACT

Family health education is a must for every family, so that children can be taught how to protect their own health. However, in this era of artificial intelligence, many technical operations based on artificial intelligence are born, so the purpose of this study is to apply artificial intelligence technology to family health education. This paper proposes a fusion of artificial intelligence and IoT technologies. Based on the characteristics of artificial intelligence technology, it combines ZigBee technology and RFID technology in the Internet of Things technology to design an artificial intelligence-based service system. Then it designs the theme of family health education by conducting a questionnaire on students' family education and analyzing the results of the questionnaire. And it designs database and performance analysis experiments to improve the artificial intelligence-based family health education public service system designed in this paper. Finally, a comparative experiment between the family health education public service system based on artificial intelligence and the traditional health education method will be carried out. The experimental results show that the family health education public service system based on artificial intelligence has improved by 21.74% compared with the traditional family health education method; compared with the traditional family health education method, the health education effect of the family health education public service system based on artificial intelligence has increased by 13.89%.

15.
Hum Genet ; 129(6): 629-39, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21293879

ABSTRACT

Although genome-wide association studies (GWAS) are widely used to identify the genetic and environmental etiology of a trait, several key issues related to their statistical power and biological relevance have remained unexplored. Here, we describe a novel statistical approach, called functional GWAS or fGWAS, to analyze the genetic control of traits by integrating biological principles of trait formation into the GWAS framework through mathematical and statistical bridges. fGWAS can address many fundamental questions, such as the patterns of genetic control over development, the duration of genetic effects, as well as what causes developmental trajectories to change or stop changing. In statistics, fGWAS displays increased power for gene detection by capitalizing on cumulative phenotypic variation in a longitudinal trait over time and increased robustness for manipulating sparse longitudinal data.


Subjects
Genome-Wide Association Study , Models, Genetic , Female , Humans , Male , Phenotype
16.
J Theor Biol ; 289: 206-16, 2011 Nov 21.
Article in English | MEDLINE | ID: mdl-21871898

ABSTRACT

All biological phenomena occurring at different levels of organization from cells to organisms can be modeled as a dynamic system, in which the underlying components interact dynamically to comprehend its biological function. Such a systems modeling approach facilitates the use of biochemically and biophysically detailed mathematical models to describe and quantify "living cells," leading to an in-depth and precise understanding of the behavior, development and function of a biological system. Here, we illustrate how this approach can be used to map genes or quantitative trait loci (QTLs) that control a complex trait using the example of the circadian rhythm system which has been at the forefront of analytical mathematical modeling for many years. We integrate a system of biologically meaningful delay differential equations (DDEs) into functional mapping, a statistical model designed to map dynamic QTLs involved in biological processes. The DDEs model the ability of circadian rhythm to generate autonomously sustained oscillations with a period close to 24h, in terms of time-varying mRNA and protein abundances. By incorporating the Runge-Kutta fourth order algorithm within the likelihood-based context of functional mapping, we estimated the genetic parameters that define the periodic pattern of QTL effects on time-varying mRNA and protein abundances and their dynamic association as well as the linkage disequilibrium of the QTL and a marker. We prove theorems about how to choose appropriate parameters to guarantee periodic oscillations. We further used simulation studies to investigate how a QTL influences the period and the amplitude of circadian oscillations through changing model parameters. The model provides a quantitative framework for assessing the interplay between genetic effects of QTLs and rhythmic responses.


Subjects
Chromosome Mapping/methods , Circadian Rhythm/genetics , Models, Genetic , Quantitative Trait Loci/genetics , Algorithms , Biological Clocks/genetics , Circadian Rhythm/physiology , Humans , Linkage Disequilibrium , Phenotype , RNA, Messenger/genetics , Systems Biology/methods
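The abstract above relies on the fourth-order Runge-Kutta algorithm. The sketch below shows the classical scheme for a scalar ordinary differential equation only. The delay differential equations in the paper would additionally need a lookup of past states, which is not shown here, and the function names are illustrative.

```python
def rk4_step(f, t, y, h):
    """Advance y' = f(t, y) by one classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(f, t0, y0, t_end, n_steps):
    """Integrate from t0 to t_end with a fixed step size and return y(t_end)."""
    h = (t_end - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y
```

In a likelihood-based setting, a solver of this kind would be called for every candidate parameter vector to produce the predicted trajectory that is compared with the observed data.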
17.
Genes (Basel) ; 12(5)2021 05 13.
Article in English | MEDLINE | ID: mdl-34068248

ABSTRACT

Despite the fact that imbalance between case and control groups is prevalent in genome-wide association studies (GWAS), it is often overlooked. This imbalance is becoming more significant and urgent as the rapid growth of biobanks and electronic health records has enabled the collection of thousands of phenotypes from large cohorts, in particular for diseases with low prevalence. The unbalanced binary traits pose serious challenges to traditional statistical methods in terms of both genomic selection and disease prediction. For example, the well-established linear mixed models (LMM) yield inflated type I error rates in the presence of unbalanced case-control ratios. In this article, we review multiple statistical approaches that have been developed to overcome the inaccuracy caused by the unbalanced case-control ratio, and comment on the advantages and limitations of each approach. In addition, we explore the potential for applying several powerful and popular state-of-the-art machine-learning approaches, which have not been applied to the GWAS field yet. This review paves the way for better analysis and understanding of the unbalanced case-control disease data in GWAS.


Subjects
Genome-Wide Association Study/methods , Case-Control Studies , Genome/genetics , Genomics/methods , Humans , Linear Models , Machine Learning , Phenotype
18.
Sci Rep ; 11(1): 24159, 2021 12 17.
Article in English | MEDLINE | ID: mdl-34921167

ABSTRACT

The rapid advancement of functional data in various application fields has increased the demand for advanced statistical approaches that can incorporate complex structures and nonlinear associations. In this article, we propose a novel functional random forests (FunFor) approach to model the functional data response that is densely and regularly measured, as an extension of the landmark work of Breiman, who introduced traditional random forests for a univariate response. The FunFor approach is able to predict curve responses for new observations and selects important variables from a large set of scalar predictors. The FunFor approach inherits the efficiency of the traditional random forest approach in detecting complex relationships, including nonlinear and high-order interactions. Additionally, it is a non-parametric approach without the imposition of parametric and distributional assumptions. Eight simulation settings and one real-data analysis consistently demonstrate the excellent performance of the FunFor approach in various scenarios. In particular, FunFor successfully ranks the true predictors as the most important variables, while achieving the most robust variable selections and the smallest prediction errors compared with three other relevant approaches. Although motivated by a biological leaf shape data analysis, the proposed FunFor approach has great potential to be widely applied in various fields due to its minimal requirement on tuning parameters and its distribution-free and model-free nature. An R package named 'FunFor', implementing the FunFor approach, is available at GitHub.

19.
Theor Biol Med Model ; 7: 28, 2010 Jul 01.
Article in English | MEDLINE | ID: mdl-20594352

ABSTRACT

BACKGROUND: Living things come in all shapes and sizes, from bacteria, plants, and animals to humans. Knowledge about the genetic mechanisms for biological shape has far-reaching implications for a wide spectrum of scientific disciplines including anthropology, agriculture, developmental biology, evolution and biomedicine. RESULTS: We derived a statistical model for mapping specific genes or quantitative trait loci (QTLs) that control morphological shape. The model was formulated within the mixture framework, in which different types of shape are thought to result from genotypic discrepancies at a QTL. The EM algorithm was implemented to estimate QTL genotype-specific shapes based on a shape correspondence analysis. Computer simulation was used to investigate the statistical properties of the model. CONCLUSION: By identifying specific QTLs for morphological shape, the model developed will help to ask, disseminate and address many major integrative biological and genetic questions and challenges in the genetic control of biological shape and function.


Subjects
Cucurbita/anatomy & histology , Models, Statistical , Plant Leaves/anatomy & histology , Chromosome Mapping , Computer Simulation , Crosses, Genetic , Cucurbita/genetics , Genotype , Plant Leaves/genetics , Quantitative Trait Loci/genetics
20.
Zhongguo Zhong Yao Za Zhi ; 35(19): 2533-7, 2010 Oct.
Article in Chinese | MEDLINE | ID: mdl-21174759

ABSTRACT

OBJECTIVE: To compare the pharmacognostic and microscopic characteristics of Radix Paeoniae Rubra (chishao) from different areas. METHOD: The pharmacognostic and microscopic characteristics of Radix Paeoniae Rubra were compared by microscopic counting methods. RESULT: Chishao from Duolun was straighter and longer, with a cortex bearing a set of closely spaced rills that peeled off easily, a pink section, etc. Wild chishao differed from cultivated chishao in pharmacognostic and microscopic characteristics, such as appearance and shape, smell, vessel arrangement, and the number of crystals and starch grains per unit area. CONCLUSION: Chishao from Duolun differed from the others. Appearance and shape, wood fiber, vessel arrangement, and the number of crystals and starch grains per unit area can be used as identification features of wild and cultivated chishao.


Subjects
Araceae/chemistry , Benzoates/analysis , Drugs, Chinese Herbal/analysis , Paeonia/anatomy & histology , Bridged Aromatic Hydrocarbons , Chromatography, High Pressure Liquid , Glucosides/metabolism , Paeonia/ultrastructure , Plant Roots/chemistry