ABSTRACT
MOTIVATION: Computer inference of biological mechanisms is increasingly approachable due to dynamically rich data sources such as single-cell genomics. Inferred molecular interactions can prioritize hypotheses for wet-lab experiments to expedite biological discovery. However, complex data often come with unwanted biological or technical variations, exposing biases over marginal distribution and sample size in current methods to favor spurious causal relationships. RESULTS: Considering function direction and strength as evidence for causality, we present an adapted functional chi-squared test (AdpFunChisq) that rewards functional patterns over non-functional or independent patterns. On synthetic and three biology datasets, we demonstrate the advantages of AdpFunChisq over 10 methods on overcoming biases that give rise to wide fluctuations in the performance of alternative approaches. On single-cell multiomics data of multiple phenotype acute leukemia, we found that the T-cell surface glycoprotein CD3 delta chain may causally mediate specific genes in the viral carcinogenesis pathway. Using the causality-by-functionality principle, AdpFunChisq offers a viable option for robust causal inference in dynamical systems. AVAILABILITY AND IMPLEMENTATION: The AdpFunChisq test is implemented in the R package 'FunChisq' (2.5.2 or above) at https://cran.r-project.org/package=FunChisq. All other source code along with pre-processed data is available at Code Ocean https://doi.org/10.24433/CO.2907738.v1. SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.
Subjects
Genomics, Software, Bias, Causality
ABSTRACT
The complexity of biological processes such as cell differentiation is reflected in dynamic transitions between cellular states. Trajectory inference arranges the states into a progression using methodologies propelled by single-cell biology. However, current methods, all returning a best trajectory, do not adequately assess statistical significance of noisy patterns, leading to uncertainty in inferred trajectories. We introduce a tree dimension test for trajectory presence in multivariate data by a dimension measure of Euclidean minimum spanning tree, a test statistic, and a null distribution. Computable in linear time to tree size, the tree dimension measure summarizes the extent of branching more effectively than globally insensitive number of leaves or tree diameter indifferent to secondary branches. The test statistic quantifies trajectory presence and its null distribution is estimated under the null hypothesis of no trajectory in data. On simulated and real single-cell datasets, the test outperformed the intuitive number of leaves and tree diameter statistics. Next, we developed a measure for the tissue specificity of the dynamics of a subset, based on the minimum subtree cover of the subset in a minimum spanning tree. We found that tissue specificity of pathway gene expression dynamics is conserved in human and mouse development: several signal transduction pathways including calcium and Wnt signaling are most tissue specific, while genetic information processing pathways such as ribosome and mismatch repair are least so. Neither the tree dimension test nor the subset specificity measure has any user parameter to tune. Our work opens a window to prioritize cellular dynamics and pathways in development and other multivariate dynamical systems.
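As a rough illustration of the two baseline statistics named in this abstract (number of leaves and tree diameter of a Euclidean minimum spanning tree), here is a minimal pure-Python sketch. The tree dimension measure itself is not defined in the abstract, so it is not reproduced. All function names are hypothetical, and the quadratic-time Prim's algorithm is an assumption chosen for brevity, not the authors' implementation.

```python
import math
from collections import deque


def euclidean_mst(points):
    """Build a Euclidean minimum spanning tree with Prim's algorithm.

    Returns adjacency lists indexed by point position. Runs in O(n^2) time,
    which is enough for a small illustration.
    """
    n = len(points)
    adj = [[] for _ in range(n)]
    if n < 2:
        return adj
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge length into the tree
    parent = [-1] * n
    best[0] = 0.0
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            adj[u].append(parent[u])
            adj[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return adj


def count_leaves(adj):
    """Number of tree vertices with exactly one neighbor."""
    return sum(1 for neighbors in adj if len(neighbors) == 1)


def _farthest(adj, start):
    """Breadth-first search; returns (farthest vertex, distance in edges)."""
    dist = {start: 0}
    queue = deque([start])
    last = start
    while queue:
        last = queue.popleft()
        for v in adj[last]:
            if v not in dist:
                dist[v] = dist[last] + 1
                queue.append(v)
    return last, dist[last]


def tree_diameter(adj):
    """Longest path in the tree, counted in edges (double BFS)."""
    end, _ = _farthest(adj, 0)
    _, diameter = _farthest(adj, end)
    return diameter
```

For points on a straight line the tree is a path with two leaves, whereas a star-shaped point cloud yields many leaves and a short diameter. Neither number alone captures the overall extent of branching, which is the limitation the abstract attributes to these statistics.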
Subjects
Cell Differentiation, Animals, Cell Differentiation/genetics, Humans, Mice, Phylogeny
ABSTRACT
BACKGROUND: Cells progressing from an early state to a developed state give rise to lineages in cell differentiation. Knowledge of these lineages is central to developmental biology. Each biological lineage corresponds to a trajectory in a dynamical system. Emerging single-cell technologies such as single-cell RNA sequencing can capture molecular abundance in diverse cell types in a developing tissue. Many computational methods have been developed to infer trajectories from single-cell data. However, to our knowledge, none of the existing methods address the problem of determining the existence of a trajectory in observed data before attempting trajectory inference. RESULTS: We introduce a method to identify the existence of a trajectory using three graph-based statistics. A permutation test is utilized to calculate the empirical distribution of the test statistic under the null hypothesis that a trajectory does not exist. Finally, a p-value is calculated to quantify the statistical significance for the presence of trajectory in the data. CONCLUSIONS: Our work contributes new statistics to assess the level of uncertainty in trajectory inference to increase the understanding of biological system dynamics.
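The permutation-test logic described in this abstract can be sketched as follows. This is a generic outline under stated assumptions: the abstract does not say how the data are permuted or which three graph-based statistics are used. Here each feature column is shuffled independently to break joint structure, and `abs_correlation` is a toy stand-in statistic, not one of the authors' statistics. All names are hypothetical.

```python
import random


def permutation_p_value(data, statistic, n_perm=999, seed=0):
    """Empirical p-value for `statistic` under a column-shuffling null.

    data      : list of equal-length tuples (one tuple per observation)
    statistic : callable mapping such a list to a number; larger values are
                assumed to indicate stronger evidence for structure
    """
    rng = random.Random(seed)
    observed = statistic(data)
    columns = [list(col) for col in zip(*data)]
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = []
        for col in columns:
            permuted = col[:]
            rng.shuffle(permuted)   # assumption: shuffle each feature independently
            shuffled.append(permuted)
        null_data = list(zip(*shuffled))
        if statistic(null_data) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


def abs_correlation(data):
    """Toy statistic: absolute Pearson correlation of two columns."""
    xs = [row[0] for row in data]
    ys = [row[1] for row in data]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5
```

A small p-value means the observed statistic is rarely matched once the joint structure is destroyed, which is the sense in which the abstract quantifies significance for the presence of a trajectory.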
Subjects
Single-Cell Analysis, Cell Differentiation
ABSTRACT
Contrary to prior beliefs that epistasis is rare, advances in genomics suggest that it is not. Current practice often filters out genomic loci with low variant counts before detecting epistasis. We argue that this practice is far from optimal because it can throw away strong epistatic patterns. Instead, we present the compensated Sharma-Song test to infer genetic epistasis in genome-wide association studies by differential departure from independence. The test does not require a minimum number of replicates for each variant. We also introduce algorithms to simulate epistatic patterns that differentially depart from independence. Using two simulators, the test performed comparably to the original Sharma-Song test when variant frequencies at a locus are marginally uniform; encouragingly, it has a marked advantage over alternatives when variant frequencies are marginally nonuniform. The test further revealed uniquely clean epistatic variants associated with chicken abdominal fat content that are not prioritized by other methods. Genes involved in the largest numbers of inferred epistatic interactions between single nucleotide polymorphisms (SNPs) belong to pathways known for obesity regulation; many top SNPs are located on chromosome 20 and in intergenic regions. Measuring differential departure from independence, the compensated Sharma-Song test offers a practical choice for studying epistasis robust to nonuniform genetic variant frequencies.
Subjects
Genetic Epistasis, Genome-Wide Association Study, Genome, Genomics/methods, Single Nucleotide Polymorphism/genetics
ABSTRACT
MOTIVATION: Genetic or epigenetic events can rewire molecular networks to induce extraordinary phenotypical divergences. Among the many network rewiring approaches, no model-free statistical methods can differentiate gene-gene pattern changes not attributed to marginal changes. This may obscure fundamental rewiring from superficial changes. RESULTS: Here we introduce a model-free Sharma-Song test to determine if patterns differ in the second order, meaning that the deviation of the joint distribution from the product of marginal distributions is unequal across conditions. We prove an asymptotic chi-squared null distribution for the test statistic. Simulation studies demonstrate its advantage over alternative methods in detecting second-order differential patterns. Applying the test on three independent mammalian developmental transcriptome datasets, we report a lower frequency of co-expression network rewiring between human and mouse for the same tissue group than the frequency of rewiring between tissue groups within the same species. We also find second-order differential patterns between microRNA promoters and genes contrasting cerebellum and liver development in mice. These patterns are enriched in the spliceosome pathway regulating tissue specificity. Complementary to previous mammalian comparative studies mostly driven by first-order effects, our findings contribute an understanding of system-wide second-order gene network rewiring within and across mammalian systems. Second-order differential patterns constitute evidence for fundamentally rewired biological circuitry due to evolution, environment or disease. AVAILABILITY AND IMPLEMENTATION: The generic Sharma-Song test is available from the R package 'DiffXTables' at https://cran.r-project.org/package=DiffXTables. Other code and data are described in Section 2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
ABSTRACT
Cellulose synthases (CesAs) are multi-subunit enzymes found on the plasma membrane of plant cells and play a pivotal role in cellulose production. The cotton fiber is mainly composed of cellulose, and the genetic relationships between CesA genes and cotton fiber yield and quality are not fully understood. Through a phylogenetic analysis, the CesA gene family in diploid Gossypium arboreum and Gossypium raimondii, as well as tetraploid Gossypium hirsutum ('TM-1') and Gossypium barbadense ('Hai-7124' and '3-79'), was divided into 6 groups and 15 sub-groups, with each group containing two to five homologous genes. Most CesA genes in the four species are highly collinear. Among the five cotton genomes, 440 and 1929 single nucleotide polymorphisms (SNPs) in the CesA gene family were identified in exons and introns, respectively, including 174 SNPs resulting in amino acid changes. In total, 484 homeologous SNPs between the A and D genomes were identified in diploids, while 142 SNPs were detected between the two tetraploids, with 32 and 82 SNPs existing within G. hirsutum and G. barbadense, respectively. Additionally, 74 quantitative trait loci near 18 GhCesA genes were associated with fiber quality. One to four GhCesA genes were differentially expressed (DE) in ovules at 0 and 3 days post anthesis (DPA) between two backcross inbred lines having different fiber lengths, but no DE genes were identified between these lines in developing fibers at 10 DPA. Twenty-seven SNPs in above DE CesA genes were detected among seven cotton lines, including one SNP in Ghi_A08G03061 that was detected in four G. hirsutum genotypes. This study provides the first comprehensive characterization of the cotton CesA gene family, which may play important roles in determining cotton fiber quality.
Subjects
Glucosyltransferases/genetics, Gossypium/growth & development, Single Nucleotide Polymorphism, Quantitative Trait Loci, Chromosome Mapping, Cotton Fiber, Diploidy, Plant Gene Expression Regulation, Genotype, Gossypium/classification, Gossypium/genetics, Multigene Family, Phylogeny, Plant Breeding, Plant Proteins/genetics, Polyploidy
ABSTRACT
MOTIVATION: Chromosomal patterning of gene expression in cancer can arise from aneuploidy, genome disorganization or abnormal DNA methylation. To map such patterns, we introduce a weighted univariate clustering algorithm to guarantee linear runtime, optimality and reproducibility. RESULTS: We present the chromosome clustering method, establish its optimality and runtime and evaluate its performance. It uses dynamic programming enhanced with an algorithm to reduce search-space in-place to decrease runtime overhead. Using the method, we delineated outstanding genomic zones in 17 human cancer types. We identified strong continuity in dysregulation polarity (dominance by either up- or downregulated genes in a zone) along chromosomes in all cancer types. Significantly polarized dysregulation zones specific to cancer types are found, offering potential diagnostic biomarkers. Unreported previously, a total of 109 loci with conserved dysregulation polarity across cancer types give insights into pan-cancer mechanisms. Efficient chromosomal clustering opens a window to characterize molecular patterns in cancer genomes and beyond. AVAILABILITY AND IMPLEMENTATION: Weighted univariate clustering algorithms are implemented within the R package 'Ckmeans.1d.dp' (4.0.0 or above), freely available at https://cran.r-project.org/package=Ckmeans.1d.dp. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
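To make the idea of optimal weighted univariate clustering by dynamic programming concrete, here is a minimal Python sketch. It is a plain textbook recurrence that takes O(k n^2) time after sorting. It is not the linear-time algorithm or the in-place search-space reduction described in this abstract, and it is not the API of the 'Ckmeans.1d.dp' package. The function name and return format are hypothetical.

```python
def weighted_kmeans_1d(x, w, k):
    """Optimal weighted k-means on a line by dynamic programming.

    x : list of numbers, w : list of positive weights, 1 <= k <= len(x).
    Returns (labels, total_weighted_within_cluster_sum_of_squares), with
    labels numbered 0..k-1 from left to right.
    """
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    order = sorted(range(n), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ws = [w[i] for i in order]

    # Prefix sums of weight, weight*x and weight*x^2 give O(1) cluster costs.
    W, S, Q = [0.0], [0.0], [0.0]
    for xi, wi in zip(xs, ws):
        W.append(W[-1] + wi)
        S.append(S[-1] + wi * xi)
        Q.append(Q[-1] + wi * xi * xi)

    def cost(i, j):
        """Weighted sum of squares of sorted points i..j-1 about their mean."""
        weight = W[j] - W[i]
        if weight <= 0:
            return 0.0
        s = S[j] - S[i]
        return max(Q[j] - Q[i] - s * s / weight, 0.0)

    inf = float("inf")
    # D[m][j]: best cost of splitting the first j sorted points into m clusters.
    D = [[inf] * (n + 1) for _ in range(k + 1)]
    B = [[0] * (n + 1) for _ in range(k + 1)]   # start index of the last cluster
    D[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = D[m - 1][i] + cost(i, j)
                if c < D[m][j]:
                    D[m][j] = c
                    B[m][j] = i

    # Backtrack cluster boundaries, then map labels to the input order.
    sorted_labels = [0] * n
    j = n
    for m in range(k, 0, -1):
        i = B[m][j]
        for t in range(i, j):
            sorted_labels[t] = m - 1
        j = i
    labels = [0] * n
    for pos, idx in enumerate(order):
        labels[idx] = sorted_labels[pos]
    return labels, D[k][n]
```

Because clusters on a line are contiguous after sorting, the recurrence examines every possible start of the last cluster and is therefore exact and reproducible, unlike heuristic k-means with random restarts.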
Subjects
Neoplasms, Software, Algorithms, Cluster Analysis, Genomics, Humans, Neoplasms/genetics, Reproducibility of Results
ABSTRACT
Cotton is the most important natural fiber used in textiles. Breeding for "three-lines", i.e., cytoplasmic male sterility (CMS)-based sterile (A), maintainer (B), and restorer (R) line, is a promising approach to harness hybrid vigor in cotton. Pentatricopeptide repeat (PPR) protein-encoding genes play an important role in plant growth and development including restoration of CMS plants to male fertility. However, PPRs, especially those contributing to CMS and fiber development, remain largely unknown in cotton. In this study, a genome-wide identification and characterization of PPR gene family in four Gossypium species with genome sequences (G. arboreum, G. raimondii, G. hirsutum, and G. barbadense) were performed, and expressed PPR genes in developing floral buds, ovules, and fibers were compared to identify possible PPRs related to CMS restoration and fiber development. A total of 539, 558, 1032, and 1055 PPRs were predicted in the above four species, respectively, which were further mapped to chromosomes for a synteny analysis. Through an RNA-seq analysis, 86% (882) PPRs were expressed in flowering buds of upland cotton (G. hirsutum); however, only 11 and 6 were differentially expressed (DE) between restorer R and its near-isogenic (NI) B and between R and its NI A line, respectively. Another RNA-seq analysis identified the expression of only 54% (556) PPRs in 0 and 3 day(s) post-anthesis (DPA) ovules and 24% (247) PPRs in 10 DPA fibers; however, only 59, 6, and 27 PPRs were DE in 0 and 3 DPA ovules, and 10 DPA fibers between two backcross inbred lines (BILs) with differing fiber length, respectively. Only 2 PPRs were DE between Xuzhou 142 and its fiberless and fuzzless mutant. Quantitative RT-PCR analysis confirmed the validity of the RNA-seq results for the gene expression pattern. Therefore, only a very small number of PPRs may be associated with fertility restoration of CMS and genetic differences in fiber initiation and elongation. 
These results lay a foundation for understanding the roles of PPR genes in cotton, and will be useful in the prioritization of candidate PPR gene functional validation for cotton CMS restoration and fiber development.
Subjects
Arabidopsis Proteins/genetics, Flowers/genetics, Plant Gene Expression Regulation/genetics, Gossypium/genetics, Ovule/genetics, Plant Proteins/genetics, Chromosome Mapping/methods, Cotton Fiber, Gene Expression Profiling/methods, Genome-Wide Association Study/methods, Synteny/genetics
ABSTRACT
It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.
Subjects
Causality, Gene Regulatory Networks, Neoplasms/genetics, Protein Interaction Mapping/methods, Software, Systems Biology, Algorithms, Computational Biology, Computer Simulation, Gene Expression Profiling, Humans, Biological Models, Signal Transduction, Cultured Tumor Cells
ABSTRACT
The number and location of mapped quantitative trait loci (QTL) depend on genetic populations and testing environments. The identification of consistent QTL across genetic backgrounds and environments is a pre-requisite to marker-assisted selection. This study analyzed a total of 661 abiotic and biotic stress resistance QTL based on our previous work and other publications using the meta-analysis software Biomercator. It identified chromosomal regions containing QTL clusters for different resistance traits and hotspots for a particular resistance trait in cotton from 98 QTL for drought tolerance under greenhouse conditions (DT) and 150 QTL under field conditions (FDT), 80 QTL for salt tolerance under greenhouse conditions (ST), 201 QTL for resistance to Verticillium wilt (VW, Verticillium dahliae), 47 QTL for resistance to Fusarium wilt (FW, Fusarium oxysporum f. sp. vasinfectum), and 85 QTL for resistance to root-knot nematodes (RKN, Meloidogyne incognita) and reniform nematodes (RN, Rotylenchulus reniformis). The traits used in QTL mapping for abiotic stress tolerance included morphological traits (plant height and fresh and dry shoot and root weights), physiological traits (chlorophyll content, osmotic potential, carbon isotope ratio, stomatal conductance, photosynthetic rate, transpiration, canopy temperature, and leaf area index), agronomic traits (seedcotton yield, lint yield, boll weight, and lint percent), and fiber quality traits (fiber length, uniformity, strength, elongation, and micronaire). The results showed that resistance QTL are not uniformly distributed across the cotton genome; some chromosomes carried disproportionately more QTL, QTL clusters, or hotspots. Twenty-three QTL clusters were found on 15 chromosomes (c3, c4, c5, c6, c7, c11, c14, c15, c16, c19, c20, c23, c24, c25, and c26). 
Moreover, 28 QTL hotspots were associated with different resistance traits including one hotspot on c4 for Verticillium wilt resistance, two QTL hotspots on c24 for chlorophyll content measured under both drought and salt stress conditions, and three other hotspots on c19 for the resistance to Verticillium wilt and Fusarium wilt, and micronaire under drought stress conditions. This meta-analysis of stress tolerance QTL provides an important foundation for cotton breeding and further studies on the genetic mechanisms of abiotic and biotic stress resistance in cotton.
Subjects
Gossypium/genetics, Quantitative Trait Loci, Physiological Stress/genetics, Tetraploidy, Plant Chromosomes, Gossypium/physiology
ABSTRACT
Analysis of rewired upstream subnetworks impacting downstream differential gene expression aids the delineation of evolving molecular mechanisms. Cumulative statistics based on conventional differential correlation are limited for subnetwork rewiring analysis since rewiring is not necessarily equivalent to change in correlation coefficients. Here we present a computational method ChiNet to quantify subnetwork rewiring by statistical heterogeneity that enables detection of potential genotype changes causing altered transcription regulation in evolving organisms. Given a differentially expressed downstream gene set, ChiNet backtracks a rewired upstream subnetwork from a super-network including gene interactions known to occur under various molecular contexts. We benchmarked ChiNet for its high accuracy in distinguishing rewired artificial subnetworks, in silico yeast transcription-metabolic subnetworks, and rewired transcription subnetworks for Candida albicans versus Saccharomyces cerevisiae, against two differential-correlation based subnetwork rewiring approaches. Then, using transcriptome data from tolerant S. cerevisiae strain NRRL Y-50049 and a wild-type intolerant strain, ChiNet identified 44 metabolic pathways affected by rewired transcription subnetworks anchored to major adaptively activated transcription factor genes YAP1, RPN4, SFP1 and ROX1, in response to toxic chemical challenges involved in lignocellulose-to-biofuels conversion. These findings support the use of ChiNet in rewiring analysis of subnetworks where differential interaction patterns resulting from divergent nonlinear dynamics abound.
Subjects
Biofuels, Candida albicans/genetics, Gene Regulatory Networks, Saccharomyces cerevisiae/genetics, Genetic Transcription, Candida albicans/metabolism, Chi-Square Distribution, Computational Biology/methods, Computer Simulation, Metabolic Networks and Pathways/genetics, Pentose Phosphate Pathway/genetics, Saccharomyces cerevisiae/metabolism
ABSTRACT
The mammalian CNS is one of the most complex biological systems to understand at the molecular level. The temporal information from time series transcriptome analysis can serve as a potent source of associative information between developmental processes and regulatory genes. Here, we introduce a new transcriptome database called Cerebellar Gene Regulation in Time and Space (CbGRiTS). This dataset is populated with transcriptome data across embryonic and postnatal development from two standard mouse strains, C57BL/6J and DBA/2J, several recombinant inbred lines and cerebellar mutant strains. Users can evaluate expression profiles across cerebellar development in a deep time series with graphical interfaces for data exploration and link-out to anatomical expression databases. We present three analytical approaches that take advantage of specific aspects of the time series for transcriptome analysis. We demonstrate the use of the CbGRiTS dataset as a community resource to explore patterns of gene expression and develop hypotheses concerning gene regulatory networks in brain development.
Subjects
Cerebellum/embryology, Cerebellum/physiology, Developmental Gene Expression Regulation, Gene Regulatory Networks, Algorithms, Animals, Cluster Analysis, Computational Biology, Genetic Databases, Female, Gene Expression Profiling, Male, Mice, Inbred C57BL Mice, Inbred DBA Mice, Oligonucleotide Array Sequence Analysis, Software, Species Specificity, Time Factors, Transcriptome
ABSTRACT
Heterogeneity in genetic networks across different signaling molecular contexts can suggest molecular regulatory mechanisms. Here we describe a comparative chi-square analysis (CPχ²) method, considerably more flexible and effective than other alternatives, to screen large gene expression data sets for conserved and differential interactions. CPχ² decomposes interactions across conditions to assess homogeneity and heterogeneity. Theoretically, we prove an asymptotic chi-square null distribution for the interaction heterogeneity statistic. Empirically, on synthetic yeast cell cycle data, CPχ² achieved much higher statistical power in detecting differential networks than alternative approaches. We applied CPχ² to Drosophila melanogaster wing gene expression arrays collected under normal conditions, and conditions with overexpressed E2F and Cabut, two transcription factor complexes that promote ectopic cell cycling. The resulting differential networks suggest a mechanism by which E2F and Cabut regulate distinct gene interactions, while still sharing a small core network. Thus, CPχ² is sensitive in detecting network rewiring, useful in comparing related biological systems.
Subjects
Gene Regulatory Networks, Animals, Cell Cycle/genetics, Chi-Square Distribution, Drosophila Proteins/physiology, Drosophila melanogaster/genetics, E2F Transcription Factors/physiology, Gene Expression Profiling, Transcription Factors/physiology, Yeasts/genetics
ABSTRACT
BACKGROUND: Verticillium wilt (VW) and Fusarium wilt (FW), caused by the soil-borne fungi Verticillium dahliae and Fusarium oxysporum f. sp. vasinfectum, respectively, are two of the most destructive diseases in cotton production worldwide. Root-knot nematodes (Meloidogyne incognita, RKN) and reniform nematodes (Rotylenchulus reniformis, RN) cause the highest yield loss in the U.S. Planting disease-resistant cultivars is the most cost-effective control method. Numerous studies have reported mapping of quantitative trait loci (QTLs) for disease resistance in cotton; however, very few reliable QTLs were identified for use in genomic research and breeding. RESULTS: This study first performed a 4-year replicated test of a backcross inbred line (BIL) population for VW resistance, and 10 resistance QTLs were mapped based on a 2895 cM linkage map with 392 SSR markers. The 10 VW QTLs were then placed on a consensus linkage map with 182 other VW QTLs, 75 RKN QTLs, 27 FW QTLs, and 7 RN QTLs reported from 32 publications. A meta-analysis of QTLs identified 28 QTL clusters including 13, 8 and 3 QTL hotspots for resistance to VW, RKN and FW, respectively. The number of QTLs and QTL clusters on chromosomes, especially in the A-subgenome, was significantly correlated with the number of nucleotide-binding site (NBS) genes, and the distribution of QTLs between homeologous A- and D-subgenome chromosomes was also significantly correlated. CONCLUSIONS: Ten VW resistance QTLs identified in a 4-year replicated study have added useful information to the understanding of the genetic basis of VW resistance in cotton. Twenty-eight disease resistance QTL clusters and 24 hotspots identified from a total of 306 QTLs and linked SSR markers provide important information for marker-assisted selection and high-resolution mapping of resistance QTLs and genes. The lack of overlap among most resistance QTL hotspots for different diseases indicates that resistance to each disease is controlled by different genes.
Subjects
Genetic Crosses, Disease Resistance/genetics, Gossypium/genetics, Gossypium/microbiology, Plant Diseases/genetics, Plant Diseases/microbiology, Quantitative Trait Loci, Verticillium, Chromosome Mapping, Cluster Analysis, Plant Genes, Genetic Markers, Inbreeding
ABSTRACT
KEY MESSAGE: A specialized database currently containing more than 2200 QTL is established, which allows graphic presentation, visualization and submission of QTL. In cotton, quantitative trait loci (QTL) studies are focused on intraspecific Gossypium hirsutum and interspecific G. hirsutum × G. barbadense populations. These two populations are commercially important for the textile industry and are evaluated for fiber quality, yield, seed quality, resistance, physiological, and morphological trait QTL. With meta-analysis data based on the vast number of QTL studies in cotton, it will be beneficial to organize the data into a functional database for the cotton community. Here we provide a tool for cotton researchers to visualize previously identified QTL and submit their own QTL to the CottonQTLdb database. The database provides the user with the option of selecting various QTL trait types from either the G. hirsutum or G. hirsutum × G. barbadense populations. Based on the user's QTL trait selection, graphical representations of chromosomes of the population selected are displayed in publication-ready images. The database also provides users with trait information on QTL, LOD scores, and explained phenotypic variances for all QTL selected. The CottonQTLdb database provides cotton geneticists and breeders with statistical data on previously identified cotton QTL and provides a visualization tool to view QTL positions on chromosomes. Currently the database (Release 1) contains 2274 QTLs, and succeeding QTL studies will be added regularly by the curators and members of the cotton community who contribute their data to keep the database current. The database is accessible from http://www.cottonqtldb.org.
Subjects
Genetic Databases, Plant Genes/genetics, Gossypium/genetics, Quantitative Trait Loci/genetics, Chromosome Mapping, Plant Chromosomes/genetics, Computational Biology/methods, Genetic Crosses, Internet, Lod Score, Phenotype
ABSTRACT
KEY MESSAGE: Based on 1075 and 1059 QTL from intraspecific Upland and interspecific Upland × Pima populations, respectively, the identification of QTL clusters and hotspots provides a useful resource for cotton breeding. Mapping of quantitative trait loci (QTL) is a pre-requisite of marker-assisted selection for crop yield and quality. Recent meta-analysis of QTL in tetraploid cotton (Gossypium spp.) has identified regions of the genome with high concentrations of QTL for various traits called clusters and specific trait QTL called hotspots or meta-QTL (mQTL). However, the meta-analysis included all population types of Gossypium, mixing both intraspecific G. hirsutum and interspecific G. hirsutum × G. barbadense populations. This study used 1,075 QTL from 58 publications on intraspecific G. hirsutum and 1,059 QTL from 30 publications on G. hirsutum × G. barbadense populations to perform a comprehensive comparative analysis of QTL clusters and hotspots between the two populations for yield, fiber and seed quality, and biotic and abiotic stress tolerance. QTL hotspots were further analyzed for mQTL within the hotspots using Biomercator V3 software. The ratio of QTL between the two population types was proportional, yet differences in hotspot type and placement were observed between the two population types. However, on some chromosomes QTL clusters and hotspots were similar between the two populations. This shows that there are some universal QTL regions in the cultivated tetraploid cotton which remain consistent and some regions which differ between population types. This study elucidates, for the first time, the similarities and differences in QTL clusters and hotspots between intraspecific and interspecific populations, providing an important resource to cotton breeding programs in marker-assisted selection.
Subjects
Plant Chromosomes/genetics, Plant Genome/genetics, Gossypium/genetics, Quantitative Trait Loci/genetics, Breeding, Chromosome Mapping, Cluster Analysis, Phenotype, Species Specificity, Tetraploidy
ABSTRACT
BACKGROUND: The study of quantitative trait loci (QTL) in cotton (Gossypium spp.) is focused on traits of agricultural significance. Previous studies have identified many QTL for fiber quality, disease and pest resistance, branch number, seed quality, yield and yield-related traits, drought tolerance, and morphological traits. However, results among these studies differed due to the use of different genetic populations, markers and marker densities, and testing environments. Since the two previous meta-QTL analyses of fiber traits, a number of papers on QTL mapping of fiber quality, yield traits, morphological traits, and disease resistance have been published. To obtain better insight into the genome-wide distribution of QTL and to identify consistent QTL for marker-assisted breeding in cotton, an updated comparative QTL analysis is needed. RESULTS: In this study, a total of 1,223 QTL from 42 different QTL studies in Gossypium were surveyed and mapped using Biomercator V3 based on the Gossypium consensus map from the Cotton Marker Database. A meta-analysis was first performed using manual inference and confirmed by Biomercator V3 to identify possible QTL clusters and hotspots. QTL clusters are composed of QTL for various traits that are concentrated in a specific region of a chromosome, whereas hotspots are composed of only one trait type. QTL were not evenly distributed along the cotton genome and were concentrated in specific regions on each chromosome. QTL hotspots for fiber quality traits were found in the same regions as the clusters, indicating that clusters may also form hotspots. CONCLUSIONS: Putative QTL clusters were identified via meta-analysis and will be useful for breeding programs and future studies involving Gossypium QTL.
The presence of QTL clusters and hotspots indicates consensus regions across cultivated tetraploid Gossypium species, environments, and populations that contain large numbers of QTL and, in some cases, multiple QTL associated with the same trait (termed a hotspot). This study combines two previous meta-analysis studies and adds all other currently available QTL studies, making it the most comprehensive meta-analysis study in cotton to date.
Subjects
Cotton Fiber , Disease Resistance/genetics , Gossypium/genetics , Quantitative Trait Loci/genetics , Chromosome Mapping , Crosses, Genetic , Droughts , Genetic Variation , Gossypium/growth & development , Tetraploidy , Textiles
ABSTRACT
Circular data clustering has recently been solved exactly in sub-quadratic time. However, the solution requires a given number of clusters; methods for choosing this number on linear data are inapplicable to circular data. To fill this gap, we introduce the circular silhouette to measure cluster quality and a fast algorithm to calculate the average silhouette width. The algorithm runs in linear time to the number of points on sorted data, instead of quadratic time by the silhouette definition. Empirically, it is over 3000 times faster than by silhouette definition on 1,000,000 circular data points in five clusters. On simulated datasets, the algorithm returned correct numbers of clusters. We identified clusters on round genomes of human mitochondria and bacteria. On sunspot activity data, we found changed solar-cycle patterns over the past two centuries. Using the circular silhouette not only eliminates the subjective selection of number of clusters, but is also scalable to big circular and periodic data abundant in science, engineering, and medicine.
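To make the quantity in the abstract above concrete, here is a minimal Python sketch of the average silhouette width computed directly from the silhouette definition, with distance measured as the shorter arc on a circle. It is the quadratic-time baseline, not the linear-time algorithm the abstract describes. The function names, the `circumference` parameter, and the convention that a singleton cluster scores 0 are assumptions made for illustration.

```python
from collections import defaultdict


def circular_distance(a, b, circumference):
    """Shortest arc length between two positions on a circle."""
    d = abs(a - b) % circumference
    return min(d, circumference - d)


def average_circular_silhouette(points, labels, circumference):
    """Average silhouette width by direct definition (quadratic time)."""
    clusters = defaultdict(list)
    for i, lab in enumerate(labels):
        clusters[lab].append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # assumed convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(circular_distance(points[i], points[j], circumference)
                for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(circular_distance(points[i], points[j], circumference)
                for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Six angles in degrees; the first cluster straddles the 0/360 boundary.
angles = [0, 10, 350, 180, 170, 190]
good = average_circular_silhouette(angles, [0, 0, 0, 1, 1, 1], 360)
bad = average_circular_silhouette(angles, [0, 1, 0, 1, 0, 1], 360)
```

Under these assumptions, one way to choose the number of clusters is to compute this average for each candidate clustering and keep the one with the largest value. Here the labeling that groups 350, 0, and 10 together scores higher than the interleaved one.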
ABSTRACT
In perennial plants such as pecan, once reproductive maturity is attained, genetic switches are regulated and required for flower development year after year. Pecan trees are heterodichogamous, with both pistillate and staminate flowers produced on the same tree. Therefore, defining genes exclusively responsible for pistillate inflorescence and staminate inflorescence (catkin) initiation is challenging at best. To understand these genetic switches and their timing, this study analyzed catkin bloom and gene expression of lateral buds collected from a protogynous (Wichita) and a protandrous (Western) pecan cultivar in summer, autumn and spring. Our data showed that pistillate flowers in the current season on the same shoot negatively impacted catkin production on the protogynous 'Wichita' cultivar, whereas fruit production the previous year on 'Wichita' had a positive effect on catkin production on the same shoot the following year. However, neither fruiting the previous year nor current-year pistillate flower production had a significant effect on catkin production on the protandrous 'Western' cultivar. The RNA-Seq results showed more significant differences between the fruiting and non-fruiting shoots of the 'Wichita' cultivar than of the 'Western' cultivar, revealing the genetic signals likely responsible for catkin production. The data presented here indicate the genes expressed for the initiation of both types of flowers the season before bloom.
Subjects
Carya , Carya/genetics , Plant Cone , Flowers/genetics , Fruit , Gene Expression Profiling
ABSTRACT
MOTIVATION: While biological systems operated from a common genome can be conserved in various ways, they can also manifest highly diverse dynamics and functions. This is because the same set of genes can interact differentially across specific molecular contexts. For example, differential gene interactions give rise to various stages of morphogenesis during cerebellar development. However, after over a decade of efforts toward reverse engineering biological networks from high-throughput omic data, gene networks of most organisms remain sketchy. This hindrance has motivated us to develop comparative modeling to highlight conserved and differential gene interactions across experimental conditions, without reconstructing complete gene networks first. RESULTS: We established a comparative dynamical system modeling (CDSM) approach to identify conserved and differential interactions across molecular contexts. In CDSM, interactions are represented by ordinary differential equations and compared across conditions through statistical heterogeneity and homogeneity tests. CDSM consistently outperformed differential correlation and reconstruct-then-compare in simulation studies. We used CDSM to elucidate gene interactions important for poorly understood cellular processes during mouse cerebellar development. We generated hypotheses on 66 differential genetic interactions involved in expansion of the external granule layer. These interactions are implicated in cell cycle, differentiation, apoptosis and morphogenesis. An additional 1639 differential interactions among gene clusters were identified when we compared gene interactions during the presence of the rhombic lip versus the presence of a distinct internal granule layer. Moreover, compared with differential correlation and reconstruct-then-compare, CDSM makes fewer assumptions about the data and thus is applicable to a wider range of biological assays.
AVAILABILITY: Source code in C++ and R is available for non-commercial organizations upon request from the corresponding author. The cerebellum gene expression dataset used in this article is available upon request from the Goldowitz lab (dang@cmmt.ubc.ca, http://grits.dglab.org/). CONTACT: joemsong@cs.nmsu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
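The abstract above describes comparing ODE-modeled interactions across conditions with heterogeneity tests, and that idea can be illustrated with a small Python sketch. This is not the CDSM implementation, which is available only on request. It swaps in a standard Chow-type F statistic on a linear ODE, dx/dt = a*x + b*y + c, fitted by least squares. The function names, the linear model form, and the simulated data are all assumptions; in real data the derivative would have to be estimated from a time course.

```python
import numpy as np


def rss_linear_ode(x, y, dxdt):
    """Least-squares fit of dx/dt = a*x + b*y + c; returns the residual sum of squares."""
    design = np.column_stack([x, y, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, dxdt, rcond=None)
    resid = dxdt - design @ coef
    return float(resid @ resid)


def heterogeneity_f(cond1, cond2, n_params=3):
    """Chow-type F statistic: large values suggest the two conditions
    do not share one set of ODE coefficients (a differential interaction)."""
    rss_sep = rss_linear_ode(*cond1) + rss_linear_ode(*cond2)
    pooled = [np.concatenate(parts) for parts in zip(cond1, cond2)]
    rss_pooled = rss_linear_ode(*pooled)
    df_resid = len(pooled[0]) - 2 * n_params
    return ((rss_pooled - rss_sep) / n_params) / (rss_sep / df_resid)


def simulate(b, rng, n=50):
    """Hypothetical data: regulator y acts on target x with strength b."""
    x = rng.normal(size=n)
    y = rng.normal(size=n)
    dxdt = -0.5 * x + b * y + rng.normal(scale=0.1, size=n)
    return x, y, dxdt


rng = np.random.default_rng(0)
# Same interaction strength in both conditions (conserved).
f_conserved = heterogeneity_f(simulate(1.0, rng), simulate(1.0, rng))
# Activation in one condition, repression in the other (differential).
f_differential = heterogeneity_f(simulate(1.0, rng), simulate(-1.0, rng))
```

In this sketch the statistic stays small when both conditions share the same coefficients and becomes very large when the sign of the interaction flips. A p-value could be taken from an F distribution with `n_params` and `df_resid` degrees of freedom.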