Results 1 - 20 of 39
1.
NAR Genom Bioinform ; 5(4): lqad098, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37954572

ABSTRACT

To fully understand gene regulation, it is necessary to have a thorough understanding of both the transcriptome and the enzymatic and RNA-binding activities that shape it. While many RNA-Seq-based tools have been developed to analyze the transcriptome, most only consider the abundance of sequencing reads along annotated patterns (such as genes). These annotations are typically incomplete, leading to errors in the differential expression analysis. To address this issue, we present DiffSegR, an R package that enables the discovery of transcriptome-wide expression differences between two biological conditions using RNA-Seq data. DiffSegR does not require prior annotation and uses a multiple changepoint detection algorithm to identify the boundaries of differentially expressed regions in the per-base log2 fold change. In a few minutes of computation, DiffSegR correctly predicted the role of chloroplast ribonuclease Mini-III in rRNA maturation and chloroplast ribonuclease PNPase in (3'/5')-degradation of rRNA, mRNA and tRNA precursors as well as intron accumulation. We believe DiffSegR will benefit biologists working on transcriptomics as it allows access to information from a layer of the transcriptome overlooked by the classical differential expression analysis pipelines widely used today. DiffSegR is available at https://aliehrmann.github.io/DiffSegR/index.html.
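
The per-base log2 fold change mentioned in this abstract can be illustrated with a minimal sketch. It is not DiffSegR's code: the function name, the plain-list coverage inputs and the pseudocount of 1 (added to avoid dividing by zero) are all assumptions made for illustration.

```python
import math

def per_base_log2_fc(cov_ref, cov_alt, pseudocount=1.0):
    """Per-base log2 fold change of cov_alt over cov_ref.

    cov_ref, cov_alt: equal-length lists of read coverage, one value per base.
    pseudocount: hypothetical smoothing constant, not taken from the abstract.
    """
    if len(cov_ref) != len(cov_alt):
        raise ValueError("coverage profiles must have the same length")
    return [
        math.log2((alt + pseudocount) / (ref + pseudocount))
        for ref, alt in zip(cov_ref, cov_alt)
    ]

# Toy profile: no change on the first three bases, a 4-fold increase afterwards.
profile = per_base_log2_fc([10, 10, 10, 3, 3], [10, 10, 10, 15, 15])
```

A changepoint method would then be run on `profile` to locate the boundary between the flat region and the shifted one.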

2.
Front Plant Sci ; 13: 980587, 2022.
Article in English | MEDLINE | ID: mdl-36479518

ABSTRACT

Partial resistance in plants generally exerts a low selective pressure on pathogens, thus ensuring its durability in agrosystems. However, little is known about the effect of partial resistance on the molecular mechanisms of pathogenicity, knowledge that could advance plant breeding for sustainable plant health. Here we investigate the gene expression of Phytophthora capsici during infection of pepper (Capsicum annuum L.), where only partial genetic resistance is reported, using Illumina RNA-seq. Comparison of transcriptomes of P. capsici infecting susceptible and partially resistant peppers identified a small number of genes that redirected the pathogen's own resources into lipid biosynthesis to subsist on partially resistant plants. The adapted and non-adapted isolates of P. capsici differed in expression of genes involved in nucleic acid synthesis and transporters. Transient ectopic expression of the RxLR effector genes CUST_2407 and CUST_16519 in pepper lines differing in resistance levels revealed specific host-isolate interactions that either triggered local necrotic lesions (hypersensitive response or HR) or elicited leaf abscission (extreme resistance or ER), preventing the spread of the pathogen to healthy tissue. Although these effectors did not unequivocally explain the quantitative host resistance, our findings highlight the importance of plant genes limiting nutrient resources to select pepper cultivars with sustainable resistance to P. capsici.

3.
Int J Mol Sci ; 22(20)2021 Oct 19.
Article in English | MEDLINE | ID: mdl-34681956

ABSTRACT

Plastid gene expression involves many post-transcriptional maturation steps resulting in a complex transcriptome composed of multiple isoforms. Although short-read RNA-Seq has considerably improved our understanding of the molecular mechanisms controlling these processes, it is unable to sequence full-length transcripts. This information is crucial, however, when it comes to understanding the interplay between the various steps of plastid gene expression. Here, we describe a protocol to study the plastid transcriptome using nanopore sequencing. In the leaf of Arabidopsis thaliana, with about 1.5 million strand-specific reads mapped to the chloroplast genome, we could recapitulate most of the complexity of the plastid transcriptome (polygenic transcripts, multiple isoforms associated with post-transcriptional processing) using virtual Northern blots. Even if the transcripts longer than about 2500 nucleotides were missing, the study of the co-occurrence of editing and splicing events identified 42 pairs of events that were not occurring independently. This study also highlighted a preferential chronology of maturation events with splicing happening after most sites were edited.
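
The abstract does not say which statistical test was used to decide that two maturation events do not occur independently. As one possible illustration, the sketch below applies a two-sided Fisher exact test to a 2x2 table of full-length reads; the function name and the table layout are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical):
      a = reads edited and spliced,   b = reads edited but unspliced,
      c = reads unedited and spliced, d = reads unedited and unspliced.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

p_linked = fisher_exact_two_sided(5, 0, 0, 5)      # events always seen together
p_unlinked = fisher_exact_two_sided(2, 2, 2, 2)    # no association at all
```

When many pairs of events are tested, as in a transcriptome-wide screen, the resulting p-values would normally need a multiple-testing correction.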


Subject(s)
Alternative Splicing , Arabidopsis Proteins/metabolism , Arabidopsis/metabolism , Gene Expression Regulation, Plant , Plastids/genetics , RNA, Plant/genetics , Transcriptome , Arabidopsis/genetics , Arabidopsis/growth & development , Arabidopsis Proteins/genetics , Plastids/metabolism , RNA, Plant/metabolism , RNA-Seq
4.
BMC Bioinformatics ; 22(1): 323, 2021 Jun 14.
Article in English | MEDLINE | ID: mdl-34126932

ABSTRACT

BACKGROUND: Histone modification constitutes a basic mechanism for the genetic regulation of gene expression. In the early 2000s, a powerful technique emerged that couples chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq). This technique provides a direct survey of the DNA regions associated with these modifications. In order to realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed or adapted to analyze the massive amount of data it generates. Many of these algorithms were built around natural assumptions such as the Poisson distribution to model the noise in the count data. In this work we start from these natural assumptions and show that it is possible to improve upon them. RESULTS: Our comparisons on seven reference datasets of histone modifications (H3K36me3 & H3K4me3) suggest that natural assumptions are not always realistic under application conditions. We show that the unconstrained multiple changepoint detection model with alternative noise assumptions and supervised learning of the penalty parameter reduces the over-dispersion exhibited by count data. These models, implemented in the R package CROCS ( https://github.com/aLiehrmann/CROCS ), detect the peaks more accurately than algorithms which rely on natural assumptions. CONCLUSION: The segmentation models we propose can benefit researchers in the field of epigenetics by providing new high-quality peak prediction tracks for H3K36me3 and H3K4me3 histone modifications.


Subject(s)
Chromatin Immunoprecipitation Sequencing , High-Throughput Nucleotide Sequencing , Algorithms , Chromatin Immunoprecipitation , Sequence Analysis, DNA
5.
New Phytol ; 229(2): 994-1006, 2021 01.
Article in English | MEDLINE | ID: mdl-32583438

ABSTRACT

The Anthropocene epoch is associated with the spreading of metals in the environment increasing oxidative and genotoxic stress on organisms. Interestingly, c. 520 plant species growing on metalliferous soils acquired the capacity to accumulate and tolerate a tremendous amount of nickel in their shoots. The wide phylogenetic distribution of these species suggests that nickel hyperaccumulation evolved multiple times independently. However, the exact nature of these mechanisms and whether they have been recruited convergently in distant species is not known. To address these questions, we have developed a cross-species RNA-Seq approach combining differential gene expression analysis and cluster of orthologous group annotation to identify genes linked to nickel hyperaccumulation in distant plant families. Our analysis reveals candidate orthologous genes encoding convergent function involved in nickel hyperaccumulation, including the biosynthesis of specialized metabolites and cell wall organization. Our data also point out that the high expression of IREG/Ferroportin transporters recurrently emerged as a mechanism involved in nickel hyperaccumulation in plants. We further provide genetic evidence in the hyperaccumulator Noccaea caerulescens for the role of the NcIREG2 transporter in nickel sequestration in vacuoles. Our results provide molecular tools to better understand the mechanisms of nickel hyperaccumulation and study their evolution in plants.


Subject(s)
Brassicaceae , Nickel , Brassicaceae/genetics , Phylogeny , RNA-Seq , Soil
6.
Genes (Basel) ; 13(1)2021 12 27.
Article in English | MEDLINE | ID: mdl-35052407

ABSTRACT

RNA silencing serves key roles in a multitude of cellular processes, including development, stress responses, metabolism, and maintenance of genome integrity. Dicer, Argonaute (AGO), double-stranded RNA binding (DRB) proteins, RNA-dependent RNA polymerase (RDR), and DNA-dependent RNA polymerases known as Pol IV and Pol V form core components to trigger RNA silencing. Common bean (Phaseolus vulgaris) is an important staple crop worldwide. In this study, we aimed to unravel the components of the RNA-guided silencing pathway in this non-model plant, taking advantage of the availability of two genome assemblies of Andean and Meso-American origin. We identified six PvDCLs, thirteen PvAGOs, ten PvDRBs and five PvRDRs in both genotypes, suggesting no recent gene amplification or deletion after the gene pool separation. In addition, we identified one PvNRPD1 and one PvNRPE1 encoding the largest subunits of Pol IV and Pol V, respectively. These genes were categorized into subgroups based on phylogenetic analyses. Comprehensive analyses of gene structure, genomic localization, and similarity among these genes were performed. Their expression patterns were investigated by means of expression models in different organs using online data and quantitative RT-PCR after pathogen infection. Several of the candidate genes were up-regulated after infection with the fungus Colletotrichum lindemuthianum.


Subject(s)
Colletotrichum/physiology , Gene Expression Regulation, Plant , Genome-Wide Association Study , Phaseolus/genetics , Plant Diseases/genetics , Plant Proteins/metabolism , RNA Interference , Argonaute Proteins/genetics , Argonaute Proteins/metabolism , DNA-Directed RNA Polymerases/genetics , DNA-Directed RNA Polymerases/metabolism , Phaseolus/growth & development , Phaseolus/immunology , Phaseolus/microbiology , Phylogeny , Plant Diseases/immunology , Plant Diseases/microbiology , Plant Proteins/genetics , RNA-Dependent RNA Polymerase/genetics , RNA-Dependent RNA Polymerase/metabolism , Transcriptome
7.
BMC Bioinformatics ; 21(1): 120, 2020 Mar 20.
Article in English | MEDLINE | ID: mdl-32197576

ABSTRACT

BACKGROUND: In unsupervised learning and clustering, data integration from different sources and types is a difficult question discussed in several research areas. For instance, in omics analysis, dozens of clustering methods have been developed in the past decade. When a single source of data is at play, hierarchical clustering (HC) is extremely popular, as a tree structure is highly interpretable and arguably more informative than just a partition of the data. However, blindly applying HC to multiple sources of data raises computational and interpretation issues. RESULTS: We propose mergeTrees, a method that aggregates a set of trees with the same leaves to create a consensus tree. In our consensus tree, a cluster at height h contains the individuals that are in the same cluster for all the trees at height h. The method is exact and proven to be [Formula: see text], n being the number of individuals and q being the number of trees to aggregate. Our implementation is extremely effective on simulations, allowing us to process many large trees at a time. We also rely on mergeTrees to perform the cluster analysis of two real -omics data sets, introducing a spectral variant as an efficient and robust by-product. CONCLUSIONS: Our tree aggregation method can be used in conjunction with hierarchical clustering to perform efficient cluster analysis. This approach was found to be robust to the absence of clustering information in some of the data sets as well as an increased variability within true clusters. The method is implemented in R/C++ and available as an R package named mergeTrees, which makes it easy to integrate in existing or new pipelines in several research areas.
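
The consensus rule stated in this abstract (two individuals share a consensus cluster at height h only if they share a cluster in every tree at height h) can be sketched for a single height. This is not the mergeTrees algorithm, which builds the whole consensus tree; the function name is hypothetical and each tree is assumed to have already been cut at the same height h, giving one label list per tree.

```python
def consensus_partition(partitions):
    """Intersect several partitions of the same individuals.

    partitions: list of label lists, one per tree, where partitions[k][i]
    is the cluster of individual i when tree k is cut at a common height h.
    Returns one consensus label per individual.
    """
    n = len(partitions[0])
    if any(len(p) != n for p in partitions):
        raise ValueError("all partitions must cover the same individuals")
    groups = {}
    labels = []
    for i in range(n):
        # Individuals stay together only if their labels agree in every tree.
        key = tuple(p[i] for p in partitions)
        labels.append(groups.setdefault(key, len(groups)))
    return labels

# Tree 1 groups {0,1} and {2,3}; tree 2 groups {0,1,2} and {3}.
consensus = consensus_partition([[0, 0, 1, 1], [0, 0, 0, 1]])
```

In this toy case only individuals 0 and 1 are grouped by both trees, so they alone remain together in the consensus.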


Subject(s)
Cluster Analysis , Algorithms , Gene Expression Profiling , Humans , Proteomics
8.
Algorithms Mol Biol ; 14: 22, 2019.
Article in English | MEDLINE | ID: mdl-31807137

ABSTRACT

BACKGROUND: Genomic data analyses such as Genome-Wide Association Studies (GWAS) or Hi-C studies are often faced with the problem of partitioning chromosomes into successive regions based on a similarity matrix of high-resolution, locus-level measurements. An intuitive way of doing this is to perform a modified Hierarchical Agglomerative Clustering (HAC), where only adjacent clusters (according to the ordering of positions within a chromosome) are allowed to be merged. But a major practical drawback of this method is its quadratic time and space complexity in the number of loci, which is typically of the order of 10^4 to 10^5 for each chromosome. RESULTS: By assuming that the similarity between physically distant objects is negligible, we are able to propose an implementation of adjacency-constrained HAC with quasi-linear complexity. This is achieved by pre-calculating specific sums of similarities, and storing candidate fusions in a min-heap. Our illustrations on GWAS and Hi-C datasets demonstrate the relevance of this assumption, and show that this method highlights biologically meaningful signals. Thanks to its small time and memory footprint, the method can be run on a standard laptop in minutes or even seconds. AVAILABILITY AND IMPLEMENTATION: Software and sample data are available as an R package, adjclust, that can be downloaded from the Comprehensive R Archive Network (CRAN).
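
The idea of merging only adjacent clusters and keeping candidate fusions in a min-heap can be sketched on a toy problem. This is a simplified illustration, not the adjclust implementation: it assumes one numeric value per locus and a Ward-style merge cost instead of a banded similarity matrix, and every name in it is hypothetical.

```python
import heapq

def adjacent_hac(values):
    """Adjacency-constrained agglomerative clustering of an ordered sequence.

    Returns the merges as (left_start, right_start, cost) in the order performed.
    Clusters are identified by the index of their first element.
    """
    n = len(values)
    size = {i: 1 for i in range(n)}
    total = {i: float(values[i]) for i in range(n)}
    nxt = {i: (i + 1 if i + 1 < n else None) for i in range(n)}
    prv = {i: (i - 1 if i > 0 else None) for i in range(n)}
    version = {i: 0 for i in range(n)}  # bumped on every change; used to spot stale heap entries

    def cost(a, b):
        # Ward-style increase in within-cluster sum of squares when merging a and b.
        na, nb = size[a], size[b]
        return na * nb / (na + nb) * (total[a] / na - total[b] / nb) ** 2

    heap = [(cost(i, i + 1), i, i + 1, 0, 0) for i in range(n - 1)]
    heapq.heapify(heap)
    merges = []
    while heap:
        c, a, b, va, vb = heapq.heappop(heap)
        if version.get(a) != va or version.get(b) != vb:
            continue  # one of the two clusters has changed since this entry was pushed
        # Merge the right cluster b into the left cluster a.
        size[a] += size[b]
        total[a] += total[b]
        version[a] += 1
        del version[b]
        nxt[a] = nxt[b]
        merges.append((a, b, c))
        # Only the two new neighbouring pairs become candidate fusions.
        if nxt[a] is not None:
            prv[nxt[a]] = a
            heapq.heappush(heap, (cost(a, nxt[a]), a, nxt[a], version[a], version[nxt[a]]))
        if prv[a] is not None:
            heapq.heappush(heap, (cost(prv[a], a), prv[a], a, version[prv[a]], version[a]))
    return merges

merges = adjacent_hac([0, 0, 10, 10])
```

Stale heap entries are skipped lazily through the version counters rather than being removed from the heap, which keeps each merge to a constant number of heap operations.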

9.
BMC Genomics ; 20(1): 634, 2019 Aug 06.
Article in English | MEDLINE | ID: mdl-31387530

ABSTRACT

BACKGROUND: The effective use of mutant populations for reverse genetic screens relies on the population-wide characterization of the induced mutations. Genome- and population-wide characterization of the mutations found in fast neutron populations has been hindered, however, by the wide range of mutations generated and the lack of affordable technologies to detect DNA sequence changes. In this study, we therefore aimed to test whether genotyping-by-sequencing (GBS) technology could be used to characterize copy number variation (CNV) induced by fast neutrons in a soybean mutant population. RESULTS: We called CNVs from GBS data in 79 soybean mutants and assessed the sensitivity and precision of this approach by validating our results against array comparative genomic hybridization (aCGH) data for 19 of these mutants as well as targeted PCR and ddPCR assays for a representative subset of the smallest events detected by GBS. Our GBS pipeline detected 55 of the 96 events found by aCGH, with approximate detection thresholds of 60 kb, 500 kb and 1 Mb for homozygous deletions, hemizygous deletions and duplications, respectively. Among the whole set of 79 mutants, the GBS data revealed 105 homozygous deletions, 32 hemizygous deletions and 19 duplications. This included several extremely large events, exhibiting maximum sizes of ~ 11.2 Mb for a homozygous deletion, ~ 11.6 Mb for a hemizygous deletion, and ~ 50 Mb for a duplication. CONCLUSIONS: This study provides a proof of concept that GBS can be used as an affordable high-throughput method for assessing CNVs in fast neutron mutants. The modularity of this GBS approach allows combining as many different libraries or sequencing runs as is necessary for reaching the goals of a particular study. This method should enable the low-cost genome-wide characterization of hundreds to thousands of individuals in fast neutron mutant populations or any population with large genomic deletions and duplications.


Subject(s)
DNA Copy Number Variations , DNA Mutational Analysis , Fast Neutrons , Genotyping Techniques , Glycine max/genetics , Mutation , Mutagenesis
10.
Cancer Med ; 8(5): 2414-2428, 2019 05.
Article in English | MEDLINE | ID: mdl-30957988

ABSTRACT

TNBC is a highly heterogeneous and aggressive breast cancer subtype associated with high relapse rates, and for which no targeted therapy yet exists. Protein arginine methyltransferase 5 (PRMT5), an enzyme which catalyzes the methylation of arginines on histone and non-histone proteins, has recently emerged as a putative target for cancer therapy. Potent and specific PRMT5 inhibitors have been developed, but the therapeutic efficacy of PRMT5 targeting in TNBC has not yet been demonstrated. Here, we examine the expression of PRMT5 in a human breast cancer cohort obtained from the Institut Curie, and evaluate the therapeutic potential of pharmacological inhibition of PRMT5 in TNBC. We find that PRMT5 mRNA and protein are expressed at comparable levels in TNBC, luminal breast tumors, and healthy mammary tissues. However, immunohistochemistry analyses reveal that PRMT5 is differentially localized in TNBC compared to other breast cancer subtypes and to normal breast tissues. PRMT5 is heterogeneously expressed in TNBC and high PRMT5 expression correlates with poor prognosis within this breast cancer subtype. Using the small-molecule inhibitor EPZ015666, we show that PRMT5 inhibition impairs cell proliferation in a subset of TNBC cell lines. PRMT5 inhibition triggers apoptosis, regulates cell cycle progression and decreases mammosphere formation. Furthermore, EPZ015666 administration to a patient-derived xenograft model of TNBC significantly deters tumor progression. Finally, we reveal potentiation between EGFR and PRMT5 targeting, suggestive of a beneficial combination therapy. Our findings highlight a distinctive subcellular localization of PRMT5 in TNBC, and uphold PRMT5 targeting, alone or in combination, as a relevant treatment strategy for a subset of TNBC.


Subject(s)
Biomarkers, Tumor , Protein-Arginine N-Methyltransferases/metabolism , Triple Negative Breast Neoplasms/metabolism , Animals , Antineoplastic Agents/pharmacology , Cell Cycle/genetics , Cell Line, Tumor , Cell Survival/drug effects , Cell Survival/genetics , Disease Models, Animal , Dose-Response Relationship, Drug , Drug Synergism , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Immunohistochemistry , Isoquinolines/pharmacology , Mice , Molecular Targeted Therapy , Prognosis , Protein Transport , Protein-Arginine N-Methyltransferases/genetics , Pyrimidines/pharmacology , Transcriptome , Triple Negative Breast Neoplasms/drug therapy , Triple Negative Breast Neoplasms/genetics , Triple Negative Breast Neoplasms/pathology , Xenograft Model Antitumor Assays
11.
Methods Mol Biol ; 1883: 143-160, 2019.
Article in English | MEDLINE | ID: mdl-30547399

ABSTRACT

This chapter addresses the problem of reconstructing regulatory networks in molecular biology by integrating multiple sources of data. We consider data sets measured from diverse technologies all related to the same set of variables and individuals. This situation is becoming more and more common in molecular biology, for instance, when both proteomic and transcriptomic data related to the same set of "genes" are available on a given cohort of patients. To infer a consensus network that integrates both proteomic and transcriptomic data, we introduce a multivariate extension of Gaussian graphical models (GGM), which we refer to as multiattribute GGM. Indeed, the GGM framework offers a good proxy for modeling direct links between biological entities. We perform the inference of our multivariate GGM with a neighborhood selection procedure that operates at a multiscale level. This procedure employs a group-Lasso penalty in order to select interactions which operate both at the proteomic and at the transcriptomic level between two genes. We end up with a consensus network embedding information shared at multiple scales of the cell. We illustrate this method on two breast cancer data sets. An R-package is publicly available on GitHub at https://github.com/jchiquet/multivarNetwork to promote reproducibility.


Subject(s)
Breast Neoplasms/genetics , Computational Biology/methods , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Models, Genetic , Algorithms , Computational Biology/instrumentation , Datasets as Topic , Female , Gene Expression Profiling/instrumentation , Gene Expression Profiling/methods , Humans , Normal Distribution , Proteomics/instrumentation , Proteomics/methods , Software
12.
Cancer Med ; 8(1): 325-336, 2019 01.
Article in English | MEDLINE | ID: mdl-30575334

ABSTRACT

Triple-negative breast cancer (TNBC) is the breast cancer subtype with the worst prognosis. New treatments improving the survival of TNBC patients are, therefore, urgently required. We performed a transcriptome microarray analysis to identify new treatment targets for TNBC. We found that low-density lipoprotein receptor-related protein 8 (LRP8) was more strongly expressed in estrogen receptor-negative breast tumors, including TNBCs and those overexpressing HER2, than in luminal breast tumors and normal breast tissues. LRP8 depletion decreased cell proliferation more efficiently in estrogen receptor-negative breast cancer cell lines: TNBC and HER2 overexpressing cell lines. We next focused on TNBC cells for which targeted therapies are not available. LRP8 depletion induced an arrest of the cell cycle progression in G1 phase and programmed cell death. We also found that LRP8 is required for anchorage-independent growth in vitro, and that its depletion in vivo slowed tumor growth in a xenograft model. Our findings suggest that new approaches targeting LRP8 may constitute promising treatments for hormone-negative breast cancers, those overexpressing HER2 and TNBCs.


Subject(s)
LDL-Receptor Related Proteins/genetics , Triple Negative Breast Neoplasms/genetics , Animals , Apoptosis , Cell Cycle , Cell Line, Tumor , Cell Proliferation , Female , Humans , Mice, Nude , Triple Negative Breast Neoplasms/pathology
13.
Plant J ; 96(3): 635-650, 2018 11.
Article in English | MEDLINE | ID: mdl-30079488

ABSTRACT

Characterizing the natural diversity of gene expression across environments is an important step in understanding how genotype-by-environment interactions shape phenotypes. Here, we analyzed the impact of water deficit on gene expression levels in tomato at the genome-wide scale. We sequenced the transcriptome of growing leaves and fruit pericarps at cell expansion stage in a cherry and a large-fruited accession and their F1 hybrid grown under two watering regimes. Gene expression levels were steadily affected by the genotype and the watering regime. Whereas phenotypes showed mostly additive inheritance, ~80% of the genes displayed non-additive inheritance. By comparing allele-specific expression (ASE) in the F1 hybrid to the allelic expression in both parental lines, respectively, 3005 genes in leaf and 2857 genes in fruit deviated from 1:1 ratio independently of the watering regime. Among these genes, ~55% were controlled by cis factors, ~25% by trans factors and ~20% by a combination of both types of factors. A total of 328 genes in leaf and 113 in fruit exhibited significant ASE-by-watering regime interaction, among which ~80% presented trans-by-watering regime interaction, suggesting a response to water deficit mediated through a majority of trans-acting loci in tomato. We cross-validated the expression levels of 274 transcripts in fruit and leaves of 124 recombinant inbred lines (RILs) and identified 163 expression quantitative trait loci (eQTLs) mostly confirming the divergences identified by ASE. Combining phenotypic and expression data, we observed a complex network of variation between genes encoding enzymes involved in the sugar metabolism.


Subject(s)
Quantitative Trait Loci/genetics , Solanum lycopersicum/genetics , Transcriptome , Water/physiology , Alleles , Dehydration , Fruit/genetics , Fruit/physiology , Genotype , Solanum lycopersicum/physiology , Phenotype
14.
Methods Mol Biol ; 1829: 279-294, 2018.
Article in English | MEDLINE | ID: mdl-29987729

ABSTRACT

Sequencing of total RNA enables the study of the whole plant transcriptome resulting from the simultaneous expression of the three genomes of plant cells (located in the nucleus, mitochondrion and chloroplast). While commonly used for the quantification of the nuclear gene expression, this method remains complex and challenging when applied to organellar genomes and/or when used to quantify posttranscriptional RNA maturations. Here we propose a complete bioinformatical and statistical pipeline to fully characterize the differences in the chloroplast transcriptome between two conditions. Experimental design as well as bioinformatics and statistical analyses are described in order to quantify both gene expression and RNA posttranscriptional maturations, i.e., RNA splicing, editing, and processing, and identify statistically significant differences.


Subject(s)
Chloroplasts/genetics , Computational Biology , Gene Expression Regulation, Plant , Genes, Chloroplast , High-Throughput Nucleotide Sequencing , RNA Processing, Post-Transcriptional , Computational Biology/methods , Databases, Genetic , Plant Cells , RNA Editing , RNA Splicing , RNA, Plant , Software , Workflow
15.
Oncotarget ; 9(32): 22586-22604, 2018 Apr 27.
Article in English | MEDLINE | ID: mdl-29854300

ABSTRACT

Triple-negative breast cancers (TNBCs) account for a large proportion of breast cancer deaths, due to the high rate of recurrence from residual, resistant tumor cells. New treatments are needed, to bypass chemoresistance and improve survival. The WNT pathway, which is activated in TNBCs, has been identified as an attractive pathway for treatment targeting. We analyzed expression of the WNT coreceptors LRP5 and LRP6 in human breast cancer samples. As previously described, LRP6 was overexpressed in TNBCs. However, we also showed, for the first time, that LRP5 was overexpressed in TNBCs too. The knockdown of LRP5 or LRP6 decreased tumorigenesis in vitro and in vivo, identifying both receptors as potential treatment targets in TNBC. The apoptotic effect of LRP5 knockdown was more robust than that of LRP6 depletion. We analyzed and compared the transcriptomes of cells depleted of LRP5 or LRP6, to identify genes specifically deregulated by LRP5 potentially implicated in cell death. We identified serine/threonine kinase 40 (STK40) as one of two genes specifically downregulated soon after LRP5 depletion. STK40 was found to be overexpressed in TNBCs, relative to other breast cancer subtypes, and in various other tumor types. STK40 depletion decreased cell viability and colony formation, and induced the apoptosis of TNBC cells. In addition, STK40 knockdown impaired growth in an anchorage-independent manner in vitro and slowed tumor growth in vivo. These findings identify the largely uncharacterized putative protein kinase STK40 as a novel candidate treatment target for TNBC.

16.
Brief Bioinform ; 19(1): 65-76, 2018 01 01.
Article in English | MEDLINE | ID: mdl-27742662

ABSTRACT

Numerous statistical pipelines are now available for the differential analysis of gene expression measured with RNA-sequencing technology. Most of them are based on similar statistical frameworks after normalization, differing primarily in the choice of data distribution, mean and variance estimation strategy and data filtering. We propose an evaluation of the impact of these choices when few biological replicates are available through the use of synthetic data sets. This framework is based on real data sets and allows the exploration of various scenarios differing in the proportion of non-differentially expressed genes. Hence, it provides an evaluation of the key ingredients of the differential analysis, free of the biases associated with the simulation of data using parametric models. Our results show the relevance of a proper modeling of the mean by using linear or generalized linear modeling. Once the mean is properly modeled, the impact of the other parameters on the performance of the test is much less important. Finally, we propose to use the simple visualization of the raw P-value histogram as a practical evaluation criterion of the performance of differential analysis methods on real data sets.
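
The raw P-value histogram proposed here as an evaluation criterion is easy to compute by hand. The sketch below only produces the bin counts; the function name and the default of 20 bins are assumptions, and the interpretation given afterwards is a commonly used reading rather than something stated in the abstract.

```python
def pvalue_histogram(pvalues, bins=20):
    """Count raw p-values in equal-width bins covering [0, 1]."""
    counts = [0] * bins
    for p in pvalues:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        # p == 1.0 falls into the last bin rather than past the end.
        index = min(int(p * bins), bins - 1)
        counts[index] += 1
    return counts

counts = pvalue_histogram([0.01, 0.02, 0.5, 1.0], bins=4)
```

A histogram that is roughly flat apart from a peak near zero is usually taken as a sign of a well-behaved test, whereas a skew toward one or a hump in the middle tends to point to a modeling problem.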


Subject(s)
Arabidopsis Proteins/genetics , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , RNA/genetics , Sequence Analysis, RNA/methods , Transcriptome , Arabidopsis/genetics , Computer Simulation , Datasets as Topic , Humans , Models, Statistical , Software
17.
New Phytol ; 217(1): 367-377, 2018 Jan.
Article in English | MEDLINE | ID: mdl-29034956

ABSTRACT

Structural variation is a major source of genetic diversity and an important substrate for selection. In allopolyploids, homoeologous exchanges (i.e. between the constituent subgenomes) are a very frequent type of structural variant. However, their direct impact on gene content and gene expression had not been determined. Here, we used a tissue-specific mRNA-Seq dataset to measure the consequences of homoeologous exchanges (HE) on gene expression in Brassica napus, a representative allotetraploid crop. We demonstrate that expression changes are proportional to the change in gene copy number triggered by the HEs. Thus, when homoeologous gene pairs have unbalanced transcriptional contributions before the HE, duplication of one copy does not accurately compensate for loss of the other and combined homoeologue expression also changes. These effects are, however, mitigated over time. This study sheds light on the origins, timing and functional consequences of homoeologous exchanges in allopolyploids. It demonstrates that the interplay between new structural variation and the resulting impacts on gene expression influences allopolyploid genome evolution.


Subject(s)
Brassica napus/genetics , Gene Dosage , Genetic Variation , Genome, Plant/genetics , Gene Expression , Organ Specificity , Polyploidy , Recombination, Genetic , Sequence Analysis, RNA
18.
Proc Natl Acad Sci U S A ; 114(33): 8877-8882, 2017 08 15.
Article in English | MEDLINE | ID: mdl-28760958

ABSTRACT

RNA editing converts hundreds of cytosines into uridines during organelle gene expression of land plants. The pentatricopeptide repeat (PPR) proteins are at the core of this posttranscriptional RNA modification. Although a PPR protein defines the editing site, a DYW domain of the same or another PPR protein is believed to catalyze the deamination. To give insight into the organelle RNA editosome, we performed tandem affinity purification of the plastidial CHLOROPLAST BIOGENESIS 19 (CLB19) PPR editing factor. Two PPR proteins, dually targeted to mitochondria and chloroplasts, were identified as potential partners of CLB19. These two proteins, a P-type PPR and a member of a small PPR-DYW subfamily, were shown to interact in yeast. Insertional mutations resulted in embryo lethality that could be rescued by embryo-specific complementation. A transcriptome analysis of these complemented plants showed major editing defects in both organelles with a very high PPR type specificity, indicating that the two proteins are core members of E+-type PPR editosomes.


Subject(s)
Arabidopsis Proteins/metabolism , Arabidopsis/metabolism , Chloroplasts/metabolism , Mitochondria/metabolism , RNA Editing/physiology , RNA-Binding Proteins/metabolism , Arabidopsis/genetics , Arabidopsis Proteins/genetics , Chloroplasts/genetics , Mitochondria/genetics , RNA-Binding Proteins/genetics
19.
PLoS Genet ; 13(3): e1006666, 2017 03.
Article in English | MEDLINE | ID: mdl-28301472

ABSTRACT

Through the local selection of landraces, humans have guided the adaptation of crops to a vast range of climatic and ecological conditions. This is particularly true of maize, which was domesticated in a restricted area of Mexico but now displays one of the broadest cultivated ranges worldwide. Here, we sequenced 67 genomes with an average sequencing depth of 18x to document routes of introduction, admixture and selective history of European maize and its American counterparts. To avoid the confounding effects of recent breeding, we targeted germplasm (lines) directly derived from landraces. Among our lines, we discovered 22,294,769 SNPs and between 0.9% and 4.1% residual heterozygosity. Using a segmentation method, we identified 6,978 segments of unexpectedly high rate of heterozygosity. These segments point to genes potentially involved in inbreeding depression, and to a lesser extent to the presence of structural variants. Genetic structuring and inferences of historical splits revealed 5 genetic groups and two independent European introductions, with modest bottleneck signatures. Our results further revealed admixtures between distinct sources that have contributed to the establishment of 3 groups at intermediate latitudes in North America and Europe. We combined differentiation- and diversity-based statistics to identify both genes and gene networks displaying strong signals of selection. These include genes/gene networks involved in flowering time, drought and cold tolerance, plant defense and starch properties. Overall, our results provide novel insights into the evolutionary history of European maize and highlight a major role of admixture in environmental adaptation, paralleling recent findings in humans.


Subject(s)
Adaptation, Physiological/genetics , Genes, Plant/genetics , Plant Breeding/methods , Zea mays/genetics , Europe , Genetic Variation , Genome, Plant/genetics , Geography , Heterozygote , High-Throughput Nucleotide Sequencing/methods , Humans , Models, Genetic , Phylogeny , Polymorphism, Single Nucleotide , Selection, Genetic , United States , Zea mays/classification
20.
Stat Comput ; 27(2): 519-533, 2017.
Article in English | MEDLINE | ID: mdl-32355427

ABSTRACT

Many common approaches to detecting changepoints, for example based on statistical criteria such as penalised likelihood or minimum description length, can be formulated in terms of minimising a cost over segmentations. We focus on a class of dynamic programming algorithms that can solve the resulting minimisation problem exactly, and thus find the optimal segmentation under the given statistical criteria. The standard implementation of these dynamic programming methods has a computational cost that scales at least quadratically in the length of the time-series. Recently, pruning ideas have been suggested that can speed up the dynamic programming algorithms, whilst still being guaranteed to be optimal, in that they find the true minimum of the cost function. Here we extend these pruning methods, and introduce two new algorithms for segmenting data: FPOP and SNIP. Empirical results show that FPOP is substantially faster than existing dynamic programming methods, and unlike the existing methods its computational efficiency is robust to the number of changepoints in the data. We evaluate the method for detecting copy number variations and observe that FPOP has a computational cost that is even competitive with that of binary segmentation, but can give much more accurate segmentations.
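
The unpruned dynamic programme that this abstract takes as its starting point can be sketched as follows. This is the standard quadratic-time recursion, not FPOP or SNIP; the squared-error segment cost, the function name and the return convention are assumptions chosen for illustration.

```python
def optimal_partitioning(y, penalty):
    """Exact penalised least-squares segmentation by O(n^2) dynamic programming.

    Minimises the sum of within-segment squared errors plus `penalty` per changepoint.
    Returns the sorted segment end indices (exclusive); the last one is len(y).
    """
    n = len(y)
    # Prefix sums give the cost of any segment in constant time.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(a, b):
        # Squared error of y[a:b] around its own mean.
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)
    best[0] = -penalty  # so that the first segment is not charged a penalty
    last = [0] * (n + 1)
    for t in range(1, n + 1):
        # Try every possible start s of the final segment y[s:t].
        best[t], last[t] = min((best[s] + cost(s, t) + penalty, s) for s in range(t))

    # Backtrack from the end to recover the segment boundaries.
    ends = []
    t = n
    while t > 0:
        ends.append(t)
        t = last[t]
    return ends[::-1]

ends = optimal_partitioning([0, 0, 0, 0, 5, 5, 5, 5], penalty=1.0)
```

The inner minimum over all earlier positions is what makes this version quadratic; pruning methods such as those described in the abstract reduce the set of candidates that must be examined at each step while still returning the same optimum.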
