Results 1 - 20 of 85
1.
Nat Methods ; 20(1): 104-111, 2023 01.
Article in English | MEDLINE | ID: mdl-36522501

ABSTRACT

Protein sequence alignment is a key component of most bioinformatics pipelines to study the structures and functions of proteins. Aligning highly divergent sequences remains, however, a difficult task that current algorithms often fail to perform accurately, leaving many proteins or open reading frames poorly annotated. Here we leverage recent advances in deep learning for language modeling and differentiable programming to propose DEDAL (deep embedding and differentiable alignment), a flexible model to align protein sequences and detect homologs. DEDAL is a machine learning-based model that learns to align sequences by observing large datasets of raw protein sequences and of correct alignments. We show that, once trained, DEDAL improves alignment correctness on remote homologs by up to two- or threefold over existing methods and better discriminates remote homologs from evolutionarily unrelated sequences, paving the way to improvements on many downstream tasks relying on sequence alignment in structural and functional genomics.


Subject(s)
Algorithms , Proteins , Amino Acid Sequence , Proteins/genetics , Proteins/chemistry , Sequence Alignment , Genomics
2.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36594573

ABSTRACT

MOTIVATION: We address the challenge of inferring a consensus 3D model of genome architecture from Hi-C data. Existing approaches most often rely on a two-step algorithm: first, convert the contact counts into distances, then optimize an objective function akin to multidimensional scaling (MDS) to infer a 3D model. Other approaches use a maximum likelihood approach, modeling the contact counts between two loci as a Poisson random variable whose intensity is a decreasing function of the distance between them. However, a Poisson model of contact counts implies that the variance of the data is equal to the mean, a relationship that is often too restrictive to properly model count data. RESULTS: We first confirm the presence of overdispersion in several real Hi-C datasets, and we show that the overdispersion arises even in simulated datasets. We then propose a new model, called Pastis-NB, where we replace the Poisson model of contact counts by a negative binomial one, which is parametrized by a mean and a separate dispersion parameter. The dispersion parameter allows the variance to be adjusted independently from the mean, thus better modeling overdispersed data. We compare the results of Pastis-NB to those of several previously published algorithms, both MDS-based and statistical methods. We show that the negative binomial inference yields more accurate structures on simulated data, and more robust structures than other models across real Hi-C replicates and across different resolutions. AVAILABILITY AND IMPLEMENTATION: A Python implementation of Pastis-NB is available at https://github.com/hiclib/pastis under the BSD license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
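
The contrast between the two count models can be sketched in a few lines of Python. This is an illustration only, not the Pastis-NB implementation: it assumes the common mean/dispersion parametrization of the negative binomial, in which the variance is mean + mean^2 / dispersion, and the function names are hypothetical.

```python
import math


def poisson_variance(mean):
    # Under a Poisson model the variance is forced to equal the mean.
    return mean


def nb_variance(mean, dispersion):
    # Negative binomial in the mean/dispersion parametrization (assumed here):
    # variance = mean + mean**2 / dispersion, which always exceeds the mean.
    return mean + mean ** 2 / dispersion


def nb_log_pmf(count, mean, dispersion):
    # Log-probability of observing `count` contacts under that parametrization.
    r = dispersion
    return (
        math.lgamma(count + r) - math.lgamma(count + 1) - math.lgamma(r)
        + r * math.log(r / (r + mean))
        + count * math.log(mean / (r + mean))
    )


def empirical_moments(mean, dispersion, max_count=2000):
    # Recover mean and variance numerically from the pmf as a sanity check.
    probs = [math.exp(nb_log_pmf(k, mean, dispersion)) for k in range(max_count)]
    m1 = sum(k * p for k, p in enumerate(probs))
    m2 = sum(k * k * p for k, p in enumerate(probs))
    return m1, m2 - m1 ** 2
```

As the dispersion parameter grows, nb_variance approaches the Poisson variance, so the Poisson model can be seen as a limiting case of this parametrization.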


Subject(s)
Algorithms , Genome , Likelihood Functions
3.
Bioinformatics ; 39(7)2023 07 01.
Article in English | MEDLINE | ID: mdl-37421399

ABSTRACT

MOTIVATION: Modality matching in single-cell omics data analysis, i.e. matching cells across datasets collected using different types of genomic assays, has become an important problem, because unifying perspectives across different technologies holds the promise of yielding biological and clinical discoveries. However, single-cell dataset sizes can now reach hundreds of thousands to millions of cells, which remain out of reach for most multimodal computational methods. RESULTS: We propose LSMMD-MA, a large-scale Python implementation of the MMD-MA method for multimodal data integration. In LSMMD-MA, we reformulate the MMD-MA optimization problem using linear algebra and solve it with KeOps, a CUDA framework for symbolic matrix computation in Python. We show that LSMMD-MA scales to a million cells in each modality, two orders of magnitude greater than existing implementations. AVAILABILITY AND IMPLEMENTATION: LSMMD-MA is freely available at https://github.com/google-research/large_scale_mmdma and archived at https://doi.org/10.5281/zenodo.8076311.


Subject(s)
Genome , Genomics , Genomics/methods , Research Design , Data Analysis , Single-Cell Analysis , Software
4.
Nucleic Acids Res ; 48(5): 2303-2311, 2020 03 18.
Article in English | MEDLINE | ID: mdl-32034421

ABSTRACT

Chromatin conformation assays such as Hi-C cannot directly measure differences in 3D architecture between cell types or cell states. For this purpose, two or more Hi-C experiments must be carried out, but direct comparison of the resulting Hi-C matrices is confounded by several features of Hi-C data. Most notably, the genomic distance effect, whereby contacts between pairs of genomic loci that are proximal along the chromosome exhibit many more Hi-C contacts than distal pairs of loci, dominates every Hi-C matrix. Furthermore, the form that this distance effect takes often varies between different Hi-C experiments, even between replicate experiments. Thus, a statistical confidence measure designed to identify differential Hi-C contacts must accurately account for the genomic distance effect or risk being misled by large-scale but artifactual differences. ACCOST (Altered Chromatin COnformation STatistics) accomplishes this goal by extending the statistical model employed by DESeq, re-purposing the 'size factors,' which were originally developed to account for differences in read depth between samples, to instead model the genomic distance effect. We show via analysis of simulated and real data that ACCOST provides unbiased statistical confidence estimates that compare favorably with competing methods such as diffHiC, FIND and HiCcompare. ACCOST is freely available with an Apache license at https://bitbucket.org/noblelab/accost.


Subject(s)
Chromatin/chemistry , DNA/chemistry , Genetic Loci , Genome , Software , Animals , Cell Line , Chromatin/metabolism , DNA/metabolism , Epistasis, Genetic , Epithelial Cells/cytology , Epithelial Cells/metabolism , Humans , Lymphocytes/cytology , Lymphocytes/metabolism , Mice , Molecular Conformation , Plasmodium falciparum/genetics , Sporozoites/genetics , Trophozoites/genetics
5.
Bioinformatics ; 36(18): 4774-4780, 2020 09 15.
Article in English | MEDLINE | ID: mdl-33026066

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) offers new possibilities to infer gene regulatory networks (GRNs) for biological processes involving a notion of time, such as cell differentiation or cell cycles. It also raises many challenges due to the destructive measurements inherent to the technology. RESULTS: In this work, we propose a new method named GRISLI for de novo GRN inference from scRNA-seq data. GRISLI infers a velocity vector field in the space of scRNA-seq data from profiles of individual cells, and models the dynamics of cell trajectories with a linear ordinary differential equation to reconstruct the underlying GRN with a sparse regression procedure. We show on real data that GRISLI outperforms a recently proposed state-of-the-art method for GRN reconstruction from scRNA-seq data. AVAILABILITY AND IMPLEMENTATION: The MATLAB code of GRISLI is available at: https://github.com/PCAubin/GRISLI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Gene Regulatory Networks , RNA-Seq , Sequence Analysis, RNA
6.
PLoS Biol ; 15(10): e2004045, 2017 Oct.
Article in English | MEDLINE | ID: mdl-29049289

ABSTRACT

During vertebrate neurulation, the embryonic ectoderm is patterned into lineage progenitors for neural plate, neural crest, placodes and epidermis. Here, we use Xenopus laevis embryos to analyze the spatial and temporal transcriptome of distinct ectodermal domains in the course of neurulation, during the establishment of cell lineages. In order to define the transcriptome of small groups of cells from a single germ layer and to retain spatial information, dorsal and ventral ectoderm was subdivided along the anterior-posterior and medial-lateral axes by microdissections. Principal component analysis on the transcriptomes of these ectoderm fragments primarily identifies embryonic axes and temporal dynamics. This provides a genetic code to define positional information of any ectoderm sample along the anterior-posterior and dorsal-ventral axes directly from its transcriptome. In parallel, we use nonnegative matrix factorization to predict enhanced gene expression maps onto early and mid-neurula embryos, and specific signatures for each ectoderm area. The clustering of spatial and temporal datasets allowed detection of multiple biologically relevant groups (e.g., Wnt signaling, neural crest development, sensory placode specification, ciliogenesis, germ layer specification). We provide an interactive network interface, EctoMap, for exploring synexpression relationships among genes expressed in the neurula, and suggest several strategies to use this comprehensive dataset to address questions in developmental biology as well as stem cell or cancer research.


Subject(s)
Ectoderm/embryology , Neural Crest/embryology , Neurons/cytology , Stem Cells/metabolism , Xenopus laevis/embryology , Algorithms , Animals , Cluster Analysis , Databases, Genetic , Ectoderm/metabolism , Gastrulation/genetics , Gene Expression Profiling , Gene Expression Regulation, Developmental , Gene Ontology , Gene Regulatory Networks , Humans , Internet , Microdissection , Neoplasms/genetics , Neural Crest/metabolism , Neurulation/genetics , Principal Component Analysis , Time Factors , Transcriptome/genetics , Wnt Proteins/metabolism , Xenopus laevis/genetics
7.
PLoS Comput Biol ; 15(9): e1007381, 2019 09.
Article in English | MEDLINE | ID: mdl-31568528

ABSTRACT

Cancer driver genes, i.e., oncogenes and tumor suppressor genes, are involved in the acquisition of important functions in tumors, providing a selective growth advantage, allowing uncontrolled proliferation and avoiding apoptosis. It is therefore important to identify these driver genes, both for the fundamental understanding of cancer and to help find new therapeutic targets or biomarkers. Although the most frequently mutated driver genes have been identified, it is believed that many more remain to be discovered, particularly for driver genes specific to some cancer types. In this paper, we propose a new computational method called LOTUS to predict new driver genes. LOTUS is a machine learning-based approach that can integrate various types of data in a versatile manner, including information about gene mutations and protein-protein interactions. In addition, LOTUS can predict cancer driver genes in a pan-cancer setting as well as for specific cancer types, using a multitask learning strategy to share information across cancer types. We empirically show that LOTUS outperforms five other state-of-the-art driver gene prediction methods, both in terms of intrinsic consistency and prediction accuracy, and provide predictions of new cancer genes across many cancer types.


Subject(s)
Algorithms , Computational Biology/methods , Machine Learning , Neoplasms/genetics , Oncogenes/genetics , Software , Humans , Models, Statistical
8.
BMC Bioinformatics ; 19(1): 313, 2018 Sep 06.
Article in English | MEDLINE | ID: mdl-30189838

ABSTRACT

BACKGROUND: Normalization is essential to ensure accurate analysis and proper interpretation of sequencing data, and chromosome conformation capture data such as Hi-C have particular challenges. Although several methods have been proposed, the most widely used type of normalization of Hi-C data usually casts estimation of unwanted effects as a matrix balancing problem, relying on the assumption that all genomic regions interact equally with each other. RESULTS: In order to explore the effect of copy-number variations on Hi-C data normalization, we first propose a simulation model that predicts the effects of large copy-number changes on a diploid Hi-C contact map. We then show that the standard approaches relying on equal visibility fail to correct for unwanted effects in the presence of copy-number variations. We thus propose a simple extension to matrix balancing methods that models these effects. Our approach can either retain the copy-number variation effects (LOIC) or remove them (CAIC). We show that this leads to better downstream analysis of the three-dimensional organization of rearranged genomes. CONCLUSIONS: Taken together, our results highlight the importance of using dedicated methods for the analysis of Hi-C cancer data. Both CAIC and LOIC methods perform well on simulated and real Hi-C data sets, each fulfilling different needs.
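
The standard matrix balancing step that this abstract takes as its starting point can be sketched as follows. This is a minimal illustration of plain iterative balancing under the equal-visibility assumption, not the CAIC or LOIC extension; it assumes a small, symmetric, strictly positive contact matrix stored as nested lists, and the function name is hypothetical.

```python
def balance(matrix, n_iter=200):
    # Iteratively rescale a symmetric, strictly positive contact matrix
    # until every row (and therefore every column) has the same sum.
    n = len(matrix)
    m = [row[:] for row in matrix]  # work on a copy, leave the input intact
    for _ in range(n_iter):
        sums = [sum(row) for row in m]
        mean_sum = sum(sums) / n
        # Per-bin bias: how visible each bin is relative to the average bin.
        bias = [s / mean_sum for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= bias[i] * bias[j]
    return m


contacts = [
    [10.0, 4.0, 2.0],
    [4.0, 8.0, 3.0],
    [2.0, 3.0, 6.0],
]
balanced = balance(contacts)
```

After balancing, all bins have equal total contact counts, which is exactly the assumption the abstract reports as failing when copy number varies along the genome.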


Subject(s)
Chromosome Aberrations , Chromosome Mapping , Computational Biology/standards , DNA Copy Number Variations , Genome, Human , Genomics/methods , Neoplasms/genetics , Humans
9.
BMC Bioinformatics ; 19(Suppl 1): 39, 2018 02 19.
Article in English | MEDLINE | ID: mdl-29504897

ABSTRACT

BACKGROUND: Since many proteins become functional only after they interact with their partner proteins and form protein complexes, it is essential to identify the sets of proteins that form complexes. Therefore, several computational methods have been proposed to predict complexes from the topology and structure of experimental protein-protein interaction (PPI) networks. These methods work well to predict complexes involving at least three proteins, but generally fail at identifying complexes involving only two different proteins, called heterodimeric complexes or heterodimers. There is however an urgent need for efficient methods to predict heterodimers, since the majority of known protein complexes are precisely heterodimers. RESULTS: In this paper, we use three promising kernel functions, Min kernel and two pairwise kernels, which are Metric Learning Pairwise Kernel (MLPK) and Tensor Product Pairwise Kernel (TPPK). We also consider the normalization forms of Min kernel. Then, we combine Min kernel or its normalization form and one of the pairwise kernels by plugging. We applied kernels based on PPI, domain, phylogenetic profile, and subcellular localization properties to predicting heterodimers. Then, we evaluate our method by employing C-Support Vector Classification (C-SVC), carrying out 10-fold cross-validation, and calculating the average F-measures. The results suggest that the combination of normalized-Min-kernel and MLPK leads to the best F-measure and improves on the performance of our previous work, which had been the best existing method so far. CONCLUSIONS: We propose new methods to predict heterodimers, using a machine learning-based approach. We train a support vector machine (SVM) to discriminate interacting vs non-interacting protein pairs, based on information extracted from PPI, domain, phylogenetic profiles and subcellular localization. We evaluate in detail new kernel functions to encode these data, and report prediction performance that outperforms the state-of-the-art.
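
The idea of plugging a base kernel into a pairwise kernel can be sketched as below. The definitions used are the usual textbook forms of the Min (histogram intersection) kernel, TPPK and MLPK; the cosine-style normalization is one possible choice and may differ from the normalization forms evaluated in the paper. All function names are hypothetical.

```python
import math


def min_kernel(x, y):
    # Min (histogram intersection) kernel on non-negative feature vectors.
    return sum(min(a, b) for a, b in zip(x, y))


def normalized(kernel):
    # Cosine-style normalization so that k(x, x) == 1 for non-zero x
    # (an assumed normalization, chosen for illustration).
    def k(x, y):
        return kernel(x, y) / math.sqrt(kernel(x, x) * kernel(y, y))
    return k


def tppk(kernel, pair1, pair2):
    # Tensor Product Pairwise Kernel between protein pairs (a, b) and (c, d).
    a, b = pair1
    c, d = pair2
    return kernel(a, c) * kernel(b, d) + kernel(a, d) * kernel(b, c)


def mlpk(kernel, pair1, pair2):
    # Metric Learning Pairwise Kernel between protein pairs (a, b) and (c, d).
    a, b = pair1
    c, d = pair2
    return (kernel(a, c) - kernel(a, d) - kernel(b, c) + kernel(b, d)) ** 2
```

Both pairwise kernels are symmetric in the order of proteins within a pair, so (a, b) and (b, a) are treated as the same candidate heterodimer. A call such as mlpk(normalized(min_kernel), pair1, pair2) corresponds to the normalized-Min-kernel and MLPK combination named in the abstract.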


Subject(s)
Algorithms , Multiprotein Complexes/chemistry , Dimerization , Multiprotein Complexes/classification , Phylogeny , Protein Domains , Protein Interaction Maps , Protein Multimerization , Support Vector Machine
10.
PLoS Comput Biol ; 13(6): e1005573, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28650955

ABSTRACT

Genome-wide somatic mutation profiles of tumours can now be assessed efficiently and promise to move precision medicine forward. Statistical analysis of mutation profiles is however challenging due to the low frequency of most mutations, the varying mutation rates across tumours, and the presence of a majority of passenger events that hide the contribution of driver events. Here we propose a method, NetNorM, to represent whole-exome somatic mutation data in a form that enhances cancer-relevant information using a gene network as background knowledge. We evaluate its relevance for two tasks: survival prediction and unsupervised patient stratification. Using data from 8 cancer types from The Cancer Genome Atlas (TCGA), we show that it improves over the raw binary mutation data and network diffusion for these two tasks. In doing so, we also provide a thorough assessment of the prognostic power of somatic mutations, which has been overlooked by previous studies because of the sparse and binary nature of mutations.


Subject(s)
Biomarkers, Tumor/genetics , Exome/genetics , Gene Regulatory Networks/genetics , Genome-Wide Association Study/methods , Neoplasms/genetics , Neoplasms/mortality , Polymorphism, Single Nucleotide/genetics , Algorithms , Carcinogenesis/genetics , Chromosome Mapping/methods , Genetic Markers/genetics , Genetic Predisposition to Disease/epidemiology , Genetic Predisposition to Disease/genetics , Genetic Testing/methods , Genome, Human/genetics , Humans , Mutation/genetics , Neoplasms/pathology , Prognosis , Risk Assessment/methods , Risk Factors , Software , Survival Analysis
11.
Genome Res ; 24(6): 974-88, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24671853

ABSTRACT

The development of the human malaria parasite Plasmodium falciparum is controlled by coordinated changes in gene expression throughout its complex life cycle, but the corresponding regulatory mechanisms are incompletely understood. To study the relationship between genome architecture and gene regulation in Plasmodium, we assayed the genome architecture of P. falciparum at three time points during its erythrocytic (asexual) cycle. Using chromosome conformation capture coupled with next-generation sequencing technology (Hi-C), we obtained high-resolution chromosomal contact maps, which we then used to construct a consensus three-dimensional genome structure for each time point. We observed strong clustering of centromeres, telomeres, ribosomal DNA, and virulence genes, resulting in a complex architecture that cannot be explained by a simple volume exclusion model. Internal virulence gene clusters exhibit domain-like structures in contact maps, suggesting that they play an important role in the genome architecture. Midway during the erythrocytic cycle, at the highly transcriptionally active trophozoite stage, the genome adopts a more open chromatin structure with increased chromosomal intermingling. In addition, we observed reduced expression of genes located in spatial proximity to the repressive subtelomeric center, and colocalization of distinct groups of parasite-specific genes with coordinated expression profiles. Overall, our results are indicative of a strong association between the P. falciparum spatial genome organization and gene expression. Understanding the molecular processes involved in genome conformation dynamics could contribute to the discovery of novel antimalarial strategies.


Subject(s)
Chromatin Assembly and Disassembly , Chromosomes/genetics , Genome, Protozoan , Models, Genetic , Plasmodium falciparum/genetics , Gene Expression Regulation, Developmental , Plasmodium falciparum/growth & development , Schizonts/metabolism , Trophozoites/metabolism
12.
Bioinformatics ; 32(7): 1023-32, 2016 04 01.
Article in English | MEDLINE | ID: mdl-26589281

ABSTRACT

MOTIVATION: Metagenomics characterizes the taxonomic diversity of microbial communities by sequencing DNA directly from an environmental sample. One of the main challenges in metagenomics data analysis is the binning step, where each sequenced read is assigned to a taxonomic clade. Because of the large volume of metagenomics datasets, binning methods need fast and accurate algorithms that can operate with reasonable computing requirements. While standard alignment-based methods provide state-of-the-art performance, compositional approaches that assign a taxonomic class to a DNA read based on the k-mers it contains have the potential to provide faster solutions. RESULTS: We propose a new rank-flexible machine learning-based compositional approach for taxonomic assignment of metagenomics reads and show that it benefits from increasing the number of fragments sampled from reference genomes to tune its parameters, up to a coverage of about 10, and from increasing the k-mer size to about 12. Tuning the method involves training machine learning models on about 10^8 samples in 10^7 dimensions, which is out of reach of standard software but can be done efficiently with modern implementations for large-scale machine learning. The resulting method is competitive in terms of accuracy with well-established alignment and composition-based tools for problems involving a small to moderate number of candidate species and for reasonable amounts of sequencing errors. We show, however, that machine learning-based compositional approaches are still limited in their ability to deal with problems involving a greater number of species and more sensitive to sequencing errors. We finally show that the new method outperforms the state-of-the-art in its ability to classify reads from species of lineage absent from the reference database and confirm that compositional approaches achieve faster prediction times, with a gain of 2-17 times with respect to the BWA-MEM short read mapper, depending on the number of candidate species and the level of sequencing noise. AVAILABILITY AND IMPLEMENTATION: Data and codes are available at http://cbio.ensmp.fr/largescalemetagenomics. CONTACT: pierre.mahe@biomerieux.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
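
The compositional representation described above, in which a read is summarized by the k-mers it contains, can be sketched as follows. This is a toy illustration with hypothetical function names, not the authors' pipeline; it assumes the alphabet ACGT and simply ignores k-mers containing any other symbol.

```python
from collections import Counter
from itertools import product


def kmer_profile(read, k):
    # Count every overlapping k-mer in a DNA read.
    read = read.upper()
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def profile_vector(read, k, alphabet="ACGT"):
    # Lay the counts out as a fixed-length vector with len(alphabet)**k
    # entries, in lexicographic k-mer order, so a linear classifier can use it.
    counts = kmer_profile(read, k)
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]


profile = kmer_profile("ACGTAC", 2)
vector = profile_vector("ACGTAC", 2)
```

With k = 12 the vector has 4^12 = 16,777,216 entries, which is consistent with the "10^7 dimensions" quoted in the abstract; at that size a sparse representation would be needed rather than the dense list used here.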


Subject(s)
Machine Learning , Metagenomics , Sequence Analysis, DNA , Algorithms , Metagenome , Software
13.
PLoS Biol ; 12(6): e1001895, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24960609

ABSTRACT

The Wnt receptor Ryk is an evolutionarily conserved protein important during neuronal differentiation through several mechanisms, including γ-secretase cleavage and nuclear translocation of its intracellular domain (Ryk-ICD). Although the Wnt pathway may be neuroprotective, the role of Ryk in neurodegenerative disease remains unknown. We found that Ryk is up-regulated in neurons expressing mutant huntingtin (HTT) in several models of Huntington's disease (HD). Further investigation in Caenorhabditis elegans and mouse striatal cell models of HD provided a model in which the early-stage increase of Ryk promotes neuronal dysfunction by repressing the neuroprotective activity of the longevity-promoting factor FOXO through a noncanonical mechanism that implicates the Ryk-ICD fragment and its binding to the FOXO co-factor β-catenin. The Ryk-ICD fragment suppressed neuroprotection by lin-18/Ryk loss-of-function in expanded-polyQ nematodes, repressed FOXO transcriptional activity, and abolished β-catenin protection of mutant htt striatal cells against cell death vulnerability. Additionally, Ryk-ICD was increased in the nucleus of mutant htt cells, and reducing γ-secretase PS1 levels compensated for the cytotoxicity of full-length Ryk in these cells. These findings reveal that the Ryk-ICD pathway may impair FOXO protective activity in mutant polyglutamine neurons, suggesting that neurons are unable to efficiently maintain function and resist disease from the earliest phases of the pathogenic process in HD.


Subject(s)
Forkhead Transcription Factors/metabolism , Huntington Disease/etiology , Neurons/metabolism , Receptor Protein-Tyrosine Kinases/metabolism , Receptors, Wnt/metabolism , Aged , Animals , Caenorhabditis elegans , Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans Proteins/metabolism , Cell Line , Female , Humans , Huntington Disease/metabolism , Male , Mice , Mice, Transgenic , Middle Aged , Oligonucleotide Array Sequence Analysis , Presenilin-1/metabolism , Receptor Protein-Tyrosine Kinases/genetics , Serotonin Plasma Membrane Transport Proteins/genetics , Serotonin Plasma Membrane Transport Proteins/metabolism , Wnt Signaling Pathway
14.
Bioessays ; 37(2): 182-94, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25394267

ABSTRACT

Plasmodium falciparum is the most deadly human malarial parasite, responsible for an estimated 207 million cases of disease and 627,000 deaths in 2012. Recent studies reveal that the parasite actively regulates a large fraction of its genes throughout its replicative cycle inside human red blood cells and that epigenetics plays an important role in this precise gene regulation. Here, we discuss recent advances in our understanding of three aspects of epigenetic regulation in P. falciparum: changes in histone modifications, nucleosome occupancy and the three-dimensional genome structure. We compare these three aspects of the P. falciparum epigenome to those of other eukaryotes, and show that large-scale compartmentalization is particularly important in determining histone decomposition and gene regulation in P. falciparum. We conclude by presenting a gene regulation model for P. falciparum that combines the described epigenetic factors, and by discussing the implications of this model for the future of malaria research.


Subject(s)
Histones/metabolism , Nucleosomes/metabolism , Plasmodium falciparum/pathogenicity , Epigenesis, Genetic/genetics , Epigenesis, Genetic/physiology , Malaria/parasitology , Virulence
15.
Nucleic Acids Res ; 43(11): 5331-9, 2015 Jun 23.
Article in English | MEDLINE | ID: mdl-25940625

ABSTRACT

Centromeres are essential for proper chromosome segregation. Despite extensive research, centromere locations in yeast genomes remain difficult to infer, and in most species they are still unknown. Recently, the chromatin conformation capture assay, Hi-C, has been re-purposed for diverse applications, including de novo genome assembly, deconvolution of metagenomic samples and inference of centromere locations. We describe a method, Centurion, that jointly infers the locations of all centromeres in a single genome from Hi-C data by exploiting the centromeres' tendency to cluster in three-dimensional space. We first demonstrate the accuracy of Centurion in identifying known centromere locations from high coverage Hi-C data of budding yeast and a human malaria parasite. We then use Centurion to infer centromere locations in 14 yeast species. Across all microbes that we consider, Centurion predicts 89% of centromeres within 5 kb of their known locations. We also demonstrate the robustness of the approach in datasets with low sequencing depth. Finally, we predict centromere coordinates for six yeast species that currently lack centromere annotations. These results show that Centurion can be used for centromere identification for diverse species of yeast and possibly other microorganisms.


Subject(s)
Centromere , Genome, Fungal , Genomics/methods , Yeasts/genetics , Chromosome Mapping , DNA Restriction Enzymes , Metagenomics , Plasmodium falciparum/genetics , Saccharomyces cerevisiae/genetics , Software
16.
Bioinformatics ; 31(12): i320-8, 2015 Jun 15.
Article in English | MEDLINE | ID: mdl-26072499

ABSTRACT

MOTIVATION: Motility is a fundamental cellular attribute, which plays a major part in processes ranging from embryonic development to metastasis. Traditionally, single cell motility is often studied by live cell imaging. Yet, such studies were so far limited to low throughput. To systematically study cell motility at a large scale, we need robust methods to quantify cell trajectories in live cell imaging data. RESULTS: The primary contribution of this article is to present Motility study Integrated Workflow (MotIW), a generic workflow for the study of single cell motility in high-throughput time-lapse screening data. It is composed of cell tracking, cell trajectory mapping to an original feature space and hit detection according to a new statistical procedure. We show that this workflow is scalable and demonstrate its power by application to simulated data, as well as large-scale live cell imaging data. This application enables the identification of an ontology of cell motility patterns in a fully unsupervised manner. AVAILABILITY AND IMPLEMENTATION: Python code and examples are available online (http://cbio.ensmp.fr/~aschoenauer/motiw.html).


Subject(s)
Cell Movement , Cell Tracking/methods , Time-Lapse Imaging/methods , HeLa Cells , Humans , Single-Cell Analysis , Software , Workflow
17.
Hum Genomics ; 9: 26, 2015 Oct 13.
Article in English | MEDLINE | ID: mdl-26463173

ABSTRACT

BACKGROUND: The CpG island methylator phenotype (CIMP) was first characterized in colorectal cancer but since has been extensively studied in several other tumor types such as breast, bladder, lung, and gastric. CIMP is of clinical importance as it has been reported to be associated with prognosis or response to treatment. However, the identification of a universal molecular basis to define CIMP across tumors has remained elusive. RESULTS: We perform a genome-wide methylation analysis of over 2000 tumor samples from 5 cancer sites to assess the existence of a CIMP with common molecular basis across cancers. We then show that the CIMP phenotype is associated with specific gene expression variations. However, we do not find a common genetic signature in all tissues associated with CIMP. CONCLUSION: Our results suggest the existence of a universal epigenetic and transcriptomic signature that defines the CIMP across several tumor types but does not indicate the existence of a common genetic signature of CIMP.


Subject(s)
DNA Methylation/genetics , Gene Expression Regulation, Neoplastic , Neoplasm Proteins/biosynthesis , Neoplasms/genetics , Biomarkers, Tumor , CpG Islands/genetics , Databases, Genetic , Genome, Human , Humans , Mutation , Neoplasm Metastasis , Neoplasm Proteins/genetics , Neoplasms/pathology , Prognosis
18.
BMC Bioinformatics ; 16: 262, 2015 Aug 19.
Article in English | MEDLINE | ID: mdl-26286719

ABSTRACT

BACKGROUND: Detecting and quantifying isoforms from RNA-seq data is an important but challenging task. The problem is often ill-posed, particularly at low coverage. One promising direction is to exploit several samples simultaneously. RESULTS: We propose a new method for solving the isoform deconvolution problem jointly across several samples. We formulate a convex optimization problem that allows information to be shared between samples and that we solve efficiently. We demonstrate the benefits of combining several samples on simulated and real data, and show that our approach outperforms pooling strategies and methods based on integer programming. CONCLUSION: Our convex formulation to jointly detect and quantify isoforms from RNA-seq data of multiple related samples is a computationally efficient approach to leverage the hypothesis that some isoforms are likely to be present in several samples. The software and source code are available at http://cbio.ensmp.fr/flipflop.


Subject(s)
RNA Isoforms/analysis , RNA/metabolism , Algorithms , Alternative Splicing , Humans , Internet , RNA Isoforms/metabolism , Sequence Analysis, RNA , Transcriptome , User-Computer Interface
19.
Dev Biol ; 386(2): 461-72, 2014 Feb 15.
Article in English | MEDLINE | ID: mdl-24360906

ABSTRACT

Neural crest development is orchestrated by a complex and still poorly understood gene regulatory network. Premigratory neural crest is induced at the lateral border of the neural plate by the combined action of signaling molecules and transcription factors such as AP2, Gbx2, Pax3 and Zic1. Among them, Pax3 and Zic1 are both necessary and sufficient to trigger a complete neural crest developmental program. However, their gene targets in the neural crest regulatory network remain unknown. Here, through a transcriptome analysis of frog microdissected neural border, we identified an extended gene signature for the premigratory neural crest, and we defined novel potential members of the regulatory network. This signature includes 34 novel genes, as well as 44 known genes expressed at the neural border. Using another microarray analysis which combined Pax3 and Zic1 gain-of-function and protein translation blockade, we uncovered 25 Pax3 and Zic1 direct targets within this signature. We demonstrated that the neural border specifiers Pax3 and Zic1 are direct upstream regulators of neural crest specifiers Snail1/2, Foxd3, Twist1, and Tfap2b. In addition, they may modulate the transcriptional output of multiple signaling pathways involved in neural crest development (Wnt, Retinoic Acid) through the induction of key pathway regulators (Axin2 and Cyp26c1). We also found that Pax3 could maintain its own expression through a positive autoregulatory feedback loop. These hierarchical inductions, feedback loops, and pathway modulations provide novel tools to understand the neural crest induction network.


Subject(s)
Gene Expression Regulation, Developmental/genetics , Gene Regulatory Networks/genetics , Neural Crest/embryology , Paired Box Transcription Factors/metabolism , Transcription Factors/metabolism , Xenopus Proteins/metabolism , Xenopus laevis/embryology , Animals , Electrophoretic Mobility Shift Assay , Gene Expression Regulation, Developmental/physiology , Gene Regulatory Networks/physiology , In Situ Hybridization , Microarray Analysis , PAX3 Transcription Factor , Real-Time Polymerase Chain Reaction , Reverse Transcriptase Polymerase Chain Reaction , Xenopus laevis/genetics
20.
BMC Genomics ; 16: 873, 2015 Oct 28.
Article in English | MEDLINE | ID: mdl-26510534

ABSTRACT

BACKGROUND: Methylation of high-density CpG regions known as CpG Islands (CGIs) has been widely described as a mechanism associated with gene expression regulation. Aberrant promoter methylation is considered a hallmark of cancer involved in silencing of tumor suppressor genes and activation of oncogenes. However, recent studies have also challenged the simple model of gene expression control by promoter methylation in cancer, and the precise mechanism of and role played by changes in DNA methylation in carcinogenesis remain elusive. RESULTS: Using a large dataset of 672 matched cancerous and healthy methylomes, gene expression, and copy number profiles across 3 types of tissues from The Cancer Genome Atlas (TCGA), we perform a detailed meta-analysis to clarify the interplay between promoter methylation and gene expression in normal and cancer samples. On the one hand, we recover the existence of a CpG island methylator phenotype (CIMP) with prognostic value in a subset of breast, colon and lung cancer samples, where a common subset of promoter CGIs hypomethylated in normal samples become hypermethylated. However, this hypermethylation is not accompanied by a decrease in expression of the corresponding genes, which are already lowly expressed in the normal genes. On the other hand, we identify tissue-specific sets of genes, different between normal and cancer samples, whose inter-individual variation in expression is significantly correlated with the variation in methylation of the 3' flanking regions of the promoter CGIs. These subsets of genes are not the same in the different tissues, nor between normal and cancerous samples, but transcription factors are over-represented in all subsets. CONCLUSION: Our results suggest that epigenetic reprogramming in cancer does not contribute to cancer development via direct inhibition of gene expression through promoter hypermethylation. It may instead modify how the expression of a few specific genes, particularly transcription factors, is associated with DNA methylation variations in a tissue-dependent manner.


Subject(s)
DNA Methylation/genetics , Gene Expression Regulation, Neoplastic , Neoplasms/genetics , Promoter Regions, Genetic/genetics , Humans