1.
Nat Methods ; 20(1): 104-111, 2023 01.
Article in English | MEDLINE | ID: mdl-36522501

ABSTRACT

Protein sequence alignment is a key component of most bioinformatics pipelines to study the structures and functions of proteins. Aligning highly divergent sequences remains, however, a difficult task that current algorithms often fail to perform accurately, leaving many proteins or open reading frames poorly annotated. Here we leverage recent advances in deep learning for language modeling and differentiable programming to propose DEDAL (deep embedding and differentiable alignment), a flexible model to align protein sequences and detect homologs. DEDAL is a machine learning-based model that learns to align sequences by observing large datasets of raw protein sequences and of correct alignments. We show that, once trained, DEDAL improves alignment correctness on remote homologs by up to two- or threefold over existing methods and better discriminates remote homologs from evolutionarily unrelated sequences, paving the way to improvements on many downstream tasks relying on sequence alignment in structural and functional genomics.


Subject(s)
Algorithms, Proteins, Amino Acid Sequence, Proteins/genetics, Proteins/chemistry, Sequence Alignment, Genomics
2.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36594573

ABSTRACT

MOTIVATION: We address the challenge of inferring a consensus 3D model of genome architecture from Hi-C data. Existing approaches most often rely on a two-step algorithm: first, convert the contact counts into distances, then optimize an objective function akin to multidimensional scaling (MDS) to infer a 3D model. Other approaches use a maximum likelihood approach, modeling the contact counts between two loci as a Poisson random variable whose intensity is a decreasing function of the distance between them. However, a Poisson model of contact counts implies that the variance of the data is equal to the mean, a relationship that is often too restrictive to properly model count data. RESULTS: We first confirm the presence of overdispersion in several real Hi-C datasets, and we show that the overdispersion arises even in simulated datasets. We then propose a new model, called Pastis-NB, where we replace the Poisson model of contact counts by a negative binomial one, which is parametrized by a mean and a separate dispersion parameter. The dispersion parameter allows the variance to be adjusted independently from the mean, thus better modeling overdispersed data. We compare the results of Pastis-NB to those of several previously published algorithms, both MDS-based and statistical methods. We show that the negative binomial inference yields more accurate structures on simulated data, and more robust structures than other models across real Hi-C replicates and across different resolutions. AVAILABILITY AND IMPLEMENTATION: A Python implementation of Pastis-NB is available at https://github.com/hiclib/pastis under the BSD license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
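The contrast between the two count models can be made concrete with a small numerical check. The sketch below is only an illustration, not the Pastis-NB implementation: it assumes one common negative binomial parametrization (mean `mu`, dispersion `r`, variance `mu + mu**2 / r`), and the function names and the values of `mu` and `r` are hypothetical.

```python
import math

def poisson_logpmf(k, mu):
    """Log-probability of count k under a Poisson with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def negbin_logpmf(k, mu, r):
    """Log-probability of count k under a negative binomial with mean mu
    and dispersion r (assumed parametrization: variance = mu + mu**2 / r)."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu))
            + k * math.log(mu / (r + mu)))

def moments(logpmf, upper=2000):
    """Numerically compute (mean, variance) by summing over counts 0..upper-1."""
    probs = [math.exp(logpmf(k)) for k in range(upper)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

mu, r = 20.0, 4.0  # hypothetical mean contact count and dispersion
pois_mean, pois_var = moments(lambda k: poisson_logpmf(k, mu))
nb_mean, nb_var = moments(lambda k: negbin_logpmf(k, mu, r))
print(pois_mean, pois_var)  # Poisson: variance tied to the mean
print(nb_mean, nb_var)      # negative binomial: same mean, larger variance
```

Both distributions share the same mean, but only the negative binomial can widen its variance through `r`, which is the extra freedom the abstract attributes to the dispersion parameter.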


Subject(s)
Algorithms, Genome, Likelihood Functions
3.
Bioinformatics ; 39(7)2023 07 01.
Article in English | MEDLINE | ID: mdl-37421399

ABSTRACT

MOTIVATION: Modality matching in single-cell omics data analysis (i.e. matching cells across datasets collected using different types of genomic assays) has become an important problem, because unifying perspectives across different technologies holds the promise of yielding biological and clinical discoveries. However, single-cell dataset sizes can now reach hundreds of thousands to millions of cells, which remain out of reach for most multimodal computational methods. RESULTS: We propose LSMMD-MA, a large-scale Python implementation of the MMD-MA method for multimodal data integration. In LSMMD-MA, we reformulate the MMD-MA optimization problem using linear algebra and solve it with KeOps, a CUDA framework for symbolic matrix computation in Python. We show that LSMMD-MA scales to a million cells in each modality, two orders of magnitude greater than existing implementations. AVAILABILITY AND IMPLEMENTATION: LSMMD-MA is freely available at https://github.com/google-research/large_scale_mmdma and archived at https://doi.org/10.5281/zenodo.8076311.
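For readers unfamiliar with the quantity behind the method's name, the following is a minimal sketch of a maximum mean discrepancy (MMD) estimate between two small point sets. It is not the LSMMD-MA code and omits the rest of the MMD-MA objective; the Gaussian kernel, the bandwidth `sigma`, and the toy data are assumptions made for illustration.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of the squared MMD between samples xs and ys."""
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, sigma) for u in us for v in vs)
        return total / (len(us) * len(vs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

# Hypothetical embeddings of cells from two modalities
same = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
shifted = [(x + 3.0, y + 3.0) for x, y in same]

print(mmd_squared(same, same))     # identical samples: zero discrepancy
print(mmd_squared(same, shifted))  # displaced samples: clearly positive
```

This naive version evaluates every pair of points explicitly, so its cost grows with the product of the two sample sizes, which is why datasets with a million cells per modality call for a specialized implementation.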


Subject(s)
Genome, Genomics, Genomics/methods, Research Design, Data Analysis, Single-Cell Analysis, Software
4.
Nucleic Acids Res ; 48(5): 2303-2311, 2020 03 18.
Article in English | MEDLINE | ID: mdl-32034421

ABSTRACT

Chromatin conformation assays such as Hi-C cannot directly measure differences in 3D architecture between cell types or cell states. For this purpose, two or more Hi-C experiments must be carried out, but direct comparison of the resulting Hi-C matrices is confounded by several features of Hi-C data. Most notably, the genomic distance effect, whereby contacts between pairs of genomic loci that are proximal along the chromosome exhibit many more Hi-C contacts than distal pairs of loci, dominates every Hi-C matrix. Furthermore, the form that this distance effect takes often varies between different Hi-C experiments, even between replicate experiments. Thus, a statistical confidence measure designed to identify differential Hi-C contacts must accurately account for the genomic distance effect or risk being misled by large-scale but artifactual differences. ACCOST (Altered Chromatin COnformation STatistics) accomplishes this goal by extending the statistical model employed by DESeq, re-purposing the 'size factors,' which were originally developed to account for differences in read depth between samples, to instead model the genomic distance effect. We show via analysis of simulated and real data that ACCOST provides unbiased statistical confidence estimates that compare favorably with competing methods such as diffHiC, FIND and HiCcompare. ACCOST is freely available with an Apache license at https://bitbucket.org/noblelab/accost.
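The genomic distance effect described above can be seen by averaging a contact matrix along its diagonals. The sketch below is an illustration only and does not reproduce ACCOST's statistical model; the function names and the toy matrix are hypothetical, and the observed/expected ratio is a generic normalization, not the paper's size-factor construction.

```python
def mean_contacts_by_distance(matrix):
    """Average contact count for each genomic distance d = |i - j|, in bins."""
    n = len(matrix)
    means = []
    for d in range(n):
        values = [matrix[i][i + d] for i in range(n - d)]
        means.append(sum(values) / len(values))
    return means

def distance_normalized(matrix):
    """Divide each entry by the mean of its diagonal (observed / expected)."""
    expected = mean_contacts_by_distance(matrix)
    n = len(matrix)
    return [
        [matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0
         for j in range(n)]
        for i in range(n)
    ]

# Hypothetical symmetric contact map over four genomic bins
toy = [
    [100, 40, 10, 2],
    [40, 90, 50, 8],
    [10, 50, 110, 30],
    [2, 8, 30, 100],
]
profile = mean_contacts_by_distance(toy)
ratios = distance_normalized(toy)
print(profile)       # counts fall off quickly with distance
print(ratios[1][2])  # contact between bins 1 and 2 relative to its diagonal
```

Because the decay profile differs from one experiment to the next, comparing raw counts between two matrices mixes real differences with differences in this profile, which is the confound the abstract sets out to model.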


Subject(s)
Chromatin/chemistry, DNA/chemistry, Genetic Loci, Genome, Software, Animals, Cell Line, Chromatin/metabolism, DNA/metabolism, Genetic Epistasis, Epithelial Cells/cytology, Epithelial Cells/metabolism, Humans, Lymphocytes/cytology, Lymphocytes/metabolism, Mice, Molecular Conformation, Plasmodium falciparum/genetics, Sporozoites/genetics, Trophozoites/genetics
5.
Bioinformatics ; 36(18): 4774-4780, 2020 09 15.
Article in English | MEDLINE | ID: mdl-33026066

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) offers new possibilities to infer gene regulatory networks (GRNs) for biological processes involving a notion of time, such as cell differentiation or cell cycles. It also raises many challenges due to the destructive measurements inherent to the technology. RESULTS: In this work, we propose a new method named GRISLI for de novo GRN inference from scRNA-seq data. GRISLI infers a velocity vector field in the space of scRNA-seq data from profiles of individual cells, and models the dynamics of cell trajectories with a linear ordinary differential equation to reconstruct the underlying GRN with a sparse regression procedure. We show on real data that GRISLI outperforms a recently proposed state-of-the-art method for GRN reconstruction from scRNA-seq data. AVAILABILITY AND IMPLEMENTATION: The MATLAB code of GRISLI is available at: https://github.com/PCAubin/GRISLI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
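The idea of recovering a network from a linear ordinary differential equation by sparse regression can be sketched on a toy system. This is not GRISLI's algorithm: the velocities are assumed to be known exactly rather than inferred from cells, the lasso is solved with a plain coordinate descent chosen for brevity, and all names, the matrix `A_true`, and the penalty `lam` are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam (the lasso proximal operator)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_row(X, y, lam, sweeps=200):
    """Minimize 0.5 * ||y - X a||^2 + lam * ||a||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    a = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of column j with the residual that ignores coordinate j
            rho = sum(
                X[s][j] * (y[s] - sum(X[s][k] * a[k] for k in range(p) if k != j))
                for s in range(n)
            )
            z = sum(X[s][j] ** 2 for s in range(n))
            a[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return a

def infer_network(X, V, lam):
    """Fit dx/dt = A x one gene at a time; row i of A explains velocity i."""
    p = len(X[0])
    return [lasso_row(X, [v[i] for v in V], lam) for i in range(p)]

# Toy two-gene system: gene 0 decays on its own, gene 0 activates gene 1
A_true = [[-1.0, 0.0], [0.5, -1.0]]
X = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
V = [tuple(sum(A_true[i][k] * x[k] for k in range(2)) for i in range(2)) for x in X]

A_hat = infer_network(X, V, lam=0.01)
for row in A_hat:
    print(row)
```

The L1 penalty sets the absent edge (gene 1 acting on gene 0) to exactly zero while leaving the other coefficients close to their true values, which is the behavior that makes sparse regression attractive for network reconstruction.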


Subject(s)
Gene Expression Profiling, Single-Cell Analysis, Gene Regulatory Networks, RNA-Seq, RNA Sequence Analysis
6.
PLoS Biol ; 15(10): e2004045, 2017 Oct.
Article in English | MEDLINE | ID: mdl-29049289

ABSTRACT

During vertebrate neurulation, the embryonic ectoderm is patterned into lineage progenitors for neural plate, neural crest, placodes and epidermis. Here, we use Xenopus laevis embryos to analyze the spatial and temporal transcriptome of distinct ectodermal domains in the course of neurulation, during the establishment of cell lineages. In order to define the transcriptome of small groups of cells from a single germ layer and to retain spatial information, dorsal and ventral ectoderm was subdivided along the anterior-posterior and medial-lateral axes by microdissections. Principal component analysis on the transcriptomes of these ectoderm fragments primarily identifies embryonic axes and temporal dynamics. This provides a genetic code to define positional information of any ectoderm sample along the anterior-posterior and dorsal-ventral axes directly from its transcriptome. In parallel, we use nonnegative matrix factorization to predict enhanced gene expression maps onto early and mid-neurula embryos, and specific signatures for each ectoderm area. The clustering of spatial and temporal datasets allowed detection of multiple biologically relevant groups (e.g., Wnt signaling, neural crest development, sensory placode specification, ciliogenesis, germ layer specification). We provide an interactive network interface, EctoMap, for exploring synexpression relationships among genes expressed in the neurula, and suggest several strategies to use this comprehensive dataset to address questions in developmental biology as well as stem cell or cancer research.


Subject(s)
Ectoderm/embryology, Neural Crest/embryology, Neurons/cytology, Stem Cells/metabolism, Xenopus laevis/embryology, Algorithms, Animals, Cluster Analysis, Genetic Databases, Ectoderm/metabolism, Gastrulation/genetics, Gene Expression Profiling, Developmental Gene Expression Regulation, Gene Ontology, Gene Regulatory Networks, Humans, Internet, Microdissection, Neoplasms/genetics, Neural Crest/metabolism, Neurulation/genetics, Principal Component Analysis, Time Factors, Transcriptome/genetics, Wnt Proteins/metabolism, Xenopus laevis/genetics
7.
PLoS Comput Biol ; 15(9): e1007381, 2019 09.
Article in English | MEDLINE | ID: mdl-31568528

ABSTRACT

Cancer driver genes, i.e., oncogenes and tumor suppressor genes, are involved in the acquisition of important functions in tumors, providing a selective growth advantage, allowing uncontrolled proliferation and avoiding apoptosis. It is therefore important to identify these driver genes, both for the fundamental understanding of cancer and to help find new therapeutic targets or biomarkers. Although the most frequently mutated driver genes have been identified, it is believed that many more remain to be discovered, particularly for driver genes specific to some cancer types. In this paper, we propose a new computational method called LOTUS to predict new driver genes. LOTUS is a machine learning-based approach that integrates various types of data in a versatile manner, including information about gene mutations and protein-protein interactions. In addition, LOTUS can predict cancer driver genes in a pan-cancer setting as well as for specific cancer types, using a multitask learning strategy to share information across cancer types. We empirically show that LOTUS outperforms five other state-of-the-art driver gene prediction methods, both in terms of intrinsic consistency and prediction accuracy, and provide predictions of new cancer genes across many cancer types.


Subject(s)
Algorithms, Computational Biology/methods, Machine Learning, Neoplasms/genetics, Oncogenes/genetics, Software, Humans, Statistical Models
8.
BMC Bioinformatics ; 19(Suppl 1): 39, 2018 02 19.
Article in English | MEDLINE | ID: mdl-29504897

ABSTRACT

BACKGROUND: Since many proteins become functional only after they interact with their partner proteins and form protein complexes, it is essential to identify the sets of proteins that form complexes. Therefore, several computational methods have been proposed to predict complexes from the topology and structure of experimental protein-protein interaction (PPI) networks. These methods work well to predict complexes involving at least three proteins, but generally fail at identifying complexes involving only two different proteins, called heterodimeric complexes or heterodimers. There is however an urgent need for efficient methods to predict heterodimers, since the majority of known protein complexes are precisely heterodimers. RESULTS: In this paper, we use three promising kernel functions, Min kernel and two pairwise kernels, which are Metric Learning Pairwise Kernel (MLPK) and Tensor Product Pairwise Kernel (TPPK). We also consider the normalized form of Min kernel. Then, we combine Min kernel or its normalized form and one of the pairwise kernels by plugging. We applied kernels based on PPI, domain, phylogenetic profile, and subcellular localization properties to predicting heterodimers. Then, we evaluate our method by employing C-Support Vector Classification (C-SVC), carrying out 10-fold cross-validation, and calculating the average F-measures. The results suggest that the combination of normalized-Min-kernel and MLPK leads to the best F-measure and improves on the performance of our previous work, which had been the best existing method so far. CONCLUSIONS: We propose new methods to predict heterodimers, using a machine learning-based approach. We train a support vector machine (SVM) to discriminate interacting vs non-interacting protein pairs, based on information extracted from PPI, domain, phylogenetic profiles and subcellular localization. We evaluate in detail new kernel functions to encode these data, and report prediction performance that outperforms the state-of-the-art.
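The "plugging" of a base kernel into a pairwise kernel can be sketched with higher-order functions. The formulas below are the usual textbook definitions of the Min kernel, cosine-style kernel normalization, TPPK and MLPK, assumed here for illustration; the feature vectors are hypothetical, and how the paper derives its features from PPI, domain, phylogenetic and localization data is not shown.

```python
import math

def min_kernel(x, y):
    """Min kernel on non-negative feature vectors: sum of element-wise minima."""
    return sum(min(a, b) for a, b in zip(x, y))

def normalized(kernel):
    """Wrap a kernel so that k(x, x) = 1 for every non-zero x."""
    def k(x, y):
        denom = math.sqrt(kernel(x, x) * kernel(y, y))
        return kernel(x, y) / denom if denom > 0 else 0.0
    return k

def tppk(kernel):
    """Tensor Product Pairwise Kernel built from a kernel on single proteins."""
    def k(pair1, pair2):
        (a, b), (c, d) = pair1, pair2
        return kernel(a, c) * kernel(b, d) + kernel(a, d) * kernel(b, c)
    return k

def mlpk(kernel):
    """Metric Learning Pairwise Kernel built from a kernel on single proteins."""
    def k(pair1, pair2):
        (a, b), (c, d) = pair1, pair2
        return (kernel(a, c) - kernel(a, d) - kernel(b, c) + kernel(b, d)) ** 2
    return k

# Hypothetical per-protein feature vectors
p1, p2, p3 = (1, 0, 2), (0, 1, 1), (1, 1, 0)

print(min_kernel(p1, p2))
print(normalized(min_kernel)(p1, p1))
print(tppk(min_kernel)((p1, p2), (p2, p3)))
print(mlpk(min_kernel)((p1, p2), (p2, p3)))
```

Both pairwise kernels give the same value when the two proteins of a pair are swapped, which suits heterodimer prediction because a candidate pair has no natural order.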


Subject(s)
Algorithms, Multiprotein Complexes/chemistry, Dimerization, Multiprotein Complexes/classification, Phylogeny, Protein Domains, Protein Interaction Maps, Protein Multimerization, Support Vector Machine
9.
BMC Bioinformatics ; 19(1): 313, 2018 Sep 06.
Article in English | MEDLINE | ID: mdl-30189838

ABSTRACT

BACKGROUND: Normalization is essential to ensure accurate analysis and proper interpretation of sequencing data, and chromosome conformation capture data such as Hi-C have particular challenges. Although several methods have been proposed, the most widely used type of normalization of Hi-C data usually casts estimation of unwanted effects as a matrix balancing problem, relying on the assumption that all genomic regions interact equally with each other. RESULTS: In order to explore the effect of copy-number variations on Hi-C data normalization, we first propose a simulation model that predicts the effects of large copy-number changes on a diploid Hi-C contact map. We then show that the standard approaches relying on equal visibility fail to correct for unwanted effects in the presence of copy-number variations. We thus propose a simple extension to matrix balancing methods that models these effects. Our approach can either retain the copy-number variation effects (LOIC) or remove them (CAIC). We show that this leads to better downstream analysis of the three-dimensional organization of rearranged genomes. CONCLUSIONS: Taken together, our results highlight the importance of using dedicated methods for the analysis of Hi-C cancer data. Both CAIC and LOIC methods perform well on simulated and real Hi-C data sets, each fulfilling different needs.
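The equal-visibility assumption behind standard matrix balancing can be illustrated with a short iterative rescaling. This is a generic sketch, not the CAIC or LOIC method: it uses a damped (square-root) update chosen here for numerical stability, and the function name, iteration count and toy matrix are hypothetical.

```python
import math

def balance(matrix, iterations=200):
    """Rescale a symmetric non-negative matrix until all row sums are equal.

    Returns the balanced matrix and the per-bin bias factors, such that
    balanced[i][j] * bias[i] * bias[j] recovers the input entry.
    """
    n = len(matrix)
    bias = [1.0] * n
    work = [[float(v) for v in row] for row in matrix]
    for _ in range(iterations):
        sums = [sum(row) for row in work]
        mean = sum(sums) / n
        # Damped update: move each bin only part of the way toward the mean
        step = [math.sqrt(s / mean) if s > 0 else 1.0 for s in sums]
        work = [[work[i][j] / (step[i] * step[j]) for j in range(n)]
                for i in range(n)]
        bias = [b * t for b, t in zip(bias, step)]
    return work, bias

# Hypothetical contact map in which bin 2 is far more "visible" than bin 0
raw = [
    [10, 4, 1],
    [4, 8, 6],
    [1, 6, 20],
]
balanced, bias = balance(raw)
print([sum(row) for row in balanced])  # row sums are now equal
print(bias)                            # estimated per-bin visibility
```

Forcing every bin to the same total is exactly what becomes questionable when a region is amplified or deleted, since its higher or lower coverage then reflects copy number and not only technical bias.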


Subject(s)
Chromosome Aberrations, Chromosome Mapping, Computational Biology/standards, DNA Copy Number Variations, Human Genome, Genomics/methods, Neoplasms/genetics, Humans
10.
PLoS Comput Biol ; 13(6): e1005573, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28650955

ABSTRACT

Genome-wide somatic mutation profiles of tumours can now be assessed efficiently and promise to move precision medicine forward. Statistical analysis of mutation profiles is however challenging due to the low frequency of most mutations, the varying mutation rates across tumours, and the presence of a majority of passenger events that hide the contribution of driver events. Here we propose a method, NetNorM, to represent whole-exome somatic mutation data in a form that enhances cancer-relevant information using a gene network as background knowledge. We evaluate its relevance for two tasks: survival prediction and unsupervised patient stratification. Using data from 8 cancer types from The Cancer Genome Atlas (TCGA), we show that it improves over the raw binary mutation data and network diffusion for these two tasks. In doing so, we also provide a thorough assessment of the prognostic power of somatic mutations, which has been overlooked by previous studies because of the sparse and binary nature of mutations.


Subject(s)
Tumor Biomarkers/genetics, Exome/genetics, Gene Regulatory Networks/genetics, Genome-Wide Association Study/methods, Neoplasms/genetics, Neoplasms/mortality, Single Nucleotide Polymorphism/genetics, Algorithms, Carcinogenesis/genetics, Chromosome Mapping/methods, Genetic Markers/genetics, Genetic Predisposition to Disease/epidemiology, Genetic Predisposition to Disease/genetics, Genetic Testing/methods, Human Genome/genetics, Humans, Mutation/genetics, Neoplasms/pathology, Prognosis, Risk Assessment/methods, Risk Factors, Software, Survival Analysis
11.
Genome Res ; 24(6): 974-88, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24671853

ABSTRACT

The development of the human malaria parasite Plasmodium falciparum is controlled by coordinated changes in gene expression throughout its complex life cycle, but the corresponding regulatory mechanisms are incompletely understood. To study the relationship between genome architecture and gene regulation in Plasmodium, we assayed the genome architecture of P. falciparum at three time points during its erythrocytic (asexual) cycle. Using chromosome conformation capture coupled with next-generation sequencing technology (Hi-C), we obtained high-resolution chromosomal contact maps, which we then used to construct a consensus three-dimensional genome structure for each time point. We observed strong clustering of centromeres, telomeres, ribosomal DNA, and virulence genes, resulting in a complex architecture that cannot be explained by a simple volume exclusion model. Internal virulence gene clusters exhibit domain-like structures in contact maps, suggesting that they play an important role in the genome architecture. Midway during the erythrocytic cycle, at the highly transcriptionally active trophozoite stage, the genome adopts a more open chromatin structure with increased chromosomal intermingling. In addition, we observed reduced expression of genes located in spatial proximity to the repressive subtelomeric center, and colocalization of distinct groups of parasite-specific genes with coordinated expression profiles. Overall, our results are indicative of a strong association between the P. falciparum spatial genome organization and gene expression. Understanding the molecular processes involved in genome conformation dynamics could contribute to the discovery of novel antimalarial strategies.


Subject(s)
Chromatin Assembly and Disassembly, Chromosomes/genetics, Protozoan Genome, Genetic Models, Plasmodium falciparum/genetics, Developmental Gene Expression Regulation, Plasmodium falciparum/growth & development, Schizonts/metabolism, Trophozoites/metabolism
12.
Bioinformatics ; 32(7): 1023-32, 2016 04 01.
Article in English | MEDLINE | ID: mdl-26589281

ABSTRACT

MOTIVATION: Metagenomics characterizes the taxonomic diversity of microbial communities by sequencing DNA directly from an environmental sample. One of the main challenges in metagenomics data analysis is the binning step, where each sequenced read is assigned to a taxonomic clade. Because of the large volume of metagenomics datasets, binning methods need fast and accurate algorithms that can operate with reasonable computing requirements. While standard alignment-based methods provide state-of-the-art performance, compositional approaches that assign a taxonomic class to a DNA read based on the k-mers it contains have the potential to provide faster solutions. RESULTS: We propose a new rank-flexible machine learning-based compositional approach for taxonomic assignment of metagenomics reads and show that it benefits from increasing the number of fragments sampled from reference genomes to tune its parameters, up to a coverage of about 10, and from increasing the k-mer size to about 12. Tuning the method involves training machine learning models on about 10^8 samples in 10^7 dimensions, which is out of reach of standard software but can be done efficiently with modern implementations for large-scale machine learning. The resulting method is competitive in terms of accuracy with well-established alignment and composition-based tools for problems involving a small to moderate number of candidate species and for reasonable amounts of sequencing errors. We show, however, that machine learning-based compositional approaches are still limited in their ability to deal with problems involving a greater number of species and more sensitive to sequencing errors. We finally show that the new method outperforms the state-of-the-art in its ability to classify reads from species of lineage absent from the reference database and confirm that compositional approaches achieve faster prediction times, with a gain of 2-17 times with respect to the BWA-MEM short read mapper, depending on the number of candidate species and the level of sequencing noise. AVAILABILITY AND IMPLEMENTATION: Data and code are available at http://cbio.ensmp.fr/largescalemetagenomics. CONTACT: pierre.mahe@biomerieux.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
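A compositional representation of a read can be sketched as a bag of overlapping k-mers. The helper below is a hypothetical illustration, not the published pipeline: it skips any k-mer containing a character other than A, C, G or T, which is one simple convention among several.

```python
from collections import Counter

def kmer_profile(read, k):
    """Count every overlapping k-mer in a DNA read, ignoring ambiguous bases."""
    read = read.upper()
    alphabet = set("ACGT")
    return Counter(
        read[i:i + k]
        for i in range(len(read) - k + 1)
        if set(read[i:i + k]) <= alphabet
    )

profile = kmer_profile("ACGTACGTNAC", 3)
print(profile)

# Number of possible k-mers, i.e. the dimension of the feature space, for k = 12
print(4 ** 12)
```

With k = 12 there are 4^12 = 16,777,216 possible k-mers, which is on the order of the 10^7 dimensions quoted in the abstract and explains why large-scale learning tools are needed to train on such features.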


Subject(s)
Machine Learning, Metagenomics, DNA Sequence Analysis, Algorithms, Metagenome, Software
13.
PLoS Biol ; 12(6): e1001895, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24960609

ABSTRACT

The Wnt receptor Ryk is an evolutionarily conserved protein important during neuronal differentiation through several mechanisms, including γ-secretase cleavage and nuclear translocation of its intracellular domain (Ryk-ICD). Although the Wnt pathway may be neuroprotective, the role of Ryk in neurodegenerative disease remains unknown. We found that Ryk is up-regulated in neurons expressing mutant huntingtin (HTT) in several models of Huntington's disease (HD). Further investigation in Caenorhabditis elegans and mouse striatal cell models of HD provided a model in which the early-stage increase of Ryk promotes neuronal dysfunction by repressing the neuroprotective activity of the longevity-promoting factor FOXO through a noncanonical mechanism that implicates the Ryk-ICD fragment and its binding to the FOXO co-factor β-catenin. The Ryk-ICD fragment suppressed neuroprotection by lin-18/Ryk loss-of-function in expanded-polyQ nematodes, repressed FOXO transcriptional activity, and abolished β-catenin protection of mutant htt striatal cells against cell death vulnerability. Additionally, Ryk-ICD was increased in the nucleus of mutant htt cells, and reducing γ-secretase PS1 levels compensated for the cytotoxicity of full-length Ryk in these cells. These findings reveal that the Ryk-ICD pathway may impair FOXO protective activity in mutant polyglutamine neurons, suggesting that neurons are unable to efficiently maintain function and resist disease from the earliest phases of the pathogenic process in HD.


Subject(s)
Forkhead Transcription Factors/metabolism, Huntington Disease/etiology, Neurons/metabolism, Receptor Protein-Tyrosine Kinases/metabolism, Wnt Receptors/metabolism, Aged, Animals, Caenorhabditis elegans, Caenorhabditis elegans Proteins/genetics, Caenorhabditis elegans Proteins/metabolism, Cell Line, Female, Humans, Huntington Disease/metabolism, Male, Mice, Transgenic Mice, Middle Aged, Oligonucleotide Array Sequence Analysis, Presenilin-1/metabolism, Receptor Protein-Tyrosine Kinases/genetics, Serotonin Plasma Membrane Transport Proteins/genetics, Serotonin Plasma Membrane Transport Proteins/metabolism, Wnt Signaling Pathway
14.
Bioessays ; 37(2): 182-94, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25394267

ABSTRACT

Plasmodium falciparum is the most deadly human malarial parasite, responsible for an estimated 207 million cases of disease and 627,000 deaths in 2012. Recent studies reveal that the parasite actively regulates a large fraction of its genes throughout its replicative cycle inside human red blood cells and that epigenetics plays an important role in this precise gene regulation. Here, we discuss recent advances in our understanding of three aspects of epigenetic regulation in P. falciparum: changes in histone modifications, nucleosome occupancy and the three-dimensional genome structure. We compare these three aspects of the P. falciparum epigenome to those of other eukaryotes, and show that large-scale compartmentalization is particularly important in determining histone decomposition and gene regulation in P. falciparum. We conclude by presenting a gene regulation model for P. falciparum that combines the described epigenetic factors, and by discussing the implications of this model for the future of malaria research.


Subject(s)
Histones/metabolism, Nucleosomes/metabolism, Plasmodium falciparum/pathogenicity, Genetic Epigenesis/genetics, Genetic Epigenesis/physiology, Malaria/parasitology, Virulence
15.
Nucleic Acids Res ; 43(11): 5331-9, 2015 Jun 23.
Article in English | MEDLINE | ID: mdl-25940625

ABSTRACT

Centromeres are essential for proper chromosome segregation. Despite extensive research, centromere locations in yeast genomes remain difficult to infer, and in most species they are still unknown. Recently, the chromatin conformation capture assay, Hi-C, has been re-purposed for diverse applications, including de novo genome assembly, deconvolution of metagenomic samples and inference of centromere locations. We describe a method, Centurion, that jointly infers the locations of all centromeres in a single genome from Hi-C data by exploiting the centromeres' tendency to cluster in three-dimensional space. We first demonstrate the accuracy of Centurion in identifying known centromere locations from high coverage Hi-C data of budding yeast and a human malaria parasite. We then use Centurion to infer centromere locations in 14 yeast species. Across all microbes that we consider, Centurion predicts 89% of centromeres within 5 kb of their known locations. We also demonstrate the robustness of the approach in datasets with low sequencing depth. Finally, we predict centromere coordinates for six yeast species that currently lack centromere annotations. These results show that Centurion can be used for centromere identification for diverse species of yeast and possibly other microorganisms.


Subject(s)
Centromere, Fungal Genome, Genomics/methods, Yeasts/genetics, Chromosome Mapping, DNA Restriction Enzymes, Metagenomics, Plasmodium falciparum/genetics, Saccharomyces cerevisiae/genetics, Software
16.
Bioinformatics ; 31(12): i320-8, 2015 Jun 15.
Article in English | MEDLINE | ID: mdl-26072499

ABSTRACT

MOTIVATION: Motility is a fundamental cellular attribute, which plays a major part in processes ranging from embryonic development to metastasis. Traditionally, single cell motility is often studied by live cell imaging. Yet, such studies were so far limited to low throughput. To systematically study cell motility at a large scale, we need robust methods to quantify cell trajectories in live cell imaging data. RESULTS: The primary contribution of this article is to present Motility study Integrated Workflow (MotIW), a generic workflow for the study of single cell motility in high-throughput time-lapse screening data. It is composed of cell tracking, cell trajectory mapping to an original feature space and hit detection according to a new statistical procedure. We show that this workflow is scalable and demonstrate its power by application to simulated data, as well as large-scale live cell imaging data. This application enables the identification of an ontology of cell motility patterns in a fully unsupervised manner. AVAILABILITY AND IMPLEMENTATION: Python code and examples are available online (http://cbio.ensmp.fr/~aschoenauer/motiw.html).


Subject(s)
Cell Movement, Cell Tracking/methods, Time-Lapse Imaging/methods, HeLa Cells, Humans, Single-Cell Analysis, Software, Workflow
17.
Hum Genomics ; 9: 26, 2015 Oct 13.
Article in English | MEDLINE | ID: mdl-26463173

ABSTRACT

BACKGROUND: The CpG island methylator phenotype (CIMP) was first characterized in colorectal cancer but has since been extensively studied in several other tumor types such as breast, bladder, lung, and gastric. CIMP is of clinical importance as it has been reported to be associated with prognosis or response to treatment. However, the identification of a universal molecular basis to define CIMP across tumors has remained elusive. RESULTS: We perform a genome-wide methylation analysis of over 2000 tumor samples from 5 cancer sites to assess the existence of a CIMP with common molecular basis across cancers. We then show that the CIMP phenotype is associated with specific gene expression variations. However, we do not find a common genetic signature in all tissues associated with CIMP. CONCLUSION: Our results suggest the existence of a universal epigenetic and transcriptomic signature that defines the CIMP across several tumor types but do not indicate the existence of a common genetic signature of CIMP.


Subject(s)
DNA Methylation/genetics, Neoplastic Gene Expression Regulation, Neoplasm Proteins/biosynthesis, Neoplasms/genetics, Tumor Biomarkers, CpG Islands/genetics, Genetic Databases, Human Genome, Humans, Mutation, Neoplasm Metastasis, Neoplasm Proteins/genetics, Neoplasms/pathology, Prognosis
18.
BMC Bioinformatics ; 16: 262, 2015 Aug 19.
Article in English | MEDLINE | ID: mdl-26286719

ABSTRACT

BACKGROUND: Detecting and quantifying isoforms from RNA-seq data is an important but challenging task. The problem is often ill-posed, particularly at low coverage. One promising direction is to exploit several samples simultaneously. RESULTS: We propose a new method for solving the isoform deconvolution problem jointly across several samples. We formulate a convex optimization problem that allows information to be shared between samples and that we solve efficiently. We demonstrate the benefits of combining several samples on simulated and real data, and show that our approach outperforms pooling strategies and methods based on integer programming. CONCLUSION: Our convex formulation to jointly detect and quantify isoforms from RNA-seq data of multiple related samples is a computationally efficient approach to leverage the hypothesis that some isoforms are likely to be present in several samples. The software and source code are available at http://cbio.ensmp.fr/flipflop.


Subject(s)
RNA Isoforms/analysis, RNA/metabolism, Algorithms, Alternative Splicing, Humans, Internet, RNA Isoforms/metabolism, RNA Sequence Analysis, Transcriptome, User-Computer Interface
19.
Dev Biol ; 386(2): 461-72, 2014 Feb 15.
Article in English | MEDLINE | ID: mdl-24360906

ABSTRACT

Neural crest development is orchestrated by a complex and still poorly understood gene regulatory network. Premigratory neural crest is induced at the lateral border of the neural plate by the combined action of signaling molecules and transcription factors such as AP2, Gbx2, Pax3 and Zic1. Among them, Pax3 and Zic1 are both necessary and sufficient to trigger a complete neural crest developmental program. However, their gene targets in the neural crest regulatory network remain unknown. Here, through a transcriptome analysis of microdissected frog neural border, we identified an extended gene signature for the premigratory neural crest, and we defined novel potential members of the regulatory network. This signature includes 34 novel genes, as well as 44 known genes expressed at the neural border. Using a second microarray analysis that combined Pax3 and Zic1 gain-of-function and protein translation blockade, we uncovered 25 Pax3 and Zic1 direct targets within this signature. We demonstrated that the neural border specifiers Pax3 and Zic1 are direct upstream regulators of neural crest specifiers Snail1/2, Foxd3, Twist1, and Tfap2b. In addition, they may modulate the transcriptional output of multiple signaling pathways involved in neural crest development (Wnt, Retinoic Acid) through the induction of key pathway regulators (Axin2 and Cyp26c1). We also found that Pax3 could maintain its own expression through a positive autoregulatory feedback loop. These hierarchical inductions, feedback loops, and pathway modulations provide novel tools to understand the neural crest induction network.


Subject(s)
Gene Expression Regulation, Developmental/genetics , Gene Regulatory Networks/genetics , Neural Crest/embryology , Paired Box Transcription Factors/metabolism , Transcription Factors/metabolism , Xenopus Proteins/metabolism , Xenopus laevis/embryology , Animals , Electrophoretic Mobility Shift Assay , Gene Expression Regulation, Developmental/physiology , Gene Regulatory Networks/physiology , In Situ Hybridization , Microarray Analysis , PAX3 Transcription Factor , Real-Time Polymerase Chain Reaction , Reverse Transcriptase Polymerase Chain Reaction , Xenopus laevis/genetics
20.
BMC Genomics ; 16: 873, 2015 Oct 28.
Article in English | MEDLINE | ID: mdl-26510534

ABSTRACT

BACKGROUND: Methylation of high-density CpG regions known as CpG Islands (CGIs) has been widely described as a mechanism associated with gene expression regulation. Aberrant promoter methylation is considered a hallmark of cancer involved in silencing of tumor suppressor genes and activation of oncogenes. However, recent studies have also challenged the simple model of gene expression control by promoter methylation in cancer, and the precise mechanism of and role played by changes in DNA methylation in carcinogenesis remain elusive. RESULTS: Using a large dataset of 672 matched cancerous and healthy methylomes, gene expression, and copy number profiles across 3 types of tissues from The Cancer Genome Atlas (TCGA), we perform a detailed meta-analysis to clarify the interplay between promoter methylation and gene expression in normal and cancer samples. On the one hand, we recover the existence of a CpG island methylator phenotype (CIMP) with prognostic value in a subset of breast, colon and lung cancer samples, where a common subset of promoter CGIs hypomethylated in normal samples become hypermethylated. However, this hypermethylation is not accompanied by a decrease in expression of the corresponding genes, which are already expressed at low levels in normal samples. On the other hand, we identify tissue-specific sets of genes, different between normal and cancer samples, whose inter-individual variation in expression is significantly correlated with the variation in methylation of the 3' flanking regions of the promoter CGIs. These subsets of genes are not the same in the different tissues, nor between normal and cancerous samples, but transcription factors are over-represented in all subsets. CONCLUSION: Our results suggest that epigenetic reprogramming in cancer does not contribute to cancer development via direct inhibition of gene expression through promoter hypermethylation. It may instead modify how the expression of a few specific genes, particularly transcription factors, is associated with DNA methylation variations in a tissue-dependent manner.
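
The per-gene association between methylation and expression across individuals can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the correlation measure or the significance test, so Pearson correlation is an assumed choice, and the function names, gene labels, and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def methylation_expression_correlations(methylation, expression):
    """Correlate, gene by gene, methylation and expression across the same individuals.

    Both arguments map a gene label to a list with one value per individual,
    in the same individual order. Genes missing from either map are skipped.
    """
    return {
        gene: pearson(methylation[gene], expression[gene])
        for gene in methylation
        if gene in expression
    }


# Made-up values for four individuals; gene labels are placeholders.
methylation = {"GENE_A": [0.1, 0.2, 0.3, 0.4], "GENE_B": [0.5, 0.5, 0.5, 0.5]}
expression = {"GENE_A": [8.0, 6.0, 4.0, 2.0], "GENE_B": [1.0, 2.0, 3.0, 4.0]}
correlations = methylation_expression_correlations(methylation, expression)
```

In a real analysis each correlation would also need a significance test and a multiple-testing correction across genes, which this sketch leaves out.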


Subject(s)
DNA Methylation/genetics , Gene Expression Regulation, Neoplastic , Neoplasms/genetics , Promoter Regions, Genetic/genetics , Humans