Results 1 - 20 of 20
1.
Bioinformatics ; 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38913855

ABSTRACT

MOTIVATIONS: Gene Regulatory Networks (GRNs) are traditionally inferred from gene expression profiles monitoring a specific condition or treatment. In the last decade, integrative strategies have successfully emerged to guide GRN inference from gene expression with complementary prior data. However, datasets used as prior information and validation gold standards are often related and limited to a subset of genes. This lack of complete and independent evaluation calls for new criteria to robustly estimate the optimal intensity of prior data integration in the inference process. RESULTS: We address this issue for two regression-based GRN inference models, a weighted Random Forest (weightedRF) and a generalized linear model estimated under a weighted LASSO penalty with stability selection (weightedLASSO). These approaches are applied to data from the root response to nitrate induction in Arabidopsis thaliana. For each gene, we measure how the integration of transcription factor binding motifs influences model prediction. We propose a new approach, DIOgene, that uses model prediction error and a simulated null hypothesis to optimize data integration strength in a hypothesis-driven, gene-specific manner. This integration scheme reveals a strong diversity of optimal integration intensities between genes, and performs well both in minimizing prediction error and in retrieving experimental interactions. Experimental results show that DIOgene compares favorably against state-of-the-art approaches and allows the recovery of master regulators of nitrate induction. AVAILABILITY AND IMPLEMENTATION: The R code and notebooks demonstrating the use of the proposed approaches are available in the repository https://github.com/OceaneCsn/integrative_GRN_N_induction.

2.
BMC Cancer ; 24(1): 587, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38741073

ABSTRACT

YAP and TAZ, the Hippo pathway terminal transcriptional activators, are frequently upregulated in cancers. In tumor cells, they have been mainly associated with increased tumorigenesis, controlling aspects ranging from cell cycle regulation and stemness to resistance to chemotherapies. In fewer cases, they have also been shown to oppose cancer progression, including by promoting cell death through the action of the p73/YAP transcriptional complex, in particular after chemotherapeutic drug exposure. Using HCT116 cells, we show here that oxaliplatin treatment led to core Hippo pathway down-regulation and nuclear accumulation of TAZ. We further show that TAZ was required for the increased sensitivity of HCT116 cells to oxaliplatin, an effect that appeared independent of p73, but which required the nuclear relocalization of TAZ. Accordingly, Verteporfin and CA3, two drugs affecting the activity of YAP and TAZ, showed antagonistic effects with oxaliplatin in co-treatments. Importantly, using several colorectal cell lines, we show that the sensitizing action of TAZ to oxaliplatin is dependent on the p53 status of the cells. Our results thus support an early action of TAZ to sensitize cells to oxaliplatin, consistent with a model in which nuclear TAZ, in the context of DNA damage and p53 activity, pushes cells towards apoptosis.

Subjects
Antineoplastic Agents; Colonic Neoplasms; Hippo Signaling Pathway; Organoplatinum Compounds; Oxaliplatin; Protein Serine-Threonine Kinases; Signal Transduction; Trans-Activators; Transcription Factors; Transcriptional Coactivator with PDZ-Binding Motif Proteins; Tumor Suppressor Protein p53; Humans; Oxaliplatin/pharmacology; Tumor Suppressor Protein p53/metabolism; Tumor Suppressor Protein p53/genetics; Colonic Neoplasms/drug therapy; Colonic Neoplasms/metabolism; Colonic Neoplasms/pathology; Colonic Neoplasms/genetics; Trans-Activators/metabolism; Trans-Activators/genetics; Transcription Factors/metabolism; Transcription Factors/genetics; HCT116 Cells; Signal Transduction/drug effects; Protein Serine-Threonine Kinases/metabolism; Protein Serine-Threonine Kinases/genetics; Organoplatinum Compounds/pharmacology; Organoplatinum Compounds/therapeutic use; Antineoplastic Agents/pharmacology; Intracellular Signaling Peptides and Proteins/metabolism; Intracellular Signaling Peptides and Proteins/genetics; Drug Resistance, Neoplasm/genetics; Tumor Suppressor Proteins/metabolism; Tumor Suppressor Proteins/genetics; Adaptor Proteins, Signal Transducing/metabolism; Adaptor Proteins, Signal Transducing/genetics; Verteporfin/pharmacology; Verteporfin/therapeutic use; Cell Line, Tumor; Tumor Protein p73/metabolism; Tumor Protein p73/genetics; YAP-Signaling Proteins/metabolism; Porphyrins/pharmacology; Nuclear Proteins/metabolism; Nuclear Proteins/genetics; DNA-Binding Proteins/metabolism; DNA-Binding Proteins/genetics; Gene Expression Regulation, Neoplastic/drug effects; Apoptosis/drug effects
3.
Nucleic Acids Res ; 49(5): 2488-2508, 2021 03 18.
Article in English | MEDLINE | ID: mdl-33533919

ABSTRACT

The ubiquitous family of dimeric transcription factors AP-1 is made up of Fos and Jun family proteins. It has long been thought to operate principally at gene promoters, and how it controls transcription is still poorly understood. The Fos family protein Fra-1 is overexpressed in triple negative breast cancers (TNBCs) where it contributes to tumor aggressiveness. To address its transcriptional actions in TNBCs, we combined transcriptomics, ChIP-seq, machine learning and NG Capture-C. Additionally, we studied its Fos family kin Fra-2, also expressed in TNBCs, albeit much less. Consistent with their pleiotropic effects, Fra-1 and Fra-2 up- and downregulate individually, together or redundantly many genes associated with a wide range of biological processes. Target gene regulation is principally due to binding of Fra-1 and Fra-2 at regulatory elements located distantly from cognate promoters, where Fra-1 modulates the recruitment of the transcriptional co-regulator p300/CBP and where differences in AP-1 variant motif recognition can underlie preferential Fra-1 or Fra-2 binding. Our work also shows no major role for Fra-1 in chromatin architecture control at target gene loci, but suggests collaboration between Fra-1-bound and -unbound enhancers within chromatin hubs sometimes including promoters for other Fra-1-regulated genes. Our work impacts our view of AP-1.

Subjects
Enhancer Elements, Genetic; Gene Expression Regulation, Neoplastic; Proto-Oncogene Proteins c-fos/metabolism; Triple Negative Breast Neoplasms/genetics; Binding Sites; Cell Line, Tumor; Chromatin/chemistry; Chromatin/metabolism; Epigenesis, Genetic; Fos-Related Antigen-2/metabolism; Humans; Nucleotide Motifs; Promoter Regions, Genetic; Proto-Oncogene Proteins c-fos/physiology; Transcription Factor AP-1/metabolism; Triple Negative Breast Neoplasms/metabolism; p300-CBP Transcription Factors/metabolism
4.
PLoS Comput Biol ; 17(4): e1008909, 2021 04.
Article in English | MEDLINE | ID: mdl-33861755

ABSTRACT

Long regulatory elements (LREs), such as CpG islands, polydA:dT tracts or AU-rich elements, are thought to play key roles in gene regulation but, as opposed to conventional binding sites of transcription factors, few methods have been proposed to formally and automatically characterize them. We present here a computational approach named DExTER (Domain Exploration To Explain gene Regulation) dedicated to the identification of candidate LREs (cLREs) and apply it to the analysis of the genomes of P. falciparum and other eukaryotes. Our analyses show that all tested genomes contain several cLREs that are somewhat conserved along evolution, and that gene expression can be predicted with surprising accuracy on the basis of these long regions only. Regulation by cLREs exhibits very different behaviours depending on species and conditions. In P. falciparum and other Apicomplexan organisms as well as in Dictyostelium discoideum, the process appears highly dynamic, with different cLREs involved at different phases of the life cycle. For multicellular organisms, the same cLREs are involved in all tissues, but a dynamic behavior is observed along embryonic development stages. In P. falciparum, whose genome is known to be strongly depleted of transcription factors, cLREs are predictive of expression with an accuracy above 70%, and our analyses show that they are associated with both transcriptional and post-transcriptional regulation signals. Moreover, we assessed the biological relevance of one LRE discovered by DExTER in P. falciparum using an in vivo reporter assay. The source code (python) of DExTER is available at https://gite.lirmm.fr/menichelli/DExTER.


Subjects
Genome, Protozoan; Plasmodium falciparum/genetics; Regulatory Sequences, Nucleic Acid; Eukaryota/genetics; Gene Expression Regulation; Gene Ontology; Genes, Reporter; Histones/metabolism; RNA Processing, Post-Transcriptional; RNA, Antisense/genetics; RNA, Messenger/genetics; Transcription, Genetic
5.
BMC Genomics ; 20(1): 103, 2019 Feb 01.
Article in English | MEDLINE | ID: mdl-30709337

ABSTRACT

BACKGROUND: In eukaryotic cells, transcription factors (TFs) are thought to act in a combinatorial way, by competing and collaborating to regulate common target genes. However, several questions remain regarding the conservation of these combinations among different gene classes, regulatory regions and cell types. RESULTS: We propose a new approach named TFcoop to infer the TF combinations involved in the binding of a target TF in a particular cell type. TFcoop aims to predict the binding sites of the target TF from the nucleotide content of the sequences and the binding affinity of all identified cooperating TFs. The set of cooperating TFs and model parameters are learned from ChIP-seq data of the target TF. We used TFcoop to investigate the TF combinations involved in the binding of 106 TFs on 41 cell types and in four regulatory regions: promoters of mRNAs, lncRNAs and pri-miRNAs, and enhancers. We first show that TFcoop is accurate and outperforms simple PWM methods for predicting TF binding sites. Next, analysis of the learned models sheds light on important properties of TF combinations in different promoter classes and in enhancers. First, we show that combinations governing TF binding on enhancers are more cell-type specific than those governing binding in promoters. Second, for a given TF and cell type, we observe that TF combinations are different between promoters and enhancers, but similar for promoters of mRNAs, lncRNAs and pri-miRNAs. Analysis of the TFs cooperating with the different targets shows over-representation of pioneer TFs and a clear preference for TFs with binding motif composition similar to that of the target. Lastly, our models accurately distinguish promoters associated with specific biological processes. CONCLUSIONS: TFcoop appears to be an accurate approach for studying TF combinations. Its use on ENCODE and FANTOM data allowed us to discover important properties of human TF combinations in different promoter classes and in enhancers. The R code for learning a TFcoop model and for reproducing the main experiments described in the paper is available in an R Markdown file at https://gite.lirmm.fr/brehelin/TFcoop.

Subjects
Computational Biology/methods; Enhancer Elements, Genetic; Gene Expression Regulation; Promoter Regions, Genetic; Transcription Factors/metabolism; Binding Sites; Humans; Transcription Factors/genetics
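
The TFcoop abstract above uses "simple PWM methods" as its baseline. As a rough illustration of what such a baseline does, the sketch below scores every window of a sequence against a position weight matrix and reports the best hit. The count matrix, the uniform background, the pseudocount and the function names are all illustrative assumptions; this is not the TFcoop code, which is written in R and available at the address given in the abstract.

```python
import math

# Hypothetical count matrix for a 4-bp motif (one dict per motif position).
COUNTS = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn raw counts into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def best_hit(sequence, pwm):
    """Return (score, start) of the highest-scoring window (forward strand only)."""
    width = len(pwm)
    best_score, best_pos = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score > best_score:
            best_score, best_pos = score, start
    return best_score, best_pos


pwm = build_pwm(COUNTS)
score, pos = best_hit("TTGACGTAA", pwm)  # the consensus ACGT starts at index 3
```

The sketch assumes sequences contain only A, C, G and T, and it ignores the reverse strand; a real scan would handle both.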
6.
PLoS Comput Biol ; 14(1): e1005889, 2018 01.
Article in English | MEDLINE | ID: mdl-29293498

ABSTRACT

Comparing and aligning protein sequences is an essential task in bioinformatics. More specifically, local alignment tools like BLAST are widely used for identifying conserved protein sub-sequences, which likely correspond to protein domains or functional motifs. However, to limit the number of false positives, these tools are used with stringent sequence-similarity thresholds and hence can miss several hits, especially for species that are phylogenetically distant from reference organisms. A solution to this problem is to integrate additional contextual information into the procedure. Here, we propose to use domain co-occurrence to increase the sensitivity of pairwise sequence comparisons. Domain co-occurrence is a strong feature of proteins, since most protein domains tend to appear with a limited number of other domains on the same protein. We propose a method to take this information into account in a typical BLAST analysis and to construct new domain families on the basis of these results. We used Plasmodium falciparum as a case study to evaluate our method. The experimental findings showed a 14% increase in the number of significant BLAST hits and a 25% increase in the proteome area that can be covered with a domain. Our method identified 2240 new domains for which, in most cases, no model of the Pfam database could be linked. Moreover, our study of the quality of the new domains in terms of alignment and physicochemical properties shows that they are close to those of standard Pfam domains. Source code of the proposed approach and supplementary data are available at: https://gite.lirmm.fr/menichelli/pairwise-comparison-with-cooccurrence.

Subjects
Proteins/chemistry; Proteins/genetics; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Algorithms; Amino Acid Sequence; Computational Biology; Databases, Protein; Plasmodium falciparum/chemistry; Plasmodium falciparum/genetics; Protein Domains; Protozoan Proteins/chemistry; Protozoan Proteins/genetics; Sequence Alignment/statistics & numerical data; Sequence Analysis, Protein/statistics & numerical data
7.
PLoS Comput Biol ; 14(1): e1005921, 2018 01.
Article in English | MEDLINE | ID: mdl-29293496

ABSTRACT

Gene expression is orchestrated by distinct regulatory regions to ensure a wide variety of cell types and functions. A challenge is to identify which regulatory regions are active, what their associated features are and how they work together in each cell type. Several approaches have tackled this problem by modeling gene expression based on epigenetic marks, with the ultimate goal of identifying driving regions and associated genomic variations that are clinically relevant, in particular in precision medicine. However, these models rely on experimental data, which are limited to specific samples (often even to cell lines) and cannot be generated for all regulators and all patients. In addition, we show here that, although these approaches are accurate in predicting gene expression, inference of TF combinations from this type of model is not straightforward. Furthermore, these methods are not designed to capture regulation instructions present at the sequence level, before the binding of regulators or the opening of the chromatin. Here, we probe sequence-level instructions for gene expression and develop a method to explain mRNA levels based solely on nucleotide features. Our method positions nucleotide composition as a critical component of gene expression. Moreover, our approach, able to rank regulatory regions according to their contribution, unveils a strong influence of the gene body sequence, in particular introns. We further provide evidence that the contribution of nucleotide content can be linked to co-regulations associated with genome 3D architecture and to associations of genes within topologically associated domains.

Subjects
Base Composition; Gene Expression Regulation; Regulatory Sequences, Nucleic Acid; Computational Biology; DNA Copy Number Variations; Enhancer Elements, Genetic; Genome, Human; Humans; Models, Genetic; Neoplasms/genetics; Neoplasms/metabolism; Polymorphism, Single Nucleotide; Promoter Regions, Genetic; Quantitative Trait Loci; RNA, Messenger/chemistry; RNA, Messenger/genetics; RNA, Messenger/metabolism; Transcription Factors/genetics; Transcription Factors/metabolism
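
The abstract above explains mRNA levels "based solely on nucleotide features" computed over regulatory regions. The abstract does not say which features or which regression model are used, so the sketch below only illustrates one plausible kind of feature: k-mer (here dinucleotide) frequencies per region, concatenated into a single feature dictionary per gene. The region names, the toy sequences and the function name are assumptions made for illustration.

```python
from itertools import product


def kmer_frequencies(sequence, k=2):
    """Relative frequency of every A/C/G/T k-mer; windows with other symbols are skipped."""
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return {kmer: 0.0 for kmer in kmers}
    return {kmer: count / total for kmer, count in counts.items()}


# Hypothetical regions of one gene (toy sequences, far shorter than real ones).
regions = {"promoter": "CGCGCGTATA", "intron": "ATATTTTAAT"}

# One feature per (region, dinucleotide) pair, e.g. "promoter_CG".
features = {
    f"{name}_{kmer}": freq
    for name, seq in regions.items()
    for kmer, freq in kmer_frequencies(seq).items()
}
```

A feature table of this shape, with one row per gene, could then be passed to any regression model with expression as the response; which model the authors used is not stated in the abstract.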
8.
Nat Chem Biol ; 7(11): 834-42, 2011 Sep 25.
Article in English | MEDLINE | ID: mdl-21946275

ABSTRACT

Monogalactosyldiacylglycerol (MGDG) and digalactosyldiacylglycerol (DGDG) are the main lipids in photosynthetic membranes in plant cells. They are synthesized in the envelope surrounding plastids by MGD and DGD galactosyltransferases. These galactolipids are critical for the biogenesis of photosynthetic membranes, and they act as a source of polyunsaturated fatty acids for the whole cell and as phospholipid surrogates in phosphate shortage. Based on a high-throughput chemical screen, we have characterized a new compound, galvestine-1, that inhibits MGDs in vitro by competing with diacylglycerol binding. Consistent effects of galvestine-1 on Arabidopsis thaliana include root uptake, circulation in the xylem and mesophyll, inhibition of MGDs in vivo causing a reduction of MGDG content and impairment of chloroplast development. The effects on pollen germination shed light on the contribution of galactolipids to pollen-tube elongation. The whole-genome transcriptional response of Arabidopsis points to the potential benefits of galvestine-1 as a unique tool to study lipid homeostasis in plants.


Subjects
Arabidopsis/enzymology; Galactosyltransferases/antagonists & inhibitors; Gene Expression Regulation, Enzymologic/drug effects; Gene Expression Regulation, Plant/drug effects; Enzyme Inhibitors/pharmacology; Galactolipids/metabolism; Gene Expression Profiling; Molecular Structure; Piperidines/pharmacology; Plant Leaves/ultrastructure; Plant Roots/metabolism; Small Molecule Libraries; Structure-Activity Relationship
9.
BMC Bioinformatics ; 13: 67, 2012 May 01.
Article in English | MEDLINE | ID: mdl-22548871

ABSTRACT

BACKGROUND: Hidden Markov Models (HMMs) are a powerful tool for protein domain identification. The Pfam database notably provides a large collection of HMMs which are widely used for the annotation of proteins in newly sequenced organisms. In Pfam, each domain family is represented by a curated multiple sequence alignment from which a profile HMM is built. In spite of their high specificity, HMMs may lack sensitivity when searching for domains in divergent organisms. This is particularly the case for species with a biased amino-acid composition, such as P. falciparum, the main causal agent of human malaria. In this context, fitting HMMs to the specificities of the target proteome can help identify additional domains. RESULTS: Using P. falciparum as an example, we compare approaches that have been proposed for this problem, and present two alternative methods. Because previous attempts strongly rely on known domain occurrences in the target species or its close relatives, they mainly improve the detection of domains which belong to already identified families. Our methods learn global correction rules that adjust amino-acid distributions associated with the match states of HMMs. These rules are applied to all match states of the whole HMM library, thus enabling the detection of domains from previously absent families. Additionally, we propose a procedure to estimate the proportion of false positives among the newly discovered domains. Starting with the Pfam standard library, we build several new libraries with the different HMM-fitting approaches. These libraries are first used to detect new domain occurrences with low E-values. Second, by applying the Co-Occurrence Domain Discovery (CODD) procedure we have recently proposed, the libraries are further used to identify likely occurrences among potential domains with higher E-values. CONCLUSION: We show that the new approaches allow identification of several domain families previously absent in the P. falciparum proteome and the Apicomplexa phylum, and identify many domains that are not detected by previous approaches. In terms of the number of newly discovered domains, the new approaches outperform the previous ones when no close species are available or when they are used to identify likely occurrences among potential domains with high E-values. All predictions on P. falciparum have been integrated into a dedicated website which pools all known/new annotations of protein domains and functions for this organism. Software implementing the two proposed approaches is available at the same address: http://www.lirmm.fr/~terrapon/HMMfit/

Subjects
Amino Acid Motifs; Markov Chains; Plasmodium falciparum; Proteomics/methods; Algorithms; Databases, Protein; Molecular Sequence Annotation; Proteins/analysis; Proteome/analysis; Sequence Alignment; Software
10.
Nucleic Acids Res ; 37(15): e104, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19531739

ABSTRACT

Ultra high-throughput sequencing is used to analyse the transcriptome or interactome at unprecedented depth on a genome-wide scale. These techniques yield short sequence reads that are then mapped on a genome sequence to predict putatively transcribed or protein-interacting regions. We argue that factors such as background distribution, sequence errors, and read length impact on the prediction capacity of sequence census experiments. Here we suggest a computational approach to measure these factors and analyse their influence on both transcriptomic and epigenomic assays. This investigation provides new clues on both methodological and biological issues. For instance, by analysing chromatin immunoprecipitation read sets, we estimate that 4.6% of reads are affected by SNPs. We show that, although the nucleotide error probability is low, it significantly increases with the position in the sequence. Choosing a read length above 19 bp practically eliminates the risk of finding irrelevant positions, while above 20 bp the number of uniquely mapped reads decreases. With our procedure, we obtain 0.6% false positives among genomic locations. Hence, even rare signatures should identify biologically relevant regions, if they are mapped on the genome. This indicates that digital transcriptomics may help to characterize the wealth of yet undiscovered, low-abundance transcripts.


Subjects
Genomics/methods; Chromatin Immunoprecipitation; Chromosome Mapping; Gene Expression Profiling; Genome, Human; Humans; Polymorphism, Single Nucleotide; Sequence Analysis, DNA
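
The abstract above relates read length to the chance that a read maps to a single genomic location. The toy sketch below illustrates the underlying intuition only: on a random sequence, the fraction of k-length windows that occur exactly once rises quickly with k. The random "genome", its length and the function name are assumptions; the sketch ignores the reverse strand, sequencing errors and SNPs, all of which the paper analyses, and it does not reproduce the 19 bp and 20 bp thresholds reported for real data.

```python
import random
from collections import Counter


def unique_fraction(genome, k):
    """Fraction of k-length windows whose sequence occurs exactly once in genome."""
    windows = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(windows)
    unique = sum(1 for window in windows if counts[window] == 1)
    return unique / len(windows)


# Hypothetical 5 kb random sequence standing in for a genome.
random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(5000))

for k in (4, 8, 12, 16):
    print(k, round(unique_fraction(genome, k), 3))
```

Real genomes are far from random (repeats, low-complexity regions), so the read length needed for unique mapping is larger than a random model of the same size would suggest.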
11.
Nat Commun ; 12(1): 3297, 2021 06 02.
Article in English | MEDLINE | ID: mdl-34078885

ABSTRACT

Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology which combines cap trapping and long read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.


Subjects
Microsatellite Repeats; Neural Networks, Computer; Neurodegenerative Diseases/genetics; Transcription Initiation Site; Transcription Initiation, Genetic; A549 Cells; Animals; Base Sequence; Computational Biology/methods; Deep Learning; Enhancer Elements, Genetic; Genome, Human; High-Throughput Nucleotide Sequencing; Humans; Mice; Neurodegenerative Diseases/diagnosis; Neurodegenerative Diseases/metabolism; Polymorphism, Genetic; Promoter Regions, Genetic
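
The abstract above centres on microsatellites, or short tandem repeats (STRs). For readers unfamiliar with the term, the sketch below locates simple perfect STRs in a DNA string with a regular expression. The unit-length range (2 to 6 bp), the minimum of four repeats and the function name are assumptions chosen for illustration; they are not the STR definitions used in the paper, and the sketch does not attempt the deep learning models described there.

```python
import re


def find_strs(sequence, min_unit=2, max_unit=6, min_repeats=4):
    """Find perfect tandem repeats; returns (start, end, unit, n_repeats) tuples.

    The lazy quantifier makes the regex try the shortest repeat unit first,
    so ACACACAC is reported with unit AC rather than ACAC.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        n_repeats = (match.end() - match.start()) // len(unit)
        hits.append((match.start(), match.end(), unit, n_repeats))
    return hits


# Toy sequence: five copies of AC flanked by non-repetitive bases.
toy = "TTG" + "AC" * 5 + "GGT"
hits = find_strs(toy)
```

Here `hits` holds a single tuple describing the AC repeat, with half-open coordinates. Imperfect repeats (with interruptions or mismatches) are not detected by this pattern.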
12.
BMC Genomics ; 11: 35, 2010 Jan 15.
Article in English | MEDLINE | ID: mdl-20078859

ABSTRACT

BACKGROUND: Plasmodium falciparum is the main causative agent of malaria. Of the 5,484 predicted genes of P. falciparum, about 57% do not have sufficient sequence similarity to characterized genes in other species to warrant functional assignments. Non-homology methods are thus needed to obtain functional clues for these uncharacterized genes. Gene expression data have been widely used in recent years to help functional annotation in an intra-species way via the so-called Guilt By Association (GBA) principle. RESULTS: We propose a new method that uses gene expression data to assess inter-species annotation transfers. Our approach starts from a set of likely orthologs between a reference species (here S. cerevisiae and D. melanogaster) and a query species (P. falciparum). It aims at identifying clusters of coexpressed genes in the query species whose coexpression has been conserved in the reference species. These conserved clusters of coexpressed genes are then used to assess annotation transfers between genes with low sequence similarity, enabling reliable transfers of annotations from the reference to the query species. The approach was used with transcriptomic data sets of P. falciparum, S. cerevisiae and D. melanogaster, and enabled us to propose with high confidence new/refined annotations for several dozen hypothetical/putative P. falciparum genes. Notably, we revised the annotation of genes involved in ribosomal proteins and ribosome biogenesis and assembly, thus highlighting several potential drug targets. CONCLUSIONS: Our approach uses both sequence similarity and gene expression data to help inter-species gene annotation transfers. Experiments show that this strategy improves the accuracy achieved when using solely sequence similarity and outperforms the accuracy of the GBA approach. In addition, our experiments with P. falciparum show that it can infer a function for numerous hypothetical genes.

Subjects
Genome, Protozoan; Models, Genetic; Plasmodium falciparum/genetics; Algorithms; Cluster Analysis; Computational Biology/methods; Conserved Sequence; Gene Expression; Gene Expression Profiling; Protozoan Proteins/genetics; Saccharomyces cerevisiae/genetics; Sequence Analysis, DNA/methods
13.
BMC Genomics ; 10: 235, 2009 May 19.
Article in English | MEDLINE | ID: mdl-19454033

ABSTRACT

BACKGROUND: The Plasmodium falciparum genome (3D7 strain), published in 2002, revealed ~5,400 genes, mostly based on in silico predictions. Experimental data are therefore required for structural and functional assessments of P. falciparum genes and expression, and polymorphic data are further necessary to exploit genomic information to further qualify therapeutic target candidates. Here, we undertook a large scale analysis of a P. falciparum FcB1-schizont-EST library previously constructed by suppression subtractive hybridization (SSH) to study genes expressed during merozoite morphogenesis, with the aim of: 1) obtaining an exhaustive collection of schizont specific ESTs, 2) experimentally validating or correcting P. falciparum gene models and 3) pinpointing genes displaying protein polymorphism between the FcB1 and 3D7 strains. RESULTS: A total of 22,125 clones randomly picked from the SSH library were sequenced, yielding 21,805 usable ESTs that were then clustered on the P. falciparum genome. This allowed identification of 243 protein coding genes, including 121 previously annotated as hypothetical. Statistical analysis of GO terms, when available, indicated significant enrichment in genes involved in "entry into host-cells" and "actin cytoskeleton". Although most ESTs do not span full-length gene reading frames, detailed sequence comparison of FcB1-ESTs versus 3D7 genomic sequences allowed the confirmation of exon/intron boundaries in 29 genes, the detection of new boundaries in 14 genes and identification of protein polymorphism for 21 genes. In addition, a large number of non-protein coding ESTs were identified, mainly matching the two A-type rRNA units (on chromosomes 5 and 7) and, to a lesser extent, two atypical rRNA loci (on chromosomes 1 and 8), TARE subtelomeric regions (several chromosomes) and the recently described telomerase RNA gene (chromosome 9). CONCLUSION: This FcB1-schizont-EST analysis confirmed the actual expression of 243 protein coding genes, allowing the correction of structural annotations for a quarter of these sequences. In addition, this analysis demonstrated the actual transcription of several remarkable non-protein coding loci: 2 atypical rRNA, TARE region and telomerase RNA gene. Together with other collections of P. falciparum ESTs, usually generated from mixed parasite stages, this collection of FcB1-schizont-ESTs provides valuable data to gain further insight into the P. falciparum gene structure, polymorphism and expression.

Subjects
Expressed Sequence Tags; Genome, Protozoan; Plasmodium falciparum/genetics; Animals; Exons; Gene Library; Genes, Protozoan; Introns; Models, Genetic; Molecular Sequence Data; Polymorphism, Genetic; Protozoan Proteins/genetics; RNA, Protozoan/genetics; RNA, Ribosomal/genetics; Schizonts/metabolism; Sequence Alignment; Sequence Analysis, DNA
14.
Bioinformatics ; 24(5): 682-8, 2008 Mar 01.
Article in English | MEDLINE | ID: mdl-18204054

ABSTRACT

MOTIVATION: Hierarchical clustering is a common approach to study protein and gene expression data. This unsupervised technique is used to find clusters of genes or proteins which are expressed in a coordinated manner across a set of conditions. Because of both the biological and technical variability, experimental repetitions are generally performed. In this work, we propose an approach to evaluate the stability of clusters derived from hierarchical clustering by taking repeated measurements into account. RESULTS: The method is based on the bootstrap technique that is used to obtain pseudo-hierarchies of genes from resampled datasets. Based on a fast dynamic programming algorithm, we compare the original hierarchy to the pseudo-hierarchies and assess the stability of the original gene clusters. Then a shuffling procedure can be used to assess the significance of the cluster stabilities. Our approach is illustrated on simulated data and on two microarray datasets. Compared to the standard hierarchical clustering methodology, it makes it possible to point out the dubious and stable clusters, and thus avoids misleading interpretations. AVAILABILITY: The programs were developed in C and R languages.

Subjects
Multigene Family; Animals; Databases, Genetic; Humans; Protein Binding; Proteins/chemistry; Proteins/genetics; Proteins/isolation & purification
15.
BMC Bioinformatics ; 9: 440, 2008 Oct 16.
Artigo em Inglês | MEDLINE | ID: mdl-18925948

RESUMO

BACKGROUND: Of the 5,484 predicted proteins of Plasmodium falciparum, the main causative agent of malaria, about 60% do not have sufficient sequence similarity with proteins in other organisms to warrant provision of functional assignments. Non-homology methods are thus needed to obtain functional clues for these uncharacterized genes. RESULTS: We present PlasmoDraft (http://atgc.lirmm.fr/PlasmoDraft/), a database of Gene Ontology (GO) annotation predictions for P. falciparum genes based on postgenomic data. Predictions of PlasmoDraft are achieved with a Guilt By Association method named Gonna. This involves (1) a predictor that proposes GO annotations for a gene based on the similarity of its profile (measured with transcriptome, proteome or interactome data) with genes already annotated by GeneDB; (2) a procedure that estimates the confidence of the predictions achieved with each data source; (3) a procedure that combines all data sources to provide a global summary and confidence estimate of the predictions. Gonna has been applied to all P. falciparum genes using most publicly available transcriptome, proteome and interactome data sources. Gonna provides predictions for numerous genes without any annotations. For example, 2,434 genes without any annotations in the Biological Process ontology are associated with specific GO terms (e.g. Rosetting, Antigenic variation), and among these, 841 have confidence values above 50%. In the Cellular Component and Molecular Function ontologies, 1,905 and 1,540 uncharacterized genes are associated with specific GO terms, respectively (740 and 329 with confidence values above 50%). CONCLUSION: All predictions along with their confidence values have been compiled in PlasmoDraft, which thus provides an extensive database of GO annotation predictions that can be achieved with these data sources. The database can be accessed in different ways. A global view allows for a quick inspection of the GO terms that are predicted with high confidence, depending on the various data sources. A gene view and a GO term view allow for the search of potential GO terms attached to a given gene, and genes that potentially belong to a given GO term.
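
The Guilt By Association idea in step (1) of the abstract above can be illustrated with a minimal nearest-neighbour sketch. This is not the Gonna implementation: the function names, the dictionary layout, the use of Pearson correlation as the similarity measure, and the choice of k are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as 0.
    return cov / (sx * sy) if sx and sy else 0.0


def predict_go_terms(query_profile, annotated, k=3):
    """Propose GO terms for a gene from its k most similar annotated genes.

    annotated: dict mapping gene id -> (profile, set of GO terms).
               This layout is hypothetical, chosen for the example.
    Returns (GO term, fraction of neighbours carrying it), best first.
    """
    ranked = sorted(
        annotated.items(),
        key=lambda item: pearson(query_profile, item[1][0]),
        reverse=True,
    )
    neighbours = ranked[:k]
    votes = Counter(term for _, (_, terms) in neighbours for term in terms)
    return [(term, count / len(neighbours)) for term, count in votes.most_common()]
```

The vote fraction returned here is only a crude stand-in for the per-source confidence estimate described in step (2); the abstract does not specify how that estimate is computed.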


Subjects
Computational Biology , Databases, Genetic , Genes, Protozoan , Genomics/methods , Plasmodium falciparum/genetics , Algorithms , Animals , Artificial Intelligence , Gene Expression , Humans , Information Storage and Retrieval , Protozoan Proteins/genetics , Protozoan Proteins/metabolism
17.
PLoS One ; 9(6): e95275, 2014.
Article in English | MEDLINE | ID: mdl-24901648

ABSTRACT

Identification of protein domains is a key step for understanding protein function. Hidden Markov Models (HMMs) have proved to be a powerful tool for this task. The Pfam database notably provides a large collection of HMMs which are widely used for the annotation of proteins in sequenced organisms. This is done via sequence/HMM comparisons. However, this approach may lack sensitivity when searching for domains in divergent species. Recently, methods for HMM/HMM comparisons have been proposed and proved to be more sensitive than sequence/HMM approaches in certain cases. However, these approaches are usually not used for protein domain discovery at a genome scale, and the benefit that could be expected from their utilization for this problem has not been investigated. Using proteins of P. falciparum and L. major as examples, we investigate the extent to which HMM/HMM comparisons can identify new domain occurrences not already identified by sequence/HMM approaches. We show that although HMM/HMM comparisons are much more sensitive than sequence/HMM comparisons, they are not sufficiently accurate to be used as a standalone complement of sequence/HMM approaches at the genome scale. Hence, we propose to use domain co-occurrence (the general tendency of domains to appear preferentially alongside a few favorite domains in proteins) to improve the accuracy of the approach. We show that the combination of HMM/HMM comparisons and co-occurrence domain detection boosts protein annotations. At an estimated False Discovery Rate of 5%, it revealed 901 and 1098 new domains in Plasmodium and Leishmania proteins, respectively. Manual inspection of part of these predictions shows that they contain several domain families that were missing in the two organisms. All new domain occurrences have been integrated in the EuPathDomains database, along with the GO annotations that can be deduced.
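
The co-occurrence filter described in this abstract can be sketched in a few lines. This is a simplified illustration, not the authors' procedure: the function name, the representation of favoured pairs as a set of `frozenset` objects, and the Pfam accessions in the usage note are assumptions, and the statistical step that decides which pairs count as favoured is left out.

```python
def certify_candidates(confident, candidates, favoured_pairs):
    """Keep weak candidate domains supported by co-occurrence.

    confident:      set of domains already detected with confidence in the protein
    candidates:     iterable of lower-confidence candidate domains (e.g. HMM/HMM hits)
    favoured_pairs: set of frozenset({a, b}) domain pairs assumed to co-occur
                    more often than expected by chance (hypothetical input)
    A candidate is kept if it forms a favoured pair with at least one
    confident domain of the same protein.
    """
    kept = []
    for cand in candidates:
        if any(frozenset((cand, dom)) in favoured_pairs for dom in confident):
            kept.append(cand)
    return kept
```

For example, with `favoured_pairs = {frozenset(("PF00069", "PF07714"))}`, a protein with confident domain `PF00069` would retain candidate `PF07714` and discard an unrelated candidate such as `PF00001`.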


Subjects
Computational Biology , Markov Chains , Protein Interaction Domains and Motifs , Proteins/chemistry , Computational Biology/methods , Molecular Sequence Annotation , Reproducibility of Results , Sensitivity and Specificity
18.
Genome Biol ; 13(11): R109, 2012 Nov 27.
Article in English | MEDLINE | ID: mdl-23186104

ABSTRACT

Approaches for regulatory element discovery from gene expression data usually rely on clustering algorithms to partition the data into clusters of co-expressed genes. Gene regulatory sequences are then mined to find overrepresented motifs in each cluster. However, this ad hoc partition rarely fits the biological reality. We propose a novel method called RED2 that avoids data clustering by estimating motif densities locally around each gene. We show that RED2 detects numerous motifs not detected by clustering-based approaches, and that most of these correspond to characterized motifs. RED2 can be accessed online through a user-friendly interface.
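
One way to picture a clustering-free, local estimate of motif density is to test, for each gene, whether a motif is over-represented among that gene's nearest neighbours in expression space. The sketch below does this with a hypergeometric tail probability. It is not the RED2 algorithm: the Euclidean distance, the neighbourhood size, the hypergeometric test, and all names are assumptions chosen for illustration.

```python
from math import comb, dist


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the motif."""
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def local_motif_enrichment(expr, has_motif, gene, n_neighbours=3):
    """p-value for motif over-representation around one gene.

    expr:      dict gene -> expression profile (tuple of floats)
    has_motif: dict gene -> bool, whether the gene's regulatory sequence has the motif
    The neighbourhood is the n_neighbours genes closest to `gene` in
    Euclidean distance, including the gene itself.
    """
    neighbourhood = sorted(expr, key=lambda g: dist(expr[g], expr[gene]))[:n_neighbours]
    hits = sum(has_motif[g] for g in neighbourhood)
    return hypergeom_tail(hits, len(neighbourhood), sum(has_motif.values()), len(expr))
```

Because every gene gets its own neighbourhood, no global partition of the expression data is needed; a small p-value for a gene suggests the motif is locally dense around it.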


Subjects
Computational Biology/methods , Regulatory Elements, Transcriptional , Algorithms , Cluster Analysis , Gene Expression
19.
Mol Biosyst ; 8(8): 2023-35, 2014, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22592295

ABSTRACT

Plant cells are characterized by the presence of chloroplasts, whose membrane lipids contain up to ∼80% mono- and digalactosyldiacylglycerol (MGDG and DGDG). The synthesis of MGDG in the chloroplast envelope is essential for the biogenesis and function of photosynthetic membranes, is coordinated with lipid metabolism in other cell compartments and is regulated in response to environmental factors. Phenotypic analyses of Arabidopsis using the recently developed specific inhibitor galvestine-1 complement previous analyses performed using various approaches, ranging from enzymology and cell biology to genetics. This review details how this probe could be beneficial to study the lipid homeostasis system at the whole cell level and highlights connections between MGDG synthesis and Arabidopsis flower development.


Subjects
Glycerides/metabolism , Piperidines/pharmacology , Plant Cells/metabolism , Arabidopsis/drug effects , Arabidopsis/metabolism , Galactolipids/metabolism , Homeostasis , Lipid Metabolism/drug effects , Plant Cells/drug effects
20.
Infect Genet Evol ; 11(4): 698-707, 2011 Jun.
Article in English | MEDLINE | ID: mdl-20920608

ABSTRACT

Eukaryotic pathogens (e.g. Plasmodium, Leishmania, Trypanosomes, etc.) are a major source of morbidity and mortality worldwide. In Africa, one of the most impacted continents, they cause millions of deaths and constitute an immense economic burden. While the genome sequence of several of these organisms is now available, the biological functions of more than half of their proteins are still unknown. This is a serious issue for bringing to the foreground the expected new therapeutic targets. In this context, the identification of protein domains is a key step to improve the functional annotation of the proteins. However, several domains are missed in eukaryotic pathogens because of the high phylogenetic distance of these organisms from the classical eukaryote models. We recently proposed a method, co-occurrence domain detection (CODD), that improves the sensitivity of Pfam domain detection by exploiting the tendency of domains to appear preferentially with a few other favorite domains in a protein. In this paper, we present EuPathDomains (http://www.atgc-montpellier.fr/EuPathDomains/), an extended database of protein domains belonging to ten major eukaryotic human pathogens. EuPathDomains gathers known and new domains detected by CODD, along with the associated confidence measurements and the GO annotations that can be deduced from the new domains. This database significantly extends the Pfam domain coverage of all selected genomes, by proposing new occurrences of domains as well as new domain families that have never been reported before. For example, with a false discovery rate lower than 20%, EuPathDomains increases the number of detected domains by 13% in the Toxoplasma gondii genome and up to 28% in Cryptosporidium parvum, and the total number of domain families by 10% in Plasmodium falciparum and up to 16% in the C. parvum genome. The database can be queried by protein names, domain identifiers, Pfam or Interpro identifiers, or organisms, and should become a valuable resource to decipher the protein functions of eukaryotic pathogens.


Subjects
Databases, Protein , Eukaryota/genetics , Protein Interaction Domains and Motifs/genetics , Protozoan Proteins/genetics , Computational Biology , Cryptosporidium parvum/genetics , Eukaryota/metabolism , Giardia lamblia/genetics , Humans , Leishmania/genetics , Molecular Sequence Annotation , Plasmodium/genetics , Protein Binding , Protozoan Proteins/chemistry , Protozoan Proteins/metabolism , Toxoplasma/genetics , Trypanosoma brucei brucei/genetics