Results 1 - 20 of 27
1.
BMC Bioinformatics ; 16: 179, 2015 May 29.
Article in English | MEDLINE | ID: mdl-26022464

ABSTRACT

BACKGROUND: Several methods exist for the prediction of precursor miRNAs (pre-miRNAs) in genomic or sRNA-seq (small RNA sequence) data produced by NGS (Next Generation Sequencing). One key piece of information used for this task is the characteristic hairpin structure adopted by pre-miRNAs, which is generally identified using RNA folders whose complexity is cubic in the size of the input. The vast majority of pre-miRNA predictors then rely on further information learned from previously validated miRNAs from the same or a closely related genome for the final prediction of new miRNAs. With this paper, we wished to address three main issues. The first was methodological and aimed at obtaining a more time-efficient predictor, without however losing accuracy, which represented the second issue. We indeed aimed at better predicting miRNAs at a genome scale, but also from sRNA-seq data where, in some cases (notably in plants), current folding methods often infer the wrong structure. The third issue relates to the fact that it is important to rely as little as possible on previously recorded examples of miRNAs. We therefore also sought a method that is less dependent on previous miRNA records. RESULTS: As concerns the first and second issues, we present a novel alternative to a classical folder, based on a thermodynamic Nearest-Neighbour (NN) model for computing the free energy and predicting the classical hairpin structure of a pre-miRNA. We show that the free energies thus computed correlate well with those of RNAFOLD. This novel method, called MIRINHO, has quadratic instead of cubic complexity and is also much more efficient in practice. When applied to sRNA-seq data of plants, it generally gives better results than classical folders. On the third issue, we show that MIRINHO, which uses as its only knowledge the length of the loops and stem-arms and the free energy of the pre-miRNA hairpin, compares well with algorithms that require more information.
The results, obtained with different datasets, are indeed similar to those of other approaches with which such a comparison was possible; these needed to be publicly available software tools that could be run on a large input. In some cases, MIRINHO is even better in terms of sensitivity or precision. CONCLUSION: We provide a simpler and much faster method with very reasonable sensitivity and precision, which can be applied without special adaptation to the prediction of both animal and plant pre-miRNAs, using as input either genomic sequences or sRNA-seq data.
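The abstract contrasts cubic-time folding with a quadratic-time hairpin search constrained by loop and stem-arm lengths. The Python sketch below is only a toy illustration of that idea, under stated assumptions: it looks for the longest perfectly paired stem around every candidate loop, allowing G-U wobble pairs. It is not MIRINHO and it omits the nearest-neighbour free-energy scoring the authors describe; the function name and its parameters (`min_loop`, `max_loop`, `min_stem`) are hypothetical.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def best_hairpin(seq, min_loop=3, max_loop=20, min_stem=4):
    """Return (stem_length, loop_start, loop_end) for the longest perfectly
    paired stem, or None if no stem reaches min_stem.

    loop_start/loop_end are inclusive 0-based indices of the unpaired loop.
    Toy illustration only: no free-energy model, no bulges or mismatches.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    best = None
    for loop_start in range(1, n):
        for loop_len in range(min_loop, max_loop + 1):
            loop_end = loop_start + loop_len - 1
            if loop_end + 1 >= n:
                break
            # Extend the stem outward from both loop boundaries.
            i, j, stem = loop_start - 1, loop_end + 1, 0
            while i >= 0 and j < n and (seq[i], seq[j]) in PAIRS:
                stem += 1
                i -= 1
                j += 1
            if stem >= min_stem and (best is None or stem > best[0]):
                best = (stem, loop_start, loop_end)
    return best
```

For a fixed loop-length range the work grows at most quadratically with sequence length, because each stem extension is bounded by the sequence length; for example, `best_hairpin("GCGCAAAAGCGC")` finds a 4-pair stem around the `AAAA` loop.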


Subject(s)
Arabidopsis/genetics , Genome , High-Throughput Nucleotide Sequencing/methods , Insects/genetics , MicroRNAs/genetics , Sequence Analysis, RNA/methods , Software , Algorithms , Animals , Base Pairing , Base Sequence , Genomics/methods , Molecular Sequence Data , Sequence Homology, Nucleic Acid
2.
Nucleic Acids Res ; 39(Database issue): D569-75, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21081560

ABSTRACT

Fast viral adaptation and the implication of this rapid evolution in the emergence of several new infectious diseases have turned this issue into a major challenge for various research domains. Indeed, viruses are involved in the development of a wide range of pathologies, and understanding how viruses and host cells interact in the context of adaptation remains an open question. In order to provide insights into the complex interactions between viruses and their host organisms, and notably into the acquisition of novel functions through exchanges of genetic material, we developed the PhEVER database. This database aims at providing accurate evolutionary and phylogenetic information to analyse the nature of virus-virus and virus-host lateral gene transfers. PhEVER (http://pbil.univ-lyon1.fr/databases/phever) is a unique database of homologous families both (i) between sequences from different viruses and (ii) between viral sequences and sequences from cellular organisms. PhEVER integrates extensive data from up-to-date completely sequenced genomes (2426 non-redundant viral genomes, 1007 non-redundant prokaryotic genomes, 43 eukaryotic genomes ranging from plants to vertebrates) and offers a clustering of proteins into homologous families containing at least one viral sequence, as well as alignments and phylogenies for each of these families. Public access to PhEVER is available through its webpage and through all dedicated ACNUC retrieval systems.
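The abstract does not detail how PhEVER groups proteins into families. As a hedged illustration only, the sketch below assumes pairwise homology hits have already been computed, groups proteins by single linkage with a union-find structure, and then keeps the families that contain at least one viral sequence, mirroring the filter the abstract states. The protein identifiers and the `hits` input are hypothetical.

```python
def homologous_families(proteins, hits, viral):
    """Single-linkage families over homology hits, keeping only families
    that contain at least one viral sequence.

    proteins: iterable of protein IDs
    hits: iterable of (id_a, id_b) pairs judged homologous (assumed input)
    viral: set of IDs that come from viral genomes
    """
    parent = {p: p for p in proteins}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    families = {}
    for p in parent:
        families.setdefault(find(p), set()).add(p)
    return [fam for fam in families.values() if fam & viral]
```

Single linkage is the simplest choice here; a real pipeline would typically add coverage or score thresholds before accepting a hit.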


Subject(s)
Databases, Genetic , Evolution, Molecular , Host-Pathogen Interactions/genetics , Viruses/genetics , Cluster Analysis , Gene Transfer, Horizontal , Genes, Viral , Genome, Viral , Genomics , Phylogeny , Sequence Homology , User-Computer Interface , Viral Proteins/chemistry , Viral Proteins/classification , Viral Proteins/genetics , Viruses/classification
3.
BMC Genomics ; 13: 438, 2012 Aug 31.
Article in English | MEDLINE | ID: mdl-22938206

ABSTRACT

BACKGROUND: A large number of genome-scale metabolic networks are now available for many organisms, mostly bacteria. Previous works on minimal gene sets, when analysing host-dependent bacteria, found small common sets of metabolic genes. When such analyses are restricted to bacteria with similar lifestyles, larger portions of metabolism are expected to be shared and their composition is worth investigating. Here we report a comparative analysis of the small molecule metabolism of symbiotic bacteria, exploring common and variable portions as well as the contribution of different lifestyle groups to the reduction of a common set of metabolic capabilities. RESULTS: We found no reaction shared by all the bacteria analysed. Disregarding those with the smallest genomes, we still did not find a reaction core; however, we did find a core of biochemical capabilities. While obligate intracellular symbionts have no core of reactions within their group, extracellular and cell-associated symbionts do have a small core composed of disconnected fragments. In agreement with previous findings in Escherichia coli, their cores are enriched in biosynthetic processes whereas the variable metabolisms have similar ratios of biosynthetic and degradation reactions. Conversely, the variable metabolism of obligate intracellular symbionts is enriched in anabolism. CONCLUSION: Even when removing the symbionts with the most reduced genomes, there is no core of reactions common to the analysed symbiotic bacteria. The main reason is the very high specialisation of obligate intracellular symbionts; however, host-dependence alone does not explain this absence. The composition of the metabolism of cell-associated and extracellular bacteria shows that while they have similar needs in terms of the building blocks of their cells, they have to adapt to very distinct environments.
On the other hand, in obligate intracellular bacteria, catabolism has largely disappeared, whereas synthetic routes appear to have been selected for depending on the nature of the symbiosis. As more genomes are added, we expect, based on our simulations, that the core of cell-associated and extracellular bacteria continues to diminish, converging to approximately 60 reactions.
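The core and variable portions discussed above are set operations over per-organism reaction sets: the core is their intersection and the variable metabolism is everything else in their union. The Python sketch below shows this, plus a simple subsampling curve of mean core size against the number of genomes. The subsampling is an assumption standing in for the authors' simulations, whose exact design the abstract does not give, and the reaction identifiers are made up.

```python
import random


def core_and_variable(reaction_sets):
    """Core = reactions present in every organism; variable = the rest."""
    sets = list(reaction_sets.values())
    core = set.intersection(*sets)
    pan = set.union(*sets)
    return core, pan - core


def mean_core_size(reaction_sets, k, trials=200, seed=0):
    """Average core size over random subsets of k organisms."""
    rng = random.Random(seed)
    names = list(reaction_sets)
    total = 0
    for _ in range(trials):
        chosen = rng.sample(names, k)
        total += len(set.intersection(*(reaction_sets[n] for n in chosen)))
    return total / trials
```

Plotting `mean_core_size` for increasing `k` gives a curve whose tail suggests whether the core is still shrinking as genomes are added, which is the kind of extrapolation the abstract reports.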


Subject(s)
Bacteria/genetics , Bacteria/metabolism , Evolution, Molecular , Genome, Bacterial/genetics , Metabolic Networks and Pathways/genetics , Symbiosis/genetics , Models, Genetic , Species Specificity
4.
Hum Mutat ; 32(2): 198-206, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21120948

ABSTRACT

Although mutations that are detrimental to the fitness of organisms are expected to be rapidly purged from populations by natural selection, some disease-causing mutations are present at high frequencies in human populations. Several nonexclusive hypotheses have been proposed to account for this apparent paradox (high new mutation rate, genetic drift, overdominance, or recent changes in selective pressure). However, the factors ultimately responsible for the presence at high frequency of disease-causing mutations are still contentious. Here we establish the existence of an additional process that contributes to the spreading of deleterious mutations: GC-biased gene conversion (gBGC), a process associated with recombination that tends to favor the transmission of GC-alleles over AT-alleles. We show that the spectrum of amino acid-altering polymorphisms in human populations exhibits the footprints of gBGC. This pattern cannot be explained in terms of selection and is evident with all nonsynonymous mutations, including those predicted to be detrimental to protein structure and function, and those implicated in human genetic disease. We present simulations to illustrate the conditions under which gBGC can extend the persistence time of deleterious mutations in a finite population. These results indicate that gBGC meiotic drive contributes to the spreading of deleterious mutations in human populations.
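To see why a transmission bias can slow the loss of a deleterious allele, the toy model below combines a selective cost `s` against a GC allele with a gBGC transmission advantage `b` into one multiplicative weight per generation. This is a deterministic, haploid-style simplification chosen for illustration: it ignores genetic drift and dominance, so it is not the finite-population simulation the authors describe, and the function and parameter names are hypothetical.

```python
def next_freq(p, s, b):
    """One generation of deterministic change for a GC allele at frequency p.

    s: selective cost of the GC allele (deleterious if s > 0)
    b: transmission advantage from GC-biased gene conversion
    Toy assumption: the GC allele has relative weight (1 + b) * (1 - s) vs 1.
    """
    w = (1 + b) * (1 - s)
    return p * w / (p * w + (1 - p))


def persistence_time(p0, s, b, threshold=1e-3, max_gen=100_000):
    """Generations until the GC allele drops below threshold (capped)."""
    p, gen = p0, 0
    while p >= threshold and gen < max_gen:
        p = next_freq(p, s, b)
        gen += 1
    return gen
```

With `s = 0.01`, adding `b = 0.005` roughly doubles the number of generations before the allele falls below the threshold, and once `(1 + b) * (1 - s)` exceeds 1 the allele is no longer purged at all in this model.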


Subject(s)
Base Composition , Genetic Predisposition to Disease , Meiosis , Recombination, Genetic , Disease/genetics , Gene Frequency , Humans , Mutation , Polymorphism, Single Nucleotide
5.
BMC Genomics ; 12: 303, 2011 Jun 10.
Article in English | MEDLINE | ID: mdl-21663614

ABSTRACT

BACKGROUND: Folding and intermingling of chromosomes can bring into close proximity loci that are genomically very distant or even located on different chromosomes. On the other hand, genomic rearrangements also play a major role in the reorganisation of loci proximities. Whether the same loci are involved in both mechanisms has been studied in the case of somatic rearrangements, but never from an evolutionary standpoint. RESULTS: In this paper, we analysed the correlation between two datasets: (i) whole-genome chromatin contact data obtained in human cells using the Hi-C protocol; and (ii) a set of breakpoint regions resulting from evolutionary rearrangements which occurred since the split of the human and mouse lineages. Surprisingly, we found that two loci distant in the human genome but adjacent in the mouse genome are significantly more often observed in close proximity in the human nucleus than expected. Importantly, we show that this result holds for loci located on the same chromosome regardless of the genomic distance separating them, and the signal is stronger in gene-rich and open-chromatin regions. CONCLUSIONS: These findings strongly suggest that part of the 3D organisation of chromosomes may be conserved across very large evolutionary distances. To characterise this phenomenon, we propose to use the notion of spatial synteny which generalises the notion of genomic synteny to the 3D case.
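One generic way to test a claim such as "loci adjacent in mouse are observed in closer proximity than expected" is a permutation test: compare the mean Hi-C contact of the pairs of interest with the means of equally sized random draws of background pairs. The abstract does not state which test the authors used, so the sketch below is an assumption, and the contact values would come from a hypothetical pre-computed list.

```python
import random
from statistics import mean


def permutation_pvalue(observed, background, n_perm=10_000, seed=0):
    """One-sided test: is the mean contact of `observed` pairs higher than
    that of equally sized random draws from `background`?"""
    rng = random.Random(seed)
    obs = mean(observed)
    k = len(observed)
    hits = sum(mean(rng.sample(background, k)) >= obs for _ in range(n_perm))
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Controlling for the genomic distance between loci would require stratified sampling of the background pairs, which is not shown here.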


Subject(s)
Chromosome Breakpoints , Evolution, Molecular , Synteny/genetics , Animals , Chromatin/genetics , Genetic Loci/genetics , Genome, Human/genetics , Genomics , Humans , Mice
6.
Bioinformatics ; 26(15): 1897-8, 2010 Aug 01.
Article in English | MEDLINE | ID: mdl-20576622

ABSTRACT

SUMMARY: Genomes undergo large structural changes that alter their organization. The chromosomal regions affected by these rearrangements are called breakpoints, while those which have not been rearranged are called synteny blocks. Lemaitre et al. presented a new method to precisely delimit rearrangement breakpoints in a genome by comparison with the genome of a related species. Taking as input a list of one-to-one orthologous genes found in the genomes of two species, the method builds a set of reliable and non-overlapping synteny blocks and refines the regions that are not contained in them. Through the alignment of each breakpoint sequence against its specific orthologous sequences in the other species, we can look for weak similarities inside the breakpoint, thus extending the synteny blocks and narrowing the breakpoints. The identification of the narrowed breakpoints relies on a segmentation algorithm and is statistically assessed. Here, we present the package Cassis that implements this method of precise detection of genomic rearrangement breakpoints. AVAILABILITY: Perl and R scripts are freely available for download at http://pbil.univ-lyon1.fr/software/Cassis/. Documentation with methodological background, technical aspects, download and setup instructions, as well as examples of applications are available together with the package. The package was tested on Linux and Mac OS environments and is distributed under the GNU GPL License.
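Building synteny blocks from one-to-one orthologues can be illustrated by cutting genome A's gene order wherever collinearity with genome B is lost. The sketch below assumes a single chromosome and represents each orthologue by its rank in genome B; runs whose ranks step consistently by +1 (same orientation) or -1 (inverted) form blocks, and the gaps between consecutive blocks are candidate breakpoints. Cassis applies additional reliability and non-overlap criteria that are not reproduced here, and the input encoding is an assumption.

```python
def synteny_blocks(b_positions):
    """Split genome A's gene order into blocks that stay collinear in B.

    b_positions[i] is the rank, in genome B, of the one-to-one orthologue of
    the i-th gene of genome A (single chromosome assumed for simplicity).
    A block is a maximal run whose B ranks step by +1 or by -1 throughout.
    Returns a list of (start, end) index pairs into genome A, inclusive.
    """
    if not b_positions:
        return []
    blocks = []
    start, step = 0, None
    for i in range(1, len(b_positions)):
        d = b_positions[i] - b_positions[i - 1]
        if d in (1, -1) and (step is None or d == step):
            step = d  # still collinear in the same orientation
        else:
            blocks.append((start, i - 1))
            start, step = i, None
    blocks.append((start, len(b_positions) - 1))
    return blocks
```

For example, `synteny_blocks([0, 1, 2, 5, 4, 3, 6, 7])` yields three blocks, `(0, 2)`, `(3, 5)` (an inversion) and `(6, 7)`, leaving two breakpoints to refine.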


Subject(s)
Chromosome Breakpoints , Computational Biology/methods , Genome/genetics , Recombination, Genetic , Software , Algorithms , Chromosomes/genetics , Synteny
7.
Nucleic Acids Res ; 37(Database issue): D661-8, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18984613

ABSTRACT

Infectious diseases caused by viral agents kill millions of people every year. The improvement of prevention and treatment of viral infections and their associated diseases remains one of the main public health challenges. Towards this goal, deciphering virus-host molecular interactions opens new perspectives to understand the biology of infection and for the design of new antiviral strategies. Indeed, modelling of an infection network between viral and cellular proteins will provide a conceptual and analytic framework to efficiently formulate new biological hypotheses at the proteome scale and to rationalize drug discovery. Therefore, we present the first release of VirHostNet (Virus-Host Network), a public knowledge base specialized in the management and analysis of integrated virus-virus, virus-host and host-host interaction networks coupled to their functional annotations. VirHostNet integrates an extensive and original literature-curated dataset of virus-virus and virus-host interactions (2671 non-redundant interactions) representing more than 180 distinct viral species and one of the largest human interactomes (10,672 proteins and 68,252 non-redundant interactions) reconstructed from publicly available data. The VirHostNet Web interface provides appropriate tools that allow efficient query and visualization of this infected cellular network. Public access to the VirHostNet knowledge-based system is available at http://pbildb1.univ-lyon1.fr/virhostnet.
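A typical query on an integrated virus-host network is: which host proteins does a given viral protein bind, and what are their host partners? The sketch below answers it with plain adjacency sets. It is not the VirHostNet interface or data model; the `is_host` predicate and the `H:`/`V:` identifier prefixes used in the example are a hypothetical convention.

```python
from collections import defaultdict


def build_network(interactions):
    """Undirected adjacency sets from (protein_a, protein_b) pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def host_targets_and_neighbours(graph, viral_protein, is_host):
    """Host proteins bound by a viral protein, plus their other host partners."""
    targets = {p for p in graph[viral_protein] if is_host(p)}
    neighbours = set()
    for t in targets:
        neighbours |= {p for p in graph[t] if is_host(p)}
    return targets, neighbours - targets
```

For instance, with interactions `("V:NS1", "H:A")`, `("H:A", "H:B")` and `("H:B", "H:C")`, and `is_host = lambda p: p.startswith("H:")`, querying `"V:NS1"` returns `({"H:A"}, {"H:B"})`.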


Subject(s)
Databases, Protein , Host-Pathogen Interactions , Protein Interaction Mapping , Viral Proteins/metabolism , Internet , Proteome/metabolism , User-Computer Interface , Virus Diseases/metabolism , Virus Diseases/virology , Virus Physiological Phenomena
8.
BMC Genomics ; 11: 666, 2010 Nov 25.
Article in English | MEDLINE | ID: mdl-21108805

ABSTRACT

BACKGROUND: Gene expression regulation is still poorly documented in bacteria with highly reduced genomes. Understanding the evolution and mechanisms underlying the regulation of gene transcription in Buchnera aphidicola, the primary endosymbiont of aphids, is expected both to enhance our understanding of this nutritionally based association and to provide an intriguing case-study of the evolution of gene expression regulation in a reduced bacterial genome. RESULTS: A Bayesian predictor was defined to infer the B. aphidicola transcription units, which were further validated using transcriptomic data and RT-PCR experiments. The characteristics of the predicted B. aphidicola transcription units (TUs) were analyzed in order to evaluate the impact of operon map organization on the regulation of gene transcription. On average, B. aphidicola TUs contain more genes than those of E. coli. The global layout of the B. aphidicola operon map was mainly shaped by the major genome reduction and the rearrangement events that occurred at the early stage of the symbiosis. Our analysis suggests that this operon map may evolve further only by small reorganizations around the frontiers of B. aphidicola TUs, through promoter and/or terminator sequence modifications and/or by pseudogenization events. We also found that the need for specific transcription regulation exerts some pressure on gene conservation, but not on gene assembly in the operon map of Buchnera. Our analysis of TU spacing showed that a selection pressure is maintained on the length of the intergenic regions between divergent adjacent gene pairs. CONCLUSIONS: B. aphidicola can seemingly only evolve towards a more polycistronic operon map. This implies that gene transcription regulation is probably subject to weak selection pressure in Buchnera, which conserves operons composed of genes with unrelated functions.
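The abstract does not list the features of the Bayesian transcription-unit predictor. A common operon signal is the intergenic distance between adjacent genes on the same strand, so the sketch below assumes that single feature, models it with two Gaussian likelihoods (same TU versus different TU) and applies Bayes' rule. The Gaussian form and all parameter values are placeholders, not values fitted to B. aphidicola.

```python
import math


def log_gauss(x, mu, sigma):
    """Log density of a normal distribution N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical (mean, sd) of intergenic distance in bp for each class.
SAME_TU = (20.0, 30.0)
DIFF_TU = (150.0, 80.0)


def same_tu_posterior(distance, prior=0.5):
    """P(same TU | distance) via Bayes' rule with Gaussian likelihoods."""
    l1 = math.log(prior) + log_gauss(distance, *SAME_TU)
    l0 = math.log(1 - prior) + log_gauss(distance, *DIFF_TU)
    return _sigmoid(l1 - l0)


def transcription_units(genes, threshold=0.5):
    """Group genes, given as (name, strand, start, end) sorted by start."""
    if not genes:
        return []
    units = [[genes[0][0]]]
    for prev, cur in zip(genes, genes[1:]):
        distance = cur[2] - prev[3]
        if prev[1] == cur[1] and same_tu_posterior(distance) > threshold:
            units[-1].append(cur[0])
        else:
            units.append([cur[0]])
    return units
```

Genes on opposite strands always start a new unit, and the logistic helper keeps the posterior finite even for very large distances.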


Subject(s)
Buchnera/genetics , Operon/genetics , Base Sequence , Codon/genetics , Conserved Sequence/genetics , DNA, Intergenic/genetics , Escherichia coli/genetics , Evolution, Molecular , Gene Expression Regulation, Bacterial , Genes, Bacterial , Models, Genetic , Oligonucleotide Array Sequence Analysis , Open Reading Frames/genetics , Promoter Regions, Genetic/genetics , ROC Curve , Reproducibility of Results , Reverse Transcriptase Polymerase Chain Reaction , Terminator Regions, Genetic/genetics , Transcription, Genetic
9.
BMC Genomics ; 11: 344, 2010 May 31.
Article in English | MEDLINE | ID: mdl-20509979

ABSTRACT

BACKGROUND: Recent developments in high-throughput methods of analyzing transcriptomic profiles are promising for many areas of biology, including ecophysiology. However, although commercial microarrays are available for most common laboratory models, transcriptome analysis in non-traditional model species still remains a challenge. Indeed, the signal resulting from heterologous hybridization is low and difficult to interpret because of the weak complementarity between probe and target sequences, especially when no microarray dedicated to a genetically close species is available. RESULTS: We show here that transcriptome analysis in a species genetically distant from laboratory models is made possible by using MAXRS, a new method of analyzing heterologous hybridization on microarrays. This method takes advantage of the design of several commercial microarrays, with different probes targeting the same transcript. To illustrate and test this method, we analyzed the transcriptome of king penguin pectoralis muscle hybridized to Affymetrix chicken microarrays, two organisms separated by an evolutionary distance of approximately 100 million years. The differential gene expression observed between different physiological situations computed by MAXRS was confirmed by real-time PCR on 10 genes out of 11 tested. CONCLUSIONS: MAXRS appears to be an appropriate method for gene expression analysis under heterologous hybridization conditions.


Subject(s)
Gene Expression Profiling/methods , Nucleic Acid Hybridization/methods , Oligonucleotide Array Sequence Analysis/methods , Animals , Gene Expression Regulation, Developmental , Oceans and Seas , Pectoralis Muscles/growth & development , Pectoralis Muscles/metabolism , Polymerase Chain Reaction , Reproducibility of Results , Spectrometry, Fluorescence , Spheniscidae/genetics , Spheniscidae/growth & development
10.
BMC Genomics ; 10: 335, 2009 Jul 24.
Article in English | MEDLINE | ID: mdl-19630943

ABSTRACT

BACKGROUND: The Intergenic Breakage Model, which is the current model of structural genome evolution, considers that evolutionary rearrangement breakages happen with a uniform propensity along the genome but are selected against in genes, their regulatory regions and in-between. However, a growing body of evidence shows that there exist regions along mammalian genomes that present a high susceptibility to breakage. We reconsidered this question taking advantage of a recently published methodology for the precise detection of rearrangement breakpoints based on pairwise genome comparisons. RESULTS: We applied this methodology to compare the human genome with those of five sequenced eutherian mammals, which allowed us to delineate evolutionary breakpoint regions along the human genome with a finer resolution (median size 26.6 kb) than obtained before. We investigated the distribution of these breakpoints with respect to genome organisation into domains of different activity. In agreement with the Intergenic Breakage Model, we observed that breakpoints are under-represented in genes. Surprisingly, however, the density of breakpoints in small intergenes (1 per Mb) appears significantly higher than in gene deserts (0.1 per Mb). More generally, we found a heterogeneous distribution of breakpoints that follows the organisation of the genome into isochores (breakpoints are more frequent in GC-rich regions). We then discuss the hypothesis that regions with an enhanced susceptibility to breakage correspond to regions of high transcriptional activity and replication initiation. CONCLUSION: We propose a model to describe the heterogeneous distribution of evolutionary breakpoints along human chromosomes that combines natural selection and a mutational bias linked to local open chromatin state.


Subject(s)
Chromosome Breakage , Evolution, Molecular , Genomics/methods , Models, Genetic , Animals , Base Composition , Chromosome Mapping/methods , Chromosomes, Mammalian/genetics , Genome, Human , Humans , Isochores , Mammals/genetics
11.
BMC Bioinformatics ; 9: 286, 2008 Jun 18.
Article in English | MEDLINE | ID: mdl-18564416

ABSTRACT

BACKGROUND: Genomes undergo large structural changes that alter their organisation. The chromosomal regions affected by these rearrangements are called breakpoints, while those which have not been rearranged are called synteny blocks. We developed a method to precisely delimit rearrangement breakpoints on a genome by comparison with the genome of a related species. Contrary to current methods which search for synteny blocks and simply return what remains in the genome as breakpoints, we propose to go further and to investigate the breakpoints themselves in order to refine them. RESULTS: Given some reliable and non-overlapping synteny blocks, the core of the method consists of refining the regions that are not contained in them. By aligning each breakpoint sequence against its specific orthologous sequences in the other species, we can look for weak similarities inside the breakpoint, thus extending the synteny blocks and narrowing the breakpoints. The identification of the narrowed breakpoints relies on a segmentation algorithm and is statistically assessed. Since this method requires as input synteny blocks with some properties which, though they appear natural, are not verified by current methods for detecting such blocks, we further give a formal definition and provide an algorithm to compute them. The whole method is applied to delimit breakpoints on the human genome when compared to the mouse and dog genomes. Among the 355 human-mouse and 240 human-dog breakpoints, 168 and 146 respectively span less than 50 kb. We compared the resulting breakpoints with some publicly available ones and show that we achieve a better resolution. Furthermore, we suggest that breakpoints are rarely reduced to a point, and instead consist of often large regions that can be distinguished from the surrounding sequences in terms of segmental duplications, similarity with related species, and transposable elements.
CONCLUSION: Our method leads to smaller breakpoints than already published ones and allows for a better description of their internal structure. In the majority of cases, our refined regions of breakpoint exhibit specific biological properties (no similarity, presence of segmental duplications and of transposable elements). We hope that this new result may provide some insight into the mechanism and evolutionary properties of chromosomal rearrangements.
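The refinement step ends with a segmentation of the breakpoint region. As a hedged illustration of that kind of step only, the sketch below assumes each position of a breakpoint sequence has been scored +1 if it aligns to the orthologue of the left synteny block, -1 if it aligns to the orthologue of the right block, and 0 otherwise, and then fits three constant segments by least squares; the middle segment plays the role of the narrowed breakpoint. This encoding and the exhaustive search are assumptions, and the statistical assessment the authors mention is not included.

```python
def three_segment_fit(signal):
    """Best split of `signal` into three constant segments by least squares.

    Returns (i, j) such that the middle segment is signal[i:j]. Both
    boundaries are searched exhaustively in O(n^2) using prefix sums.
    """
    n = len(signal)
    if n < 3:
        raise ValueError("need at least three positions")
    s = [0.0] * (n + 1)  # prefix sums
    q = [0.0] * (n + 1)  # prefix sums of squares
    for k, x in enumerate(signal):
        s[k + 1] = s[k] + x
        q[k + 1] = q[k] + x * x

    def sse(a, b):
        """Squared error of fitting signal[a:b] by its mean."""
        m = b - a
        tot = s[b] - s[a]
        return (q[b] - q[a]) - tot * tot / m

    best = None
    for i in range(1, n):
        for j in range(i + 1, n):
            cost = sse(0, i) + sse(i, j) + sse(j, n)
            if best is None or cost < best[0]:
                best = (cost, i, j)
    return best[1], best[2]
```

For long regions the signal would first need to be averaged over windows, since the search is quadratic in its length.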


Subject(s)
Chromosome Breakage , Chromosomes, Mammalian/genetics , Animals , Dogs , Genome , Genome, Human , Humans , Mice , Sequence Homology, Nucleic Acid , Synteny
12.
BMC Genomics ; 9: 632, 2008 Dec 24.
Article in English | MEDLINE | ID: mdl-19108743

ABSTRACT

BACKGROUND: One of the most striking features of mammalian and bird chromosomes is the variation in guanine-cytosine (GC) content that occurs over scales of hundreds of kilobases to megabases; this is known as the "isochore" structure. Among other vertebrates, the presence of isochores depends on the taxon: isochores are clearly present in crocodiles and turtles, but fish genomes seem very homogeneous in GC content. This has suggested a unique origin of isochores after the divergence between Sarcopterygii and Actinopterygii, but before that between Sauropsida and mammals. Over more than 30 years of analysis, isochore characteristics have been studied and many important biological properties have been associated with the isochore structure of the human genome. For instance, genes are more compact and gene density is highest in GC-rich isochores. RESULTS: This paper shows the existence, in teleost fish genomes, of a "GC segmentation" sharing some of the characteristics of isochores, although teleost fish genomes present a particular homogeneity in GC content. The entire genomes of T. nigroviridis and D. rerio are now available, which has made it possible to check whether a mosaic structure associated with isochore properties can be found in these fishes. In this study, hidden Markov models were trained on fish genes (T. nigroviridis and D. rerio) which were classified by using the isochore class of their human orthologues. A clear segmentation of these genomes was detected. CONCLUSION: GC content is an excellent indicator of isochores in heterogeneous genomes such as those of mammals. The segmentations we obtained were well correlated with GC content and with other properties associated with GC content, such as gene density, the number of exons per gene and intron length. Therefore, GC content is the main property that allows the detection of isochores, but more biological properties have to be taken into account.
This method allows the detection of isochores in homogeneous genomes.
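The segmentation relies on hidden Markov models. As a minimal sketch of the decoding part only, the code below runs the Viterbi algorithm on a two-state model (GC-poor, GC-rich) in which each state emits G/C with its own probability. The authors trained their models on fish genes classified by the isochore class of their human orthologues; the emission and transition values here are illustrative placeholders, not trained parameters.

```python
import math


def viterbi_gc(seq, p_gc=(0.35, 0.55), stay=0.999):
    """Most likely state path (0 = GC-poor, 1 = GC-rich) for a DNA string.

    State k emits G or C with probability p_gc[k] and any other base
    otherwise; `stay` is the probability of keeping the same state between
    positions. All parameter values are illustrative placeholders.
    """
    seq = seq.upper()
    if not seq:
        return []
    log_t = [[math.log(stay), math.log(1 - stay)],
             [math.log(1 - stay), math.log(stay)]]

    def log_e(k, base):
        p = p_gc[k]
        return math.log(p if base in "GC" else 1 - p)

    # Uniform start, then dynamic programming over positions.
    score = [math.log(0.5) + log_e(k, seq[0]) for k in (0, 1)]
    back = []
    for base in seq[1:]:
        ptr, new = [], []
        for k in (0, 1):
            prev = max((0, 1), key=lambda j: score[j] + log_t[j][k])
            ptr.append(prev)
            new.append(score[prev] + log_t[prev][k] + log_e(k, base))
        back.append(ptr)
        score = new
    # Trace back from the best final state.
    state = max((0, 1), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

A high self-transition probability (`stay`) penalises switching, so only long stretches of differing composition produce a change of state; this is what turns base-level noise into a segmentation.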


Subject(s)
Base Composition , Fishes/genetics , Isochores , Animals , Bayes Theorem , Chickens , Chromosome Mapping , Genome , Humans , Markov Chains , Models, Genetic , Sequence Analysis, DNA , Zebrafish/genetics
13.
Biochimie ; 90(4): 563-9, 2008 Apr.
Article in English | MEDLINE | ID: mdl-17988782

ABSTRACT

Single nucleotide polymorphisms (SNPs), which are the most abundant form of genetic variation in numerous organisms, have emerged as important tools for the study of complex genetic traits and the deciphering of genome evolution. High-throughput genome sequencing projects worldwide provide an unprecedented opportunity for whole-genome SNP analysis in a variety of species. To facilitate SNP discovery in vertebrates, we have developed a web-based, user-friendly, and fully automated application, DigiPINS, for genome-wide identification of exonic SNPs from EST data. Currently, the database can be used for the mining of exonic SNPs in six complete genomes (Homo sapiens, Mus musculus, Rattus norvegicus, Canis familiaris, Gallus gallus and Danio rerio). In addition to providing information on sequence conservation, DigiPINS allows compilation of comprehensive sets of polymorphisms within cancer candidate genes or identification of novel cancer markers, making it potentially useful for cancer association studies. The DigiPINS server is available via the internet at http://pbil.univ-lyon1.fr/gem/DigiPINS/query_DigiPINS.php.
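Mining SNPs from ESTs amounts to scanning columns of aligned transcripts for a second base that is seen often enough not to be a single sequencing error. The sketch below shows that filter on a hypothetical pre-computed alignment, with the minimum minor-allele count `min_minor` as an assumed parameter; DigiPINS's actual alignment and quality filters are not described in the abstract and are not reproduced.

```python
from collections import Counter


def call_snps(columns, min_minor=2):
    """Candidate SNPs from aligned EST columns.

    columns: dict mapping position -> list of bases observed at that position
    across aligned ESTs (assumed to be pre-computed).
    A position is reported when the second most frequent base is seen in at
    least `min_minor` ESTs; gaps and ambiguous symbols are ignored.
    """
    snps = {}
    for pos, bases in columns.items():
        counts = Counter(b for b in bases if b in "ACGT")
        top = counts.most_common(2)
        if len(top) == 2 and top[1][1] >= min_minor:
            snps[pos] = (top[0][0], top[1][0], dict(counts))
    return snps
```

Requiring the minor allele in at least two independent ESTs is a simple guard against sequencing errors at the cost of missing rare variants.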


Subject(s)
Databases, Genetic , Neoplasms/genetics , Polymorphism, Single Nucleotide , Software , Amino Acid Sequence , Animals , Base Sequence , Biomarkers , Expressed Sequence Tags , Genome , Humans , Information Storage and Retrieval/methods , Internet , Molecular Sequence Data , Sequence Analysis, DNA , User-Computer Interface
14.
BMC Bioinformatics ; 8: 90, 2007 Mar 13.
Article in English | MEDLINE | ID: mdl-17355634

ABSTRACT

BACKGROUND: With the advance of microarray technology, several methods for gene classification and prognosis have already been designed. However, under various denominations, some of these methods have similar approaches. This study evaluates the influence of gene expression variance structure on the performance of methods that describe the relationship between gene expression levels and a given phenotype through projection of data onto discriminant axes. RESULTS: We compared Between-Group Analysis and Discriminant Analysis (with prior dimension reduction through Partial Least Squares or Principal Components Analysis). A geometric approach showed that these two methods are strongly related, but differ in the way they handle data structure. Yet, data structure helps in understanding the predictive efficiency of these methods. Three main structure situations may be identified. When the clusters of points are clearly split, both methods perform equally well. When the clusters superpose, both methods fail to give interesting predictions. In intermediate situations, the configuration of the clusters of points has to be handled by the projection to improve prediction. For this, we recommend Discriminant Analysis. In addition, an innovative simulation approach generated the three main structures by modelling different partitions of the whole variance into within-group and between-group variances. These simulated datasets were used as a complement to some well-known public datasets to investigate the methods' behaviour in a large diversity of structure situations. To examine the structure of a dataset before analysis and preselect an a priori appropriate method for its analysis, we proposed a two-graph preliminary visualization tool: plotting patients on the Between-Group Analysis discriminant axis (x-axis) and on the first and the second within-group Principal Components Analysis component (y-axis), respectively.
CONCLUSION: Discriminant Analysis outperformed Between-Group Analysis because it takes the dataset structure into account. A priori knowledge of that structure may guide the choice of the analysis method. Simulated datasets with known properties are valuable to assess and compare the performance of analysis methods; implementation on real datasets then checks and validates the results. Thus, we warn against the use of unchallenging datasets for method comparison, such as the Golub dataset, because their structure is such that any method would be efficient.
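The geometric relationship between the two methods is easiest to see with two groups in two dimensions. Between-group analysis is a PCA of the group means, so with two groups its single axis is the direction joining the means; discriminant analysis instead uses W^-1 (m1 - m2), where W is the pooled within-group covariance, so it reweights that direction according to within-group structure. The pure-Python sketch below computes both axes; it omits the PLS or PCA dimension reduction the study applied before discriminant analysis, and the function names are illustrative.

```python
def mean_vec(rows):
    """Component-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(r[d] for r in rows) / n for d in range(len(rows[0]))]


def pooled_within_cov(g1, g2):
    """Pooled 2x2 within-group covariance matrix of two 2-D groups."""
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for g in (g1, g2):
        m = mean_vec(g)
        for r in g:
            dx = [r[0] - m[0], r[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    cov[a][b] += dx[a] * dx[b]
    dof = len(g1) + len(g2) - 2
    return [[c / dof for c in row] for row in cov]


def bga_axis(g1, g2):
    """Two-group between-group axis: the direction joining the group means."""
    m1, m2 = mean_vec(g1), mean_vec(g2)
    return [m1[0] - m2[0], m1[1] - m2[1]]


def lda_axis(g1, g2):
    """Fisher discriminant direction W^-1 (m1 - m2) for 2-D data."""
    (a, b), (c, d) = pooled_within_cov(g1, g2)
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = bga_axis(g1, g2)
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]
```

With isotropic within-group scatter the two axes are parallel; when the scatter is elongated along one variable, the discriminant axis leans towards the other, less noisy variable, which illustrates how Discriminant Analysis can take data structure into account.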


Subject(s)
Automatic Data Processing/methods , Oligonucleotide Array Sequence Analysis/methods , Computer Simulation , Data Interpretation, Statistical , Humans , Leukemia/genetics , Lymphoma, Large B-Cell, Diffuse/genetics , Male , Models, Biological , Phenotype , Prostatic Neoplasms/genetics
15.
BMC Genomics ; 8: 2, 2007 Jan 03.
Article in English | MEDLINE | ID: mdl-17201911

ABSTRACT

BACKGROUND: A promising application of the huge amounts of genetic data currently available lies in developing a better understanding of complex diseases, such as cancer. Analysis of publicly available databases can help identify potential candidates for genes or mutations specifically related to the cancer phenotype. In spite of their huge potential to affect gene function, no systematic attention has been paid so far to the changes that occur in untranslated regions of mRNA. RESULTS: In this study, we used Expressed Sequence Tag (EST) databases as a source for cancer-related sequence polymorphism discovery at the whole-genome level. Using a novel computational procedure, we focused on the identification of untranslated region (UTR)-localized non-coding Single Nucleotide Polymorphisms (UTR-SNPs) significantly associated with the tumoral state. To explore possible relationships between genetic mutation and phenotypic variation, bioinformatic tools were used to predict the potential impact of cancer-associated UTR-SNPs on mRNA secondary structure and UTR regulatory elements. We provide a comprehensive and unbiased description of cancer-associated UTR-SNPs that may be useful to define genotypic markers or to propose polymorphisms that can act to alter gene expression levels. Our results suggest that a fraction of cancer-associated UTR-SNPs may have functional consequences on mRNA stability and/or expression. CONCLUSION: We have undertaken a comprehensive effort to identify cancer-associated polymorphisms in untranslated regions of mRNA and to characterize putative functional UTR-SNPs. Alteration of translational control can change the expression of genes in tumor cells, causing an increase or decrease in the concentration of specific proteins. 
Through the description of testable candidates and the experimental validation of a number of UTR-SNPs discovered on the secreted protein acidic and rich in cysteine (SPARC) gene, this report illustrates the utility of a cross-talk between in silico transcriptomics and cancer genetics.


Subject(s)
Genome, Human/genetics , Neoplasms/genetics , Polymorphism, Single Nucleotide , Untranslated Regions/genetics , Computational Biology/methods , Databases, Genetic , Expressed Sequence Tags , Humans , Leukemia, Myeloid, Acute/genetics , Nucleic Acid Conformation , Osteonectin/genetics , RNA, Messenger/chemistry , RNA, Messenger/genetics , Untranslated Regions/chemistry
16.
Oncogene ; 24(40): 6133-42, 2005 Sep 08.
Article in English | MEDLINE | ID: mdl-15897869

ABSTRACT

The last decade has led to the accumulation of large amounts of data on cancer genetics, opening unprecedented access to the mapping of cancer genes in the human genome. Single-nucleotide polymorphisms (SNPs), the most common form of DNA variation in humans, have emerged as an invaluable tool for cancer association studies. These genotypic markers can be used to assay how alleles of candidate genes correlate with the malignant phenotype, and may provide new clues into the genetic modifications that characterize cancer onset. In this cancer-oriented study, we detail an SNP mining strategy based on the analysis of expressed sequence tags among publicly available databases. Our whole-genome approach provides a comprehensive and unbiased description of nonsynonymous SNPs (nsSNPs) in tumoral versus normal tissues. To gain further insights into the possible relationships between genetic variation and altered phenotype, locations of a subset of nsSNPs were mapped onto protein domains known to be critical for protein function. Computational methods were also used to predict the potential impact of these cancer-associated nsSNPs on protein structure and function. We illustrate our approach through the detailed biochemical and structural characterization of a previously unknown cancer-associated mutation (G79C) affecting the 8 kDa dynein light chain (DNCL1).


Subject(s)
Dyneins/genetics , Expressed Sequence Tags , Neoplasms/genetics , Polymorphism, Single Nucleotide , Computational Biology , Cytoplasmic Dyneins , DNA Mutational Analysis , Genetic Markers , Genome, Human , Humans , Phenotype
17.
BMC Genomics ; 7: 94, 2006 Apr 26.
Article in English | MEDLINE | ID: mdl-16640784

ABSTRACT

BACKGROUND: Owing to the explosion of information generated by human genomics, analysis of publicly available databases can help identify potential candidate genes relevant to the cancerous phenotype. The aim of this study was to scan for such genes by whole-genome in silico subtraction using Expressed Sequence Tag (EST) data. METHODS: Genes differentially expressed in normal versus tumor tissues were identified using a computer-based differential display strategy. Bcl-xL, an anti-apoptotic member of the Bcl-2 family, was selected for confirmation by western blot analysis. RESULTS: Our genome-wide expression analysis identified a set of genes whose differential expression may be attributed to the genetic alterations associated with tumor formation and malignant growth. We propose complete lists of genes that may serve as targets for projects seeking novel candidates for cancer diagnosis and therapy. Our validation result showed increased protein levels of Bcl-xL in two different liver cancer specimens compared to normal liver. Notably, our EST-based data mining procedure indicated that most of the changes in gene expression observed in cancer cells corresponded to gene inactivation patterns. Chromosomes and chromosomal regions most frequently associated with aberrant expression changes in cancer libraries were also determined. CONCLUSION: Through the description of several candidates (including genes encoding extracellular matrix and ribosomal components, cytoskeletal proteins, apoptotic regulators, and novel tissue-specific biomarkers), our study illustrates the utility of in silico transcriptomics to identify tumor cell signatures, tumor-related genes and chromosomal regions frequently associated with aberrant expression in cancer.
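
The abstract describes in silico subtraction only as a "computer-based differential display strategy". It does not name the statistic used to decide that a gene's EST count differs between normal and tumor libraries. The block below is a minimal sketch that assumes a two-sided Fisher exact test on a 2x2 table of EST counts. This is a common choice for digital differential display, but the source does not confirm it as the authors' method. The function names and the example counts are hypothetical.

```python
from math import comb


def hypergeom_pmf(k: int, tumor_total: int, gene_total: int, grand_total: int) -> float:
    """Probability that exactly k of the gene's ESTs fall in the tumor libraries,
    given fixed library sizes and a fixed total EST count for the gene."""
    return comb(gene_total, k) * comb(grand_total - gene_total, tumor_total - k) / comb(grand_total, tumor_total)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table
                  gene ESTs   other ESTs
        tumor        a            b
        normal       c            d
    """
    tumor_total = a + b
    gene_total = a + c
    grand_total = a + b + c + d
    # Feasible range for the top-left cell once all margins are fixed
    k_min = max(0, tumor_total - (grand_total - gene_total))
    k_max = min(tumor_total, gene_total)
    p_observed = hypergeom_pmf(a, tumor_total, gene_total, grand_total)
    p_value = 0.0
    for k in range(k_min, k_max + 1):
        p_k = hypergeom_pmf(k, tumor_total, gene_total, grand_total)
        # Tables no more probable than the observed one count as "at least as extreme";
        # the small relative tolerance absorbs floating-point ties.
        if p_k <= p_observed * (1 + 1e-7):
            p_value += p_k
    return min(1.0, p_value)


# Hypothetical counts: a gene seen 2 times among 50,000 tumor ESTs
# and 40 times among 60,000 normal ESTs.
p = fisher_exact_two_sided(2, 49_998, 40, 59_960)
print(f"p = {p:.3g}")
```

In a whole-genome scan, this test would be repeated for every gene. The resulting p-values would therefore need a multiple-testing correction before genes are called differentially expressed.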


Subject(s)
Computational Biology/methods , Expressed Sequence Tags , Gene Expression Regulation, Neoplastic , Neoplasms/metabolism , Algorithms , Blotting, Western , Chromosome Mapping , Chromosomes/ultrastructure , Data Interpretation, Statistical , Databases, Genetic , Down-Regulation , Gene Expression Profiling/methods , Gene Library , Genome, Human , Humans , Keratins/metabolism , Liver Neoplasms/metabolism , Models, Genetic , Neoplasms/embryology , Neoplasms/genetics , Up-Regulation , bcl-X Protein/biosynthesis , bcl-X Protein/genetics
18.
Gene ; 385: 41-9, 2006 Dec 30.
Article in English | MEDLINE | ID: mdl-17020791

ABSTRACT

Mammalian genomes are organised into a mosaic of regions (in general more than 300 kb in length), with differing, relatively homogeneous G+C contents. The G+C content is the basic characteristic of isochores, but isochores have also been associated with many other biological properties. For instance, genes are more compact and their density is highest in G+C rich isochores. Various ways of locating isochores in the human genome have been developed, but such methods use only the base composition of the DNA sequences. The present paper proposes a new method, based on a hidden Markov model, which takes into account several of the biological properties associated with the isochore structure of a genome. This method leads to good segmentation of the human genome into isochores, and also permits a new analysis of the known heterogeneity of G+C rich isochores: most (60%) of the G+C poor genes embedded in G+C rich isochores have UTR sequences characteristic of G+C rich genes. This genomic feature is discussed in the context of both evolution and genome function.
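
The abstract states that the segmentation relies on a hidden Markov model, but it gives neither the states nor the emissions. As a rough illustration of how HMM decoding turns noisy per-window signals into long homogeneous segments, the sketch below makes several simplifying assumptions. It uses only two hypothetical states (GC-poor and GC-rich isochores) and one observation per window (a GC class). Its transition and emission probabilities are invented, and it decodes with the standard Viterbi algorithm. The published model incorporates several biological properties and is not reproduced here.

```python
import math

# Hypothetical two-state model: 0 = GC-poor isochore, 1 = GC-rich isochore.
# Observations are GC classes of consecutive genomic windows: 0 = low, 1 = medium, 2 = high.
START = [0.5, 0.5]
TRANSITION = [[0.99, 0.01],   # isochores are long, so switching state is rare
              [0.01, 0.99]]
EMISSION = [[0.70, 0.25, 0.05],   # GC-poor state mostly emits low-GC windows
            [0.05, 0.25, 0.70]]   # GC-rich state mostly emits high-GC windows


def viterbi(observations, start, transition, emission):
    """Return the most probable hidden-state path of a discrete HMM (log space)."""
    n_states = len(start)
    scores = [math.log(start[s]) + math.log(emission[s][observations[0]]) for s in range(n_states)]
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = [], []
        for s in range(n_states):
            # best predecessor state for entering state s at this window
            prev = max(range(n_states), key=lambda r: scores[r] + math.log(transition[r][s]))
            new_scores.append(scores[prev] + math.log(transition[prev][s]) + math.log(emission[s][obs]))
            pointers.append(prev)
        scores = new_scores
        backpointers.append(pointers)
    # trace the best path back from the most probable final state
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


windows = [0, 0, 0, 0, 2, 2, 2, 2]
print(viterbi(windows, START, TRANSITION, EMISSION))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

The "sticky" transition matrix makes a state switch costly. A single high-GC window inside a GC-poor stretch is therefore absorbed into the surrounding segment rather than opening a new one. This mirrors the idea of isochores as long, relatively homogeneous regions.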


Subject(s)
Isochores/genetics , Models, Genetic , 5' Untranslated Regions , Algorithms , Chromosome Mapping , GC Rich Sequence , Genome, Human , Humans , Markov Chains
19.
Comput Biol Med ; 43(4): 334-41, 2013 May.
Article in English | MEDLINE | ID: mdl-23375235

ABSTRACT

In this study, we discuss and apply a novel and efficient algorithm for learning a local Bayesian network model in the vicinity of the ZNF217 oncogene from breast cancer microarray data without having to decide in advance which genes have to be included in the learning process. ZNF217 is a candidate oncogene located at 20q13, a chromosomal region frequently amplified in breast and ovarian cancer, and correlated with shorter patient survival in these cancers. To properly address the difficulties in managing complex gene interactions given our limited sample, statistical significance of edge strengths was evaluated using bootstrapping and the less reliable edges were pruned to increase the network robustness. We found that 13 out of the 35 genes associated with deregulated ZNF217 expression in breast tumours have been previously associated with survival and/or prognosis in cancers. Identifying genes involved in lipid metabolism opens new fields of investigation to decipher the molecular mechanisms driven by the ZNF217 oncogene. Moreover, nine of the 13 genes have already been identified as putative ZNF217 targets by independent biological studies. We therefore suggest that the algorithms for inferring local BNs are valuable data mining tools for unraveling complex mechanisms of biological pathways from expression data. The source code is available at http://www710.univ-lyon1.fr/~aaussem/Software.html.
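
The abstract mentions that edge strengths were assessed by bootstrapping and that less reliable edges were pruned. The sketch below shows only that bootstrap-and-prune step. To keep it self-contained, the structure learner is replaced by a hypothetical stand-in (`learn_edges`, which links two genes when their absolute Pearson correlation exceeds a threshold). That stand-in is not a local Bayesian network learner and is not the authors' algorithm. The parameters `n_boot`, `threshold` and `min_freq` are illustrative assumptions.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def learn_edges(samples, genes, threshold):
    """Hypothetical stand-in for the structure learner: link genes with |r| >= threshold.
    A real local Bayesian network learner would be plugged in here instead."""
    edges = set()
    for g1, g2 in combinations(genes, 2):
        r = pearson([s[g1] for s in samples], [s[g2] for s in samples])
        if abs(r) >= threshold:
            edges.add((g1, g2))
    return edges


def bootstrap_edges(samples, genes, n_boot=200, threshold=0.6, min_freq=0.8, seed=0):
    """Re-learn edges on bootstrap resamples and keep those recovered often enough."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_boot):
        # resample the tumour profiles (rows) with replacement
        resample = [rng.choice(samples) for _ in samples]
        for edge in learn_edges(resample, genes, threshold):
            counts[edge] = counts.get(edge, 0) + 1
    # prune edges whose bootstrap frequency falls below min_freq
    return {edge: n / n_boot for edge, n in counts.items() if n / n_boot >= min_freq}


# Synthetic profiles: B tracks A closely, C follows an unrelated periodic pattern.
data = [{"A": i, "B": 2 * i + 0.1 * (i % 3), "C": (2 * i) % 5} for i in range(30)]
print(bootstrap_edges(data, ["A", "B", "C"]))
```

On this synthetic data, only the A-B edge should survive pruning, because it is recovered in essentially every bootstrap replicate. The weak A-C and B-C associations fall well below the `min_freq` cut-off.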


Subject(s)
Breast Neoplasms/genetics , Gene Expression Regulation, Neoplastic , Trans-Activators/metabolism , Algorithms , Artificial Intelligence , Automation , Bayes Theorem , Breast Neoplasms/metabolism , Computer Simulation , Female , Gene Expression Profiling , Humans , Lipid Metabolism , Models, Genetic , Oligonucleotide Array Sequence Analysis/methods , Oncogenes , Ovarian Neoplasms/genetics , Prognosis , Trans-Activators/genetics
20.
Genome Biol Evol ; 4(3): 412-22, 2012.
Article in English | MEDLINE | ID: mdl-22417915

ABSTRACT

Meiotic recombination is an important evolutionary force shaping the nucleotide landscape of genomes. For most vertebrates, the frequency of recombination varies slightly or considerably between the sexes (heterochiasmy). In humans, male, rather than female, recombination rate has been found to be more highly correlated with the guanine and cytosine (GC) content across the genome. In the present study, we review the results in human and extend the examination of the evolutionary impact of heterochiasmy beyond primates to include four additional eutherian mammals (mouse, dog, pig, and sheep), a metatherian mammal (opossum), and a bird (chicken). Specifically, we compared sex-specific recombination rates (RRs) with nucleotide substitution patterns evaluated in transposable elements. Our results, based on a comparative approach, reveal a great diversity in the relationship between heterochiasmy and nucleotide composition. We find that the stronger male impact on this relationship is a conserved feature of human, mouse, dog, and sheep. In contrast, variation in genomic GC content in pig and opossum is more strongly correlated with female, rather than male, RR. Moreover, we show that the sex-differential impact of recombination is mainly driven by the chromosomal localization of recombination events. Independent of sex, the higher the RR in a genomic region and the longer this recombination activity is conserved in time, the stronger the bias in nucleotide substitution pattern, through such mechanisms as biased gene conversion. Over time, this bias will increase the local GC content of the region.


Subject(s)
Meiosis/genetics , Recombination, Genetic/genetics , Animals , Base Composition/genetics , Dogs , Evolution, Molecular , Female , Gene Conversion/genetics , Humans , Male , Mice , Sheep , Swine