1.
BMC Cancer ; 13: 387, 2013 Aug 15.
Article in English | MEDLINE | ID: mdl-23947815

ABSTRACT

BACKGROUND: Paediatric low-grade gliomas (LGGs) encompass a heterogeneous set of tumours that differ in histology, lesion site, age and gender distribution, growth potential, morphological features, tendency to progress and clinical course. Among LGGs, pilocytic astrocytomas (PAs) are the most common central nervous system (CNS) tumours in children. They are typically well-circumscribed and classified as grade I by the World Health Organization (WHO), but recurrence or progressive disease occurs in about 10-20% of cases. Although classic radiological and neuropathological features are recognized, PA may present a bewildering variety of microscopic features. Indeed, tumours containing both neoplastic ganglion and astrocytic cells occur at a lower frequency. METHODS: Gene expression profiling was performed on 40 primary LGGs, including PAs and mixed glial-neuronal tumours comprising gangliogliomas (GG) and desmoplastic infantile gangliogliomas (DIG), using the Affymetrix array platform. A biologically validated machine learning workflow for the identification of microarray-based gene signatures was devised. The method is based on a sparsity-inducing regularization algorithm, l1l2, that selects relevant variables and takes into account their correlation. The most significant genetic signatures emerging from gene-chip analysis were confirmed and validated by qPCR. RESULTS: We identified an expression signature composed of a biologically validated list of 15 genes, able to distinguish infratentorial from supratentorial LGGs. In addition, a specific molecular fingerprint distinguishes the supratentorial PAs from those originating in the posterior fossa. Lastly, within supratentorial tumours, we also identified a gene expression pattern composed of neurogenesis, cell motility and cell growth genes that separates mixed glial-neuronal tumours from PAs. Our results reinforce previous observations about aberrant activation of the mitogen-activated protein kinase (MAPK) pathway in LGGs, but also point to an active involvement of the TGF-beta signaling pathway in PA development and pick out some hitherto unreported genes worthy of further investigation in mixed glial-neuronal tumours. CONCLUSIONS: The identification of a brain region-specific gene signature suggests that LGGs with similar pathological features but located at different sites may be distinguishable on the basis of cancer genetics. Molecular fingerprinting seems able to better sub-classify such morphologically heterogeneous tumours, and it is remarkable that mixed glial-neuronal tumours are strikingly separated from PAs.
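
As a rough illustration of the kind of sparsity-inducing l1l2 regularization named in the METHODS, the sketch below minimizes an elastic-net-style objective, (1/n)*||y - X b||^2 + mu*||b||^2 + tau*||b||_1, by proximal gradient descent. The form of the objective, the function names, the parameter values and the toy data are all assumptions made for illustration; the abstract does not give the authors' functional or implementation.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal map of the l1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def l1l2_fit(X, y, tau=0.1, mu=0.1, n_iter=500):
    """Minimize (1/n)*||y - X b||^2 + mu*||b||^2 + tau*||b||_1 (assumed form)."""
    n, p = len(X), len(X[0])
    # Conservative step size: the squared Frobenius norm of X bounds the
    # largest eigenvalue of X^T X, so this step never overshoots.
    frob_sq = sum(v * v for row in X for v in row)
    step = 1.0 / ((2.0 / n) * frob_sq + 2.0 * mu)
    b = [0.0] * p
    for _ in range(n_iter):
        # Residuals of the current fit.
        residual = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(n)]
        # Gradient of the smooth part (squared loss plus l2 penalty).
        grad = [
            (2.0 / n) * sum(X[i][j] * residual[i] for i in range(n)) + 2.0 * mu * b[j]
            for j in range(p)
        ]
        # Gradient step followed by soft-thresholding for the l1 penalty.
        b = [soft_threshold(b[j] - step * grad[j], step * tau) for j in range(p)]
    return b


# Hypothetical toy data: columns 0 and 1 are identical ("correlated genes")
# and determine y; column 2 is unrelated to y.
X = [[1.0, 1.0, 1.0], [1.0, 1.0, -1.0], [-1.0, -1.0, 1.0], [-1.0, -1.0, -1.0]]
y = [2.0, 2.0, -2.0, -2.0]
coef = l1l2_fit(X, y)
print(coef)
```

On this toy input the two identical columns receive equal weights and the unrelated column stays at zero, which is the qualitative behaviour the abstract attributes to the method: relevant variables are selected and correlated ones are kept together.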


Subject(s)
Brain Neoplasms/genetics , Brain Neoplasms/pathology , Glioma/genetics , Glioma/pathology , Transcriptome , Astrocytoma/genetics , Astrocytoma/pathology , Child , Child, Preschool , Cluster Analysis , Female , Gene Expression Profiling/methods , Gene Regulatory Networks , Humans , Infant , Infratentorial Neoplasms/genetics , Infratentorial Neoplasms/metabolism , Male , Neoplasm Grading , Reproducibility of Results , Supratentorial Neoplasms/genetics , Supratentorial Neoplasms/metabolism
2.
BMC Bioinformatics ; 11: 33, 2010 Jan 15.
Article in English | MEDLINE | ID: mdl-20078885

ABSTRACT

BACKGROUND: With the rapid expansion of DNA sequencing databases, it is now feasible to identify relevant information from prior sequencing projects and completed genomes and apply it to de novo sequencing of new organisms. As an example, this paper demonstrates how such extra information can be used to improve de novo assemblies by augmenting the overlapping step. Finding all pairs of overlapping reads is a key task in many genome assemblers, and to this end, highly efficient algorithms have been developed to find alignments in large collections of sequences. It is well known that, due to repeated sequences, many aligned pairs of reads nevertheless do not overlap. But no overlapping algorithm to date takes a rigorous approach to separating aligned but non-overlapping read pairs from true overlaps. RESULTS: We present an approach that extends the Minimus assembler by a data-driven step to classify overlaps as true or false prior to contig construction. We trained several different classification models within the Weka framework using various statistics derived from overlaps of reads available from prior sequencing projects. These statistics included percent mismatch and k-mer frequencies within the overlaps as well as a comparative genomics score derived from mapping reads to multiple reference genomes. We show that in real whole-genome sequencing data from the E. coli and S. aureus genomes, by providing a curated set of overlaps to the contigging phase of the assembler, we nearly doubled the median contig length (N50) without sacrificing coverage of the genome or increasing the number of mis-assemblies. CONCLUSIONS: Machine learning methods that use comparative and non-comparative features to classify overlaps as true or false can be used to improve the quality of a sequence assembly.
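
The abstract glosses N50 as a median contig length. As N50 is commonly defined, it is a length-weighted statistic: the contig length L such that contigs of length L or longer contain at least half of all assembled bases. A minimal sketch of that common definition follows; the function name and the example lengths are hypothetical, and the paper may compute the statistic differently.

```python
from statistics import median


def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of all bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical assembly: four tiny contigs and one long one.
example = [1, 1, 1, 1, 10]
print(n50(example), median(example))
```

For the example list the plain median is 1 while the N50 is 10, which shows why N50 is preferred for assemblies: it is not dominated by the many short contigs.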


Subject(s)
Artificial Intelligence , Genomics/methods , Sequence Analysis, DNA/methods , Base Sequence , Databases, Nucleic Acid , Genome, Bacterial , Genome, Fungal
3.
Bioinformatics ; 23(19): 2528-35, 2007 Oct 01.
Article in English | MEDLINE | ID: mdl-17698491

ABSTRACT

MOTIVATION: Mass spectrometry (MS) is increasingly being used for biomedical research. The typical analysis of MS data consists of several steps. Feature extraction is a crucial step since subsequent analyses are performed only on the detected features. Current methodologies applied to low-resolution MS, in which features are peaks or wavelet functions, are parameter-sensitive and inaccurate in the sense that peaks and wavelet functions do not directly correspond to the underlying molecules under observation. In high-resolution MS, the model-based approach is more appealing as it can provide a better representation of the MS signals by incorporating information about peak shapes and isotopic distributions. Current model-based techniques are computationally expensive; various algorithms have been proposed to improve the computational efficiency of this paradigm. However, these methods cannot deal well with overlapping features, especially when they are merged to create one broad peak. In addition, no method has been proven to perform well across different MS platforms. RESULTS: We suggest a new model-based approach to feature extraction in which spectra are decomposed into a mixture of distributions derived from peptide models. By incorporating kernel-based smoothing and perceptual similarity for matching distributions, our statistical framework improves on existing methodologies in terms of computational efficiency and the accuracy of the results. Our model is parameterized by physical properties and is therefore applicable to different MS instruments and settings. We validate our approach on simulated data, and show that it outperforms commonly used tools on real high- and low-resolution MS, and MS/MS data sets.


Subject(s)
Algorithms , Artificial Intelligence , Models, Chemical , Pattern Recognition, Automated/methods , Peptide Mapping/methods , Proteome/chemistry , Sequence Analysis, Protein/methods , Amino Acid Sequence , Computer Simulation , Molecular Sequence Data
4.
J Comput Biol ; 13(10): 1673-84, 2006 Dec.
Article in English | MEDLINE | ID: mdl-17238838

ABSTRACT

Although there has been great success in identifying disease genes for simple, monogenic Mendelian traits, deciphering the genetic mechanisms involved in complex diseases remains challenging. One major approach is to identify configurations of interacting factors such as single nucleotide polymorphisms (SNPs) that confer susceptibility to disease. Traditional methods, such as the multiple dimensional reduction method and the combinatorial partitioning method, provide good tools to decipher such interactions in a disease population with a single genetic cause. However, these traditional methods have not managed to resolve the issue of genetic heterogeneity, which is believed to be a very common phenomenon in complex diseases. There is rarely prior knowledge of the genetic heterogeneity of a disease, and traditional methods based on estimation over the entire population are unlikely to succeed in the presence of heterogeneity. We present a novel Boosted Generative Modeling (BGM) approach to model the structure of the interactions leading to diseases in the context of genetic heterogeneity. Our BGM method bridges the ensemble and generative modeling approaches to genetic association studies under a case-control design. Generative modeling is employed to model the interaction network configuration and the causal relationships, while boosting is used to address the genetic heterogeneity problem. We apply our method to simulated data of complex diseases. The results indicate that our method is capable of modeling the structure of interaction networks among disease-susceptible loci and of addressing genetic heterogeneity issues where traditional methods, such as the multiple dimensional reduction method, fail to apply. Our BGM method provides an exploratory tool that identifies the variables (e.g., disease-susceptible loci) that are likely to correlate and contribute to the disease.


Subject(s)
Epistasis, Genetic , Genetic Diseases, Inborn/genetics , Models, Genetic , Polymorphism, Single Nucleotide , Algorithms , Case-Control Studies , Computer Simulation , Genetic Predisposition to Disease
5.
J Comput Biol ; 18(4): 547-57, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21417940

ABSTRACT

The characterization of proteins via liquid chromatography-mass spectrometry (LC-MS) and tandem MS is a challenge due to the large dynamic range and the high complexity of the molecules of interest. In LC-MS experiments, the inconsistent variation in the travel time of analytes in the LC column results in nonlinear shifts in the LC retention time (RT). This variability must be corrected to accurately match corresponding peptide features across samples in LC-MS experiments. Standard methods for RT alignment applied to the raw data are computationally expensive, making it impractical to process a large number of samples. More successful algorithms perform the alignment on features that are matched across experiments based on pre-specified mass and RT windows. Features that match across multiple experiments are more likely to be true positives and, therefore, will be more suitable to drive the alignment correction. However, depending on the feature matching algorithm, ambiguities can arise when more than one candidate feature match falls within the specified windows, which might affect the alignment performance. In addition, some of the feature-based alignment algorithms do not correct for nonlinear RT shifts. We propose a novel feature matching algorithm that incorporates wavelet-based shape information about the features. We tested our algorithm on two different applications of MS. First, we combined the feature matching algorithm with a robust nonparametric kernel-type regression to form a nonlinear feature-based alignment framework for LC-MS experiments. We validated our alignment framework on LC-MS data from complex samples with known spiked-in proteins, demonstrating our ability to correctly identify each of them with higher reproducibility and probability score than the SuperHirn software. In addition, by using our feature-based alignment framework, we were able to increase the number of matched features and improve the correlation between replicates. Second, we tested our feature matching algorithm on MALDI MS with MS/MS acquisitions. We found that, using only features that matched across replicates of tandem mass spectra, we could improve the identification of peptides compared with the current state-of-the-art software. Supplementary Material is available online at www.libertonline.com/cmb.
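
To make the idea of a nonlinear, feature-based RT correction concrete, the sketch below estimates the RT shift at any time point by kernel-weighted averaging of the shifts observed at matched features. It uses a plain Nadaraya-Watson estimator with a Gaussian kernel, which is a simpler stand-in for the robust kernel-type regression named in the abstract; the function names, the bandwidth and the matched RT values are hypothetical.

```python
import math


def kernel_smooth(x_query, xs, ys, bandwidth):
    """Nadaraya-Watson estimate at x_query using a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from every matched feature")
    return sum(w * y for w, y in zip(weights, ys)) / total


def align_rt(rt, matched_ref_rt, matched_sample_rt, bandwidth=5.0):
    """Map a sample retention time onto the reference time axis.

    The local shift is estimated from the (reference - sample) differences
    of the matched features that lie near rt on the sample axis.
    """
    shifts = [ref - smp for ref, smp in zip(matched_ref_rt, matched_sample_rt)]
    return rt + kernel_smooth(rt, matched_sample_rt, shifts, bandwidth)


# Hypothetical matched features whose shift grows along the run (nonlinear drift).
sample_rt = [10.0, 20.0, 30.0, 40.0]
reference_rt = [10.5, 21.0, 32.0, 43.5]
print(align_rt(25.0, reference_rt, sample_rt))
```

Because the shift is estimated locally, features early in the run are corrected by a smaller amount than late ones, which a single global offset could not do. A robust variant would additionally down-weight matched pairs with outlying shifts.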


Subject(s)
Algorithms , Proteins/chemistry , Proteomics/methods , Tandem Mass Spectrometry/methods , Chromatography, Liquid/methods , Humans , Peptides/chemistry , Reproducibility of Results , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods
6.
Bioinformatics ; 18 Suppl 1: S294-302, 2002.
Article in English | MEDLINE | ID: mdl-12169559

ABSTRACT

MOTIVATION: Current genomic sequence assemblers assume that the input data is derived from a single, homogeneous source. However, recent whole-genome shotgun sequencing projects have violated this assumption, resulting in input fragments covering the same region of the genome whose sequences differ due to polymorphic variation in the population. While single-nucleotide polymorphisms (SNPs) do not pose a significant problem to state-of-the-art assembly methods, these methods do not handle insertion/deletion (indel) polymorphisms of more than a few bases. RESULTS: This paper describes an efficient method for detecting sequence discrepancies due to polymorphism that avoids resorting to global use of more costly, less stringent affine sequence alignments. Instead, the algorithm uses graph-based methods to determine the small set of fragments involved in each polymorphism and performs more sophisticated alignments only among fragments in that set. Results from the incorporation of this method into the Celera Assembler are reported for the D. melanogaster, H. sapiens, and M. musculus genomes.


Subject(s)
Algorithms , Consensus Sequence/genetics , DNA Fragmentation/genetics , Gene Expression Profiling/methods , Polymorphism, Genetic/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Base Sequence , Genetic Variation , Molecular Sequence Data , Polymorphism, Restriction Fragment Length
7.
Proc Natl Acad Sci U S A ; 101(7): 1916-21, 2004 Feb 17.
Article in English | MEDLINE | ID: mdl-14769938

ABSTRACT

We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.3 times, and the inserts from which pairs of reads were obtained covered the genome 39 times. With the nearly complete human DNA sequence [National Center for Biotechnology Information (NCBI) Build 34] now available, it is possible to directly assess the quality, accuracy, and completeness of WGSA and of the first reconstructions of the human genome reported in two landmark papers in February 2001 [Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. (2001) Science 291, 1304-1351; International Human Genome Sequencing Consortium (2001) Nature 409, 860-921]. The analysis of WGSA shows 97% order and orientation agreement with NCBI Build 34, where most of the 3% of sequence out of order is due to scaffold placement problems as opposed to assembly errors within the scaffolds themselves. In addition, WGSA fills some of the remaining gaps in NCBI Build 34. The early genome sequences all covered about the same amount of the genome, but they did so in different ways. The Celera results provide more order and orientation, and the consortium sequence provides better coverage of exact and nearly exact repeats.


Subject(s)
Computational Biology , Genome, Human , Human Genome Project , Computational Biology/standards , Contig Mapping/standards , Humans , RNA, Messenger/analysis , Software
8.
Science ; 296(5573): 1661-71, 2002 May 31.
Article in English | MEDLINE | ID: mdl-12040188

ABSTRACT

The high degree of similarity between the mouse and human genomes is demonstrated through analysis of the sequence of mouse chromosome 16 (Mmu 16), which was obtained as part of a whole-genome shotgun assembly of the mouse genome. The mouse genome is about 10% smaller than the human genome, owing to a lower repetitive DNA content. Comparison of the structure and protein-coding potential of Mmu 16 with that of the homologous segments of the human genome identifies regions of conserved synteny with human chromosomes (Hsa) 3, 8, 12, 16, 21, and 22. Gene content and order are highly conserved between Mmu 16 and the syntenic blocks of the human genome. Of the 731 predicted genes on Mmu 16, 509 align with orthologs on the corresponding portions of the human genome, 44 are likely paralogous to these genes, and 164 genes have homologs elsewhere in the human genome; there are 14 genes for which we could find no human counterpart.
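
The gene counts in this abstract can be checked with a short tally. The sketch below uses only the numbers stated above; the dictionary labels are paraphrases chosen for illustration.

```python
# Counts as reported in the abstract for the 731 predicted genes on Mmu 16.
gene_counts = {
    "orthologs in corresponding human regions": 509,
    "likely paralogs of those genes": 44,
    "homologs elsewhere in the human genome": 164,
    "no human counterpart found": 14,
}

total = sum(gene_counts.values())
for label, count in gene_counts.items():
    # Share of all predicted Mmu 16 genes in each category.
    print(f"{label}: {count} ({count / total:.1%})")
print("total:", total)
```

The four categories sum to the 731 predicted genes, so the breakdown in the abstract is exhaustive and internally consistent.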


Subject(s)
Chromosomes/genetics , Genome, Human , Genome , Mice, Inbred Strains/genetics , Sequence Analysis, DNA , Synteny , Animals , Base Composition , Chromosomes, Human/genetics , Computational Biology , Conserved Sequence , Databases, Nucleic Acid , Evolution, Molecular , Genes , Genetic Markers , Genomics , Humans , Mice , Mice, Inbred A/genetics , Mice, Inbred DBA/genetics , Molecular Sequence Data , Physical Chromosome Mapping , Proteins/chemistry , Proteins/genetics , Sequence Alignment , Species Specificity