1.
Mol Cancer ; 12(1): 70, 2013 Jul 08.
Article in English | MEDLINE | ID: mdl-23835063

ABSTRACT

BACKGROUND: Neuroblastoma (NB) tumours are commonly divided into three cytogenetic subgroups. However, by unsupervised principal components analysis of gene expression profiles we recently identified four distinct subgroups, r1-r4. In the current study we characterized these different subgroups in more detail, with a specific focus on the fourth divergent tumour subgroup (r4). METHODS: Expression microarray data from four international studies corresponding to 148 neuroblastic tumour cases were subject to division into four expression subgroups using a previously described 6-gene signature. Differentially expressed genes between groups were identified using Significance Analysis of Microarray (SAM). Next, gene expression network modelling was performed to map signalling pathways and cellular processes representing each subgroup. Findings were validated at the protein level by immunohistochemistry and immunoblot analyses. RESULTS: We identified several significantly up-regulated genes in the r4 subgroup of which the tyrosine kinase receptor ERBB3 was most prominent (fold change: 132-240). By gene set enrichment analysis (GSEA) the constructed gene network of ERBB3 (n = 38 network partners) was significantly enriched in the r4 subgroup in all four independent data sets. ERBB3 was also positively correlated to the ErbB family members EGFR and ERBB2 in all data sets, and a concurrent overexpression was seen in the r4 subgroup. Further studies of histopathology categories using a fifth data set of 110 neuroblastic tumours, showed a striking similarity between the expression profile of r4 to ganglioneuroblastoma (GNB) and ganglioneuroma (GN) tumours. In contrast, the NB histopathological subtype was dominated by mitotic regulating genes, characterizing unfavourable NB subgroups in particular. The high ErbB3 expression in GN tumour types was verified at the protein level, and showed mainly expression in the mature ganglion cells. 
CONCLUSIONS: Conclusively, this study demonstrates the importance of performing unsupervised clustering and subtype discovery of data sets prior to analyses to avoid a mixture of tumour subtypes, which may otherwise give distorted results and lead to incorrect conclusions. The current study identifies ERBB3 as a clear-cut marker of a GNB/GN-like expression profile, and we suggest a 7-gene expression signature (including ERBB3) as a complement to histopathology analysis of neuroblastic tumours. Further studies of ErbB3 and other ErbB family members and their role in neuroblastic differentiation and pathogenesis are warranted.


Subject(s)
Biomarkers, Tumor/metabolism , Ganglioneuroblastoma/metabolism , Ganglioneuroma/metabolism , Peripheral Nervous System Neoplasms/metabolism , Receptor, ErbB-3/metabolism , Biomarkers, Tumor/genetics , Gene Expression Regulation, Neoplastic , Gene Ontology , Gene Regulatory Networks , Humans , Oligonucleotide Array Sequence Analysis , Receptor, ErbB-3/genetics , Transcriptome , Up-Regulation
2.
Nature ; 450(7169): 560-5, 2007 Nov 22.
Article in English | MEDLINE | ID: mdl-18033299

ABSTRACT

From the standpoints of both basic research and biotechnology, there is considerable interest in reaching a clearer understanding of the diversity of biological mechanisms employed during lignocellulose degradation. Globally, termites are an extremely successful group of wood-degrading organisms and are therefore important both for their roles in carbon turnover in the environment and as potential sources of biochemical catalysts for efforts aimed at converting wood into biofuels. Only recently have data supported any direct role for the symbiotic bacteria in the gut of the termite in cellulose and xylan hydrolysis. Here we use a metagenomic analysis of the bacterial community resident in the hindgut paunch of a wood-feeding 'higher' Nasutitermes species (which do not contain cellulose-fermenting protozoa) to show the presence of a large, diverse set of bacterial genes for cellulose and xylan hydrolysis. Many of these genes were expressed in vivo or had cellulase activity in vitro, and further analyses implicate spirochete and fibrobacter species in gut lignocellulose degradation. New insights into other important symbiotic functions including H2 metabolism, CO2-reductive acetogenesis and N2 fixation are also provided by this first system-wide gene analysis of a microbial community specialized towards plant lignocellulose degradation. Our results underscore how complex even a 1-µl environment can be.


Subject(s)
Bacteria/metabolism , Genome, Bacterial/genetics , Genomics , Intestines/microbiology , Isoptera/metabolism , Isoptera/microbiology , Wood/metabolism , Animals , Bacteria/enzymology , Bacteria/genetics , Bacteria/isolation & purification , Bioelectric Energy Sources , Carbon/metabolism , Catalytic Domain , Cellulose/metabolism , Costa Rica , Genes, Bacterial/genetics , Glycoside Hydrolases/chemistry , Glycoside Hydrolases/genetics , Glycoside Hydrolases/metabolism , Hydrolysis , Lignin/metabolism , Models, Biological , Molecular Sequence Data , Polymerase Chain Reaction , Symbiosis , Wood/chemistry , Xylans/metabolism
3.
Bioinformatics ; 26(3): 295-301, 2010 Feb 01.
Article in English | MEDLINE | ID: mdl-20008478

ABSTRACT

MOTIVATION: Shotgun sequencing generates large numbers of short DNA reads from either an isolated organism or, in the case of metagenomics projects, from the aggregate genome of a microbial community. These reads are then assembled based on overlapping sequences into larger, contiguous sequences (contigs). The feasibility of assembly and the coverage achieved (reads per nucleotide or distinct sequence of nucleotides) depend on several factors: the number of reads sequenced, the read length and the relative abundances of their source genomes in the microbial community. A low coverage suggests that most of the genomic DNA in the sample has not been sequenced, but it is often difficult to estimate either the extent of the uncaptured diversity or the amount of additional sequencing that would be most efficacious. In this work, we regard a metagenome as a population of DNA fragments (bins), each of which may be covered by one or more reads. We employ a gamma distribution to model this bin population due to its flexibility and ease of use. When a gamma approximation can be found that adequately fits the data, we may estimate the number of bins that were not sequenced and that could potentially be revealed by additional sequencing. We evaluated the performance of this model using simulated metagenomes and demonstrate its applicability on three recent metagenomic datasets. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , DNA/chemistry , Metagenome , Metagenomics/methods , Sequence Analysis, DNA/methods , DNA/genetics , Databases, Genetic
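
The gamma model described in the abstract above can be illustrated with a minimal sketch. It assumes that the read count of each bin is Poisson with a rate drawn from a gamma distribution (shape `shape`, scale `scale`), so that the chance of a bin receiving no reads is (1 + scale)^(-shape). The function names and the parameter values are hypothetical, and the fitting procedure used by the authors is not reproduced here.

```python
def prob_unsequenced(shape: float, scale: float) -> float:
    """P(bin has zero reads) when the Poisson rate is Gamma(shape, scale)."""
    # Assumption: Poisson-gamma mixture, i.e. negative binomial read counts.
    return (1.0 + scale) ** (-shape)


def estimate_unseen_bins(observed_bins: int, shape: float, scale: float) -> float:
    """Estimate how many bins got no reads, given the bins seen at least once."""
    p0 = prob_unsequenced(shape, scale)
    # observed_bins is roughly N * (1 - p0), so the unseen part is N * p0.
    return observed_bins * p0 / (1.0 - p0)


if __name__ == "__main__":
    # Hypothetical fitted parameters, for illustration only.
    print(estimate_unseen_bins(observed_bins=1000, shape=1.0, scale=1.0))
```

With these illustrative parameters half of all bins are expected to be missed, so 1000 observed bins imply roughly 1000 unseen ones.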
4.
Cancer Cell Int ; 11: 9, 2011 Apr 14.
Article in English | MEDLINE | ID: mdl-21492432

ABSTRACT

BACKGROUND: There are currently three postulated genomic subtypes of the childhood tumour neuroblastoma (NB): Type 1, Type 2A, and Type 2B. The most aggressive forms of NB are characterized by amplification of the oncogene MYCN (MNA) and low expression of the favourable marker NTRK1. Recently, mutations or high expression of the familial predisposition gene Anaplastic Lymphoma Kinase (ALK) were associated with unfavourable biology of sporadic NB. Also, various other genes have been linked to NB pathogenesis. RESULTS: The present study explores subgroup discrimination by gene expression profiling using three published microarray studies on NB (47 samples). Four distinct clusters were identified by Principal Components Analysis (PCA) in two separate data sets, which could be verified by an unsupervised hierarchical clustering in a third independent data set (101 NB samples) using a set of 74 discriminative genes. The expression signature of six NB-associated genes, ALK, BIRC5, CCND1, MYCN, NTRK1, and PHOX2B, significantly discriminated the four clusters (p < 0.05, one-way ANOVA test). PCA clusters p1, p2, and p3 were found to correspond well to the postulated subtypes 1, 2A, and 2B, respectively. Remarkably, a fourth novel cluster was detected in all three independent data sets. This cluster comprised mainly 11q-deleted MNA-negative tumours with low expression of ALK, BIRC5, and PHOX2B, and was significantly associated with higher tumour stage, poor outcome and poor survival compared to the Type 1-corresponding favourable group (INSS stage 4 and/or dead of disease, p < 0.05, Fisher's exact test). CONCLUSIONS: Based on expression profiling we have identified four molecular subgroups of neuroblastoma, which can be distinguished by a 6-gene signature. The fourth subgroup has not been described elsewhere, and efforts are currently being made to further investigate this group's specific characteristics.

5.
Nucleic Acids Res ; 37(7): 2096-104, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19223325

ABSTRACT

In order to simplify and meaningfully categorize large sets of protein sequence data, it is commonplace to cluster proteins based on the similarity of those sequences. However, it quickly becomes clear that the sequence flexibility allowed a given protein varies significantly among different protein families. The degree to which sequences are conserved not only differs for each protein family, but also is affected by the phylogenetic divergence of the source organisms. Clustering techniques that use similarity thresholds for protein families do not always allow for these variations and thus cannot be confidently used for applications such as automated annotation and phylogenetic profiling. In this work, we applied a spectral bipartitioning technique to all proteins from 53 archaeal genomes. Comparisons between different taxonomic levels allowed us to study the effects of phylogenetic distances on cluster structure. Likewise, by associating functional annotations and phenotypic metadata with each protein, we could compare our protein similarity clusters with both protein function and associated phenotype. Our clusters can be analyzed graphically and interactively online.


Subject(s)
Algorithms , Archaeal Proteins/classification , Archaeal Proteins/chemistry , Archaeal Proteins/genetics , Cluster Analysis , Phenotype , Phylogeny , Sequence Analysis, Protein , Software
6.
Syst Rev ; 10(1): 28, 2021 Jan 16.
Article in English | MEDLINE | ID: mdl-33453724

ABSTRACT

BACKGROUND: Sepsis is a life-threatening organ dysfunction caused by a dysregulated host response to infection. To decrease the high case fatality rates and morbidity for sepsis and septic shock, there is a need to increase the accuracy of early detection of suspected sepsis in prehospital and emergency department settings. This may be achieved by developing risk prediction decision support systems based on artificial intelligence. METHODS: The overall aim of this scoping review is to summarize the literature on existing methods for early detection of sepsis using artificial intelligence. The review will be performed using the framework formulated by Arksey and O'Malley and further developed by Levac and colleagues. To identify primary studies and reviews that are suitable to answer our research questions, a comprehensive literature collection will be compiled by searching several sources. Constrictions regarding time and language will have to be implemented. Therefore, only studies published between 1 January 1990 and 31 December 2020 will be taken into consideration, and foreign language publications will not be considered, i.e., only papers with full text in English will be included. Databases/web search engines that will be used are PubMed, Web of Science Platform, Scopus, IEEE Xplore, Google Scholar, Cochrane Library, and ACM Digital Library. Furthermore, clinical studies that have completed patient recruitment and reported results found in the database ClinicalTrials.gov will be considered. The term artificial intelligence is viewed broadly, and a wide range of machine learning and mathematical models suitable as base for decision support will be evaluated. Two members of the team will test the framework on a sample of included studies to ensure that the coding framework is suitable and can be consistently applied. Analysis of collected data will provide a descriptive summary and thematic analysis. 
The reported results will convey knowledge about the state of current research and innovation for using artificial intelligence to detect sepsis in early phases of the medical care chain. ETHICS AND DISSEMINATION: The methodology used here is based on the use of publicly available information and does not need ethical approval. It aims at aiding further research towards digital solutions for disease detection and health innovation. Results will be extracted into a review report for submission to a peer-reviewed scientific journal. Results will be shared with relevant local and national authorities and disseminated in additional appropriate formats such as conferences, lectures, and press releases.


Subject(s)
Artificial Intelligence , Shock, Septic , Humans , Population Groups , Publications , Research Design , Review Literature as Topic
7.
Bioinformatics ; 25(20): 2737-8, 2009 Oct 15.
Article in English | MEDLINE | ID: mdl-19696045

ABSTRACT

UNLABELLED: Microorganisms are ubiquitous in nature and constitute intrinsic parts of almost every ecosystem. A culture-independent and powerful way to study microbial communities is metagenomics. In such studies, functional analysis is performed on fragmented genetic material from multiple species in the community. The recent advances in high-throughput sequencing have greatly increased the amount of data in metagenomic projects. At present, there is an urgent need for efficient statistical tools to analyse these data. We have created ShotgunFunctionalizeR, an R-package for functional comparison of metagenomes. The package contains tools for importing, annotating and visualizing metagenomic data produced by shotgun high-throughput sequencing. ShotgunFunctionalizeR contains several statistical procedures for assessing functional differences between samples, both for individual genes and for entire pathways. In addition to standard and previously published methods, we have developed and implemented a novel approach based on a Poisson model. This procedure is highly flexible and thus applicable to a wide range of different experimental designs. We demonstrate the potential of ShotgunFunctionalizeR by performing a regression analysis on metagenomes sampled at multiple depths in the Pacific Ocean. AVAILABILITY: http://shotgun.zool.gu.se


Subject(s)
Computational Biology/methods , Genomics/methods , Metagenome , Metagenomics/methods , Software , Animals , Databases, Genetic , Humans
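
To make the idea of a Poisson-based comparison between two metagenomes concrete, here is a small sketch in Python. It is not the ShotgunFunctionalizeR procedure and does not use the package's API; it is a textbook exact test that assumes the number of reads assigned to a gene family in each sample is Poisson with a rate proportional to the total number of reads. The function name and the counts are hypothetical.

```python
from math import comb


def poisson_two_sample_pvalue(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided exact p-value for equal per-read rates in two samples.

    x1, x2: reads assigned to one gene family in samples 1 and 2.
    n1, n2: total reads in each sample.
    Under the null, x1 given x1 + x2 is Binomial(x1 + x2, n1 / (n1 + n2)).
    Intended for small counts; large totals would need log-space arithmetic.
    """
    total = x1 + x2
    p = n1 / (n1 + n2)
    pmf = [comb(total, i) * p**i * (1 - p) ** (total - i) for i in range(total + 1)]
    observed = pmf[x1]
    # Add up every outcome that is no more likely than the observed one.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


if __name__ == "__main__":
    # Hypothetical counts: 10 reads versus 0 reads at equal sequencing depth.
    print(poisson_two_sample_pvalue(10, 1000, 0, 1000))
```

Conditioning on the combined count removes the unknown common rate, which is why only the ratio of sequencing depths enters the test.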
8.
Stat Appl Genet Mol Biol ; 8: Article 19, 2009.
Article in English | MEDLINE | ID: mdl-19341353

ABSTRACT

Clumping of gene properties like expression or mutant phenotypes along chromosomes is commonly detected using completely random null-models where their location is equally likely across the chromosomes. Interpretation of statistical tests based on these assumptions may be misleading if dependencies exist that are unequal between chromosomes or in different chromosomal parts. One such regional dependency is the telomeric effect, observed in several studies of Saccharomyces cerevisiae, under which e.g. essential genes are less likely to reside near the chromosomal ends. In this study we demonstrate that standard randomisation test procedures are of limited applicability in the presence of telomeric effects. Several extensions of such standard tests are here suggested for handling clumping simultaneously with regional differences in essentiality frequencies in sub-telomeric and central gene positions. Furthermore, a general non-homogeneous discrete Markov approach for combining parametrically modelled position dependent probabilities of a dichotomous property with a simple single parameter clumping is suggested. This Markov model is adapted to the observed telomeric effects and then simulations are used to demonstrate properties of the suggested modified randomisation tests. The model is also applied as a direct alternative tool for statistical analysis of the S. cerevisiae genome for clumping of phenotypes.


Subject(s)
Chromosome Mapping , Models, Genetic , Saccharomyces cerevisiae/metabolism , Computer Simulation , Gene Expression Regulation, Fungal , Genes, Fungal , Genome , Markov Chains , Models, Biological , Models, Statistical , Phenotype , Probability , Random Allocation , Telomere/ultrastructure
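
One way to picture a non-homogeneous Markov model that combines position-dependent probabilities with a single clumping parameter is the simulation sketch below. The construction is an assumption made for illustration (each gene copies its left neighbour with probability `clump`, otherwise it is drawn from its own position-dependent probability) and may differ from the parametrisation in the paper. All names are hypothetical.

```python
import random


def simulate_property(base_probs, clump, rng):
    """Simulate a 0/1 gene property along one chromosome.

    base_probs[i]: position-dependent probability of the property at gene i,
                   for example lower near the telomeres.
    clump: probability in [0, 1] that a gene copies its left neighbour's state.
    rng: a random.Random instance, passed in so runs are reproducible.
    """
    states = []
    for i, p in enumerate(base_probs):
        if i > 0 and rng.random() < clump:
            states.append(states[-1])  # clumping: inherit the neighbour's state
        else:
            states.append(1 if rng.random() < p else 0)  # fresh positional draw
    return states


if __name__ == "__main__":
    # Hypothetical telomeric effect: essential genes rarer at both ends.
    probs = [0.05] * 5 + [0.2] * 20 + [0.05] * 5
    print(simulate_property(probs, clump=0.3, rng=random.Random(1)))
```

Repeating such simulations under `clump=0` gives a null distribution that keeps the regional differences, which is the kind of modified randomisation the abstract argues for.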
9.
Nucleic Acids Res ; 36(Database issue): D534-8, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17932063

ABSTRACT

IMG/M is a data management and analysis system for microbial community genomes (metagenomes) hosted at the Department of Energy's (DOE) Joint Genome Institute (JGI). IMG/M consists of metagenome data integrated with isolate microbial genomes from the Integrated Microbial Genomes (IMG) system. IMG/M provides IMG's comparative data analysis tools extended to handle metagenome data, together with metagenome-specific analysis tools. IMG/M is available at http://img.jgi.doe.gov/m.


Subject(s)
Databases, Genetic , Environmental Microbiology , Genome, Archaeal , Genome, Bacterial , Database Management Systems , Genomics , Internet , Software
10.
Bioinformatics ; 24(11): 1332-8, 2008 Jun 01.
Article in English | MEDLINE | ID: mdl-18381402

ABSTRACT

MOTIVATION: The evolutionary distance inferred from gene-order comparisons of related bacteria is dependent on the model. Therefore, it is highly important to establish reliable assumptions before inferring its magnitude. RESULTS: We investigate the patterns of dotplots between species of bacteria with the purpose of model selection in gene-order problems. We find several categories of data which can be explained by carefully weighing the contributions of reversals, transpositions, symmetrical reversals, single gene transpositions and single gene reversals. We also derive method of moments distance estimates for some previously uncomputed cases, such as symmetrical reversals, single gene reversals and their combinations, as well as the single gene transpositions edit distance.


Subject(s)
Biological Evolution , Chromosome Mapping/methods , DNA, Bacterial/genetics , Evolution, Molecular , Genome, Bacterial/genetics , Models, Genetic , Sequence Analysis, DNA/methods , Base Sequence , Computer Simulation , Linkage Disequilibrium/genetics , Molecular Sequence Data
11.
Bioinformatics ; 24(16): i7-13, 2008 Aug 15.
Article in English | MEDLINE | ID: mdl-18689842

ABSTRACT

MOTIVATION: A typical metagenome dataset generated using a 454 pyrosequencing platform consists of short reads sampled from the collective genome of a microbial community. The amount of sequence in such datasets is usually insufficient for assembly, and traditional gene prediction cannot be applied to unassembled short reads. As a result, analysis of such datasets usually involves comparisons in terms of relative abundances of various protein families. The latter requires assignment of individual reads to protein families, which is hindered by the fact that short reads contain only a fragment, usually small, of a protein. RESULTS: We have considered the assignment of pyrosequencing reads to protein families directly using RPS-BLAST against COG and Pfam databases and indirectly via proxygenes that are identified using BLASTx searches against protein sequence databases. Using simulated metagenome datasets as benchmarks, we show that the proxygene method is more accurate than the direct assignment. We introduce a clustering method which significantly reduces the size of a metagenome dataset while maintaining a faithful representation of its functional and taxonomic content.


Subject(s)
Bacterial Proteins/genetics , Chromosome Mapping/methods , Open Reading Frames/genetics , Proteome/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Base Sequence , Cluster Analysis , Molecular Sequence Data
12.
BMC Bioinformatics ; 8: 295, 2007 Aug 08.
Article in English | MEDLINE | ID: mdl-17686169

ABSTRACT

BACKGROUND: The translational efficiency of an mRNA can be modulated by upstream open reading frames (uORFs) present in certain genes. A uORF can attenuate translation of the main ORF by interfering with translational reinitiation at the main start codon. uORFs also occur by chance in the genome, in which case they do not have a regulatory role. Since the sequence determinants for functional uORFs are not understood, it is difficult to discriminate functional from spurious uORFs by sequence analysis. RESULTS: We have used comparative genomics to identify novel uORFs in yeast with a high likelihood of having a translational regulatory role. We examined uORFs, previously shown to play a role in regulation of translation in Saccharomyces cerevisiae, for evolutionary conservation within seven Saccharomyces species. Inspection of the set of conserved uORFs yielded the following three characteristics useful for discrimination of functional from spurious uORFs: a length between 4 and 6 codons, a distance from the start of the main ORF between 50 and 150 nucleotides, and finally a lack of overlap with, and clear separation from, neighbouring uORFs. These derived rules are inherently associated with uORFs with properties similar to the GCN4 locus, and may not detect most uORFs of other types. uORFs with high scores based on these rules showed a much higher evolutionary conservation than randomly selected uORFs. In a genome-wide scan in S. cerevisiae, we found 34 conserved uORFs from 32 genes that we predict to be functional; subsequent analysis showed the majority of these to be located within transcripts. A total of 252 genes were found containing conserved uORFs with properties indicative of a functional role; all but 7 are novel. Functional content analysis of this set identified an overrepresentation of genes involved in transcriptional control and development. CONCLUSION: Evolutionary conservation of uORFs in yeasts can be traced up to 100 million years of separation. 
The conserved uORFs have certain characteristics with respect to length, distance from each other and from the main start codon, and folding energy of the sequence. These newly found characteristics can be used to facilitate detection of other conserved uORFs.


Subject(s)
Chromosome Mapping/methods , Evolution, Molecular , Genome, Fungal/genetics , Open Reading Frames/genetics , Regulatory Sequences, Nucleic Acid/genetics , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA/methods , Algorithms , Base Sequence , Conserved Sequence/genetics , Molecular Sequence Data , Protein Biosynthesis/genetics
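
The three characteristics listed in the abstract above (length of 4 to 6 codons, 50 to 150 nucleotides from the main ORF, separation from neighbouring uORFs) can be turned into a simple rule count, sketched below. Several details are assumptions rather than the authors' definitions: the length is counted over the whole uORF including start and stop codons, the distance is measured from the end of the uORF to the main start codon, and `min_gap` is a hypothetical threshold for "clear separation", for which the abstract gives no number.

```python
from dataclasses import dataclass


@dataclass
class UORF:
    start: int  # 0-based position of the uORF start codon in the transcript
    end: int    # position just past the uORF stop codon (exclusive)


def rule_score(uorf: UORF, neighbours: list, main_start: int, min_gap: int = 10) -> int:
    """Count how many of the three rules from the abstract a uORF satisfies."""
    codons = (uorf.end - uorf.start) // 3
    distance = main_start - uorf.end
    length_ok = 4 <= codons <= 6
    distance_ok = 50 <= distance <= 150
    # Separated means every neighbour lies at least min_gap nt away on one side.
    separated = all(
        other.start - uorf.end >= min_gap or uorf.start - other.end >= min_gap
        for other in neighbours
    )
    return int(length_ok) + int(distance_ok) + int(separated)


if __name__ == "__main__":
    # Hypothetical coordinates: a 5-codon uORF ending 85 nt before the main ORF.
    print(rule_score(UORF(100, 115), [], main_start=200))
```

A score of 3 marks a uORF that matches all three rules; as the abstract notes, such rules favour GCN4-like uORFs and may miss other types.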
13.
BMC Bioinformatics ; 8: 402, 2007 Oct 18.
Article in English | MEDLINE | ID: mdl-17949484

ABSTRACT

BACKGROUND: Accurate taxonomy is best maintained if species are arranged as hierarchical groups in phylogenetic trees. This is especially important as trees grow larger as a consequence of a rapidly expanding sequence database. Hierarchical group names are typically manually assigned in trees, an approach that becomes unfeasible for very large topologies. RESULTS: We have developed an automated iterative procedure for delineating stable (monophyletic) hierarchical groups in large (or small) trees and naming those groups according to a set of sequentially applied rules. In addition, we have created an associated ungrouping tool for removing existing groups that do not meet user-defined criteria (such as monophyly). The procedure is implemented in a program called GRUNT (GRouping, Ungrouping, Naming Tool) and has been applied to the current release of the Greengenes (Hugenholtz) 16S rRNA gene taxonomy comprising more than 130,000 taxa. CONCLUSION: GRUNT will facilitate researchers requiring comprehensive hierarchical grouping of large tree topologies in, for example, database curation, microarray design and pangenome assignments. The application is available at the Greengenes website.


Subject(s)
Databases, Nucleic Acid , Phylogeny , Software , Algorithms , Classification , Database Management Systems , RNA, Ribosomal, 16S/analysis
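
The monophyly criterion mentioned in the abstract above can be checked with a few lines of code. The sketch below is not GRUNT; it assumes a rooted tree written as nested Python tuples with taxon names as strings, and it reports whether some clade contains exactly the taxa of a proposed group. The tree and the function names are hypothetical.

```python
def leaf_set(tree):
    """Return the set of taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    leaves = set()
    for child in tree:
        leaves |= leaf_set(child)
    return leaves


def is_monophyletic(tree, group):
    """True if some clade of the rooted tree holds exactly the taxa in group."""
    group = set(group)
    leaves = leaf_set(tree)
    if leaves == group:
        return True
    # A leaf, or a clade missing part of the group, cannot contain the match.
    if isinstance(tree, str) or not group <= leaves:
        return False
    return any(is_monophyletic(child, group) for child in tree)


if __name__ == "__main__":
    tree = ((("A", "B"), "C"), ("D", "E"))  # hypothetical five-taxon tree
    print(is_monophyletic(tree, {"A", "B"}), is_monophyletic(tree, {"A", "C"}))
```

An ungrouping step of the kind the abstract describes could drop every named group for which this check fails. The sketch recomputes leaf sets at each level, which is fine for small trees but would need caching for topologies with more than 100,000 taxa.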
14.
Stat Appl Genet Mol Biol ; 5: Article 8, 2006.
Article in English | MEDLINE | ID: mdl-16646872

ABSTRACT

Recently Peres and Shields discovered a new method for estimating the order of a stationary fixed order Markov chain. They showed that the estimator is consistent by proving a threshold result. While this threshold is valid asymptotically in the limit, it is not very useful for DNA sequence analysis where data sizes are moderate. In this paper we give a novel interpretation of the Peres-Shields estimator as a sharp transition phenomenon. This yields a precise and powerful estimator that quickly identifies the core dependencies in data. We show that it compares favorably to other estimators, especially in the presence of variable dependencies. Motivated by this last point, we extend the Peres-Shields estimator to Variable Length Markov Chains. We compare it to a well-established estimator and show that it is superior in terms of the predictive likelihood. We give an application to the problem of detecting DNA sequence similarity in plasmids.


Subject(s)
Markov Chains , Sequence Analysis, DNA/methods , DNA Transposable Elements , Models, Statistical , Plasmids/chemistry
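
For readers unfamiliar with Markov order estimation, the sketch below shows the general task using the Bayesian information criterion (BIC). This is a different, standard estimator swapped in for illustration; it is not the Peres-Shields estimator discussed in the abstract, whose details are not given here. The function names are hypothetical and a four-letter DNA alphabet is assumed.

```python
from collections import Counter
from math import log


def conditional_log_likelihood(seq, order, skip):
    """Maximised log-likelihood of seq[skip:] under a Markov chain of this order."""
    context_counts = Counter()
    transition_counts = Counter()
    for i in range(skip, len(seq)):
        context = seq[i - order:i]
        context_counts[context] += 1
        transition_counts[context, seq[i]] += 1
    return sum(n * log(n / context_counts[context])
               for (context, _), n in transition_counts.items())


def bic_order(seq, max_order, alphabet_size=4):
    """Return the order in 0..max_order with the lowest BIC (ties: lower order)."""
    n = len(seq) - max_order  # every order is scored on the same positions
    best_order, best_bic = 0, float("inf")
    for order in range(max_order + 1):
        n_params = alphabet_size ** order * (alphabet_size - 1)
        bic = -2.0 * conditional_log_likelihood(seq, order, max_order) + n_params * log(n)
        if bic < best_bic:
            best_order, best_bic = order, bic
    return best_order


if __name__ == "__main__":
    print(bic_order("ACGT" * 50, max_order=2))
```

The penalty term grows as 4^order, which is one reason fixed-order estimators struggle on moderate-sized DNA data and why the abstract turns to variable length Markov chains.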
15.
Stud Health Technol Inform ; 216: 1065, 2015.
Article in English | MEDLINE | ID: mdl-26262364

ABSTRACT

Late phase clinical trials are regularly outsourced to a Contract Research Organisation (CRO) while the risk and accountability remain within the sponsor company. Many statistical tasks are delivered by the CRO and later revalidated by the sponsor. Here, we report a technological approach to standardised event prediction. We have built a dynamic web application around an R-package with the aim of delivering reliable event predictions, simplifying communication and increasing trust between the CRO and the in-house statisticians via transparency. Short learning curve, interactivity, reproducibility and data diagnostics are key here. The current implementation is motivated by time-to-event prediction in oncology. We demonstrate a clear benefit of standardisation for both parties. The tool can be used for exploration, communication, sensitivity analysis and generating standard reports. At this point we wish to present this tool and share some of the insights we have gained during the development.


Subject(s)
Adverse Drug Reaction Reporting Systems/organization & administration , Clinical Trials as Topic/statistics & numerical data , Drug Monitoring/methods , Drug-Related Side Effects and Adverse Reactions/epidemiology , Electronic Health Records/statistics & numerical data , Outsourced Services/statistics & numerical data , Computer Simulation , Drug-Related Side Effects and Adverse Reactions/diagnosis , Electronic Health Records/classification , Humans , Incidence , Models, Statistical , Risk Assessment/methods , Software , United Kingdom/epidemiology
16.
PLoS One ; 8(8): e70568, 2013.
Article in English | MEDLINE | ID: mdl-23950964

ABSTRACT

An important challenge in drug discovery and disease prognosis is to predict genes that are preferentially expressed in one or a few tissues, i.e. showing a considerably higher expression in one tissue(s) compared to the others. Although several data sources and methods have been published explicitly for this purpose, they often disagree and it is not evident how to retrieve these genes and how to distinguish true biological findings from those that are due to choice-of-method and/or experimental settings. In this work we have developed a computational approach that combines results from multiple methods and datasets with the aim to eliminate method/study-specific biases and to improve the predictability of preferentially expressed human genes. A rule-based score is used to merge and assign support to the results. Five sets of genes with known tissue specificity were used for parameter pruning and cross-validation. In total we identify 3434 tissue-specific genes. We compare the genes of highest scores with the public databases: PaGenBase (microarray), TiGER (EST) and HPA (protein expression data). The results have 85% overlap to PaGenBase, 71% to TiGER and only 28% to HPA. 99% of our predictions have support from at least one of these databases. Our approach also performs better than any of the databases on identifying drug targets and biomarkers with known tissue-specificity.


Subject(s)
Computational Biology/methods , Gene Expression Profiling , Gene Expression Regulation , Algorithms , Cluster Analysis , Databases, Genetic , Humans , Organ Specificity/genetics
17.
Genome Med ; 1(9): 88, 2009 Sep 29.
Article in English | MEDLINE | ID: mdl-19754960

ABSTRACT

Systems biology has matured considerably as a discipline over the last decade, yet some of the key challenges separating current research efforts in systems biology and clinically useful results are only now becoming apparent. As these gaps are better defined, the new discipline of systems medicine is emerging as a translational extension of systems biology. How is systems medicine defined? What are relevant ontologies for systems medicine? What are the key theoretic and methodologic challenges facing computational disease modeling? How are inaccurate and incomplete data, and uncertain biologic knowledge best synthesized in useful computational models? Does network analysis provide clinically useful insight? We discuss the outstanding difficulties in translating a rapidly growing body of data into knowledge usable at the bedside. Although core-specific challenges are best met by specialized groups, it appears fundamental that such efforts should be guided by a roadmap for systems medicine drafted by a coalition of scientists from the clinical, experimental, computational, and theoretic domains.

18.
PLoS One ; 3(7): e2607, 2008 Jul 09.
Article in English | MEDLINE | ID: mdl-18612393

ABSTRACT

BACKGROUND: Environments and their organic content are generally not static and isolated, but in a constant state of exchange and interaction with each other. Through physical or biological processes, organisms, especially microbes, may be transferred between environments whose characteristics may be quite different. The transferred microbes may not survive in their new environment, but their DNA will be deposited. In this study, we compare two environmental sequencing projects to find molecular evidence of transfer of microbes over vast geographical distances. METHODOLOGY: By studying synonymous nucleotide composition, oligomer frequency and orthology between predicted genes in metagenomics data from two environments, terrestrial and aquatic, and by correlating with phylogenetic mappings, we find that both environments are likely to contain trace amounts of microbes which have been far removed from their original habitat. We also suggest a bias in direction from soil to sea, which is consistent with the cycles of planetary wind and water. CONCLUSIONS: Our findings support the Baas-Becking hypothesis formulated in 1934, which states that due to dispersion and population sizes, microbes are likely to be found in widely disparate environments. Furthermore, the availability of genetic material from distant environments is a possible font of novel gene functions for lateral gene transfer.


Subject(s)
Environment, Bacterial Genes, Ecology, Ecosystem, Horizontal Gene Transfer, Phylogeny, Water Microbiology
19.
Bioinformatics ; 22(5): 517-22, 2006 Mar 01.
Article in English | MEDLINE | ID: mdl-16403797

ABSTRACT

MOTIVATION: Analyses of genomic signatures are gaining attention as they allow studies of species-specific relationships without involving alignments of homologous sequences. A naïve Bayesian classifier was built to discriminate between different bacterial compositions of short oligomers, also known as DNA words. The classifier has proven successful in identifying foreign genes in Neisseria meningitidis. In this study we extend the classifier approach using either a fixed higher order Markov model (Mk) or a variable length Markov model (VLMk). RESULTS: We propose a simple algorithm to lock a variable length Markov model to a certain number of parameters and show that the use of Markov models greatly increases the flexibility and accuracy of prediction compared with a naïve model. We also test the integrity of classifiers in terms of false-negatives and give estimates of the minimal sizes of training data. We end the report by proposing a method to reject a false hypothesis of horizontal gene transfer. AVAILABILITY: Software and Supplementary information available at www.cs.chalmers.se/~dalevi/genetic_sign_classifiers/.
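
To make the fixed-order Markov model (Mk) idea concrete, the following is a minimal sketch of a classifier that scores a sequence under one model per genome and picks the most likely one. It is not the authors' software: the class name, the add-one pseudocount, and the hard-coded four-letter alphabet are assumptions for illustration, and the variable length model (VLMk) is not covered.

```python
import math
from collections import defaultdict

class MarkovClassifier:
    """One fixed-order Markov model per label (hypothetical helper class)."""

    def __init__(self, order=2, pseudocount=1.0):
        self.order = order
        self.pseudocount = pseudocount  # assumed smoothing, not from the paper
        self.models = {}

    def train(self, label, sequence):
        """Count how often each base follows each context of length `order`."""
        k = self.order
        counts = defaultdict(lambda: defaultdict(float))
        for i in range(k, len(sequence)):
            counts[sequence[i - k:i]][sequence[i]] += 1.0
        self.models[label] = {ctx: dict(nxt) for ctx, nxt in counts.items()}

    def log_likelihood(self, label, sequence):
        """Log-probability of the sequence given the label's model."""
        k = self.order
        counts = self.models[label]
        total = 0.0
        for i in range(k, len(sequence)):
            nxt = counts.get(sequence[i - k:i], {})
            num = nxt.get(sequence[i], 0.0) + self.pseudocount
            den = sum(nxt.values()) + 4 * self.pseudocount  # 4 = |{A,C,G,T}|
            total += math.log(num / den)
        return total

    def classify(self, sequence):
        """Return the label whose model gives the highest log-likelihood."""
        return max(self.models, key=lambda lab: self.log_likelihood(lab, sequence))
```

In this sketch a gene whose best-scoring label differs from its host genome would be flagged as a candidate foreign gene. Raising `order` increases the number of parameters, which is the trade-off against training data size that the abstract discusses.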


Subject(s)
Chromosome Mapping/methods, DNA Fingerprinting/methods, Bacterial DNA/genetics, Horizontal Gene Transfer/genetics, Bacterial Genome/genetics, Automated Pattern Recognition/methods, DNA Sequence Analysis/methods, Artificial Intelligence, Bayes Theorem, Markov Chains, Genetic Models, Statistical Models, Species Specificity
20.
Bioinformatics ; 20(18): 3628-35, 2004 Dec 12.
Article in English | MEDLINE | ID: mdl-15297302

ABSTRACT

UNLABELLED: A set of new algorithms and software tools for automatic protein identification using peptide mass fingerprinting is presented. The software is automatic, fast and modular to suit different laboratory needs, and it can be operated either via a Java user interface or called from within scripts. The software modules do peak extraction, peak filtering and protein database matching, and communicate via XML. Individual modules can therefore easily be replaced with other software if desired, and all intermediate results are available to the user. The algorithms are designed to operate without human intervention and contain several novel approaches. The performance and capabilities of the software are illustrated on spectra from different mass spectrometer manufacturers, and the factors influencing successful identification are discussed and quantified. MOTIVATION: Protein identification with mass spectrometric methods is a key step in modern proteomics studies. Some tools are available today for doing different steps in the analysis. Only a few commercial systems integrate all the steps in the analysis, often for only one vendor's hardware, and the details of these systems are not public. RESULTS: A complete system for doing protein identification with peptide mass fingerprints is presented, including everything from peak picking to matching the database protein. The details of the different algorithms are disclosed so that academic researchers can have full control of their tools. AVAILABILITY: The described software tools are available from the Halmstad University website www.hh.se/staff/bioinf/. SUPPLEMENTARY INFORMATION: Details of the algorithms are described in supporting information available from the Halmstad University website www.hh.se/staff/bioinf/.
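
The database matching step of peptide mass fingerprinting can be sketched as follows. This is a simplified illustration, not the described software: the function names, the ppm tolerance, and the simple hit-count score are assumptions, and the peak extraction and filtering modules are not shown.

```python
def count_matches(observed, theoretical, tol_ppm=50.0):
    """Count observed peaks within tol_ppm of any theoretical peptide mass."""
    hits = 0
    for mass in observed:
        tolerance = mass * tol_ppm / 1e6  # ppm converted to an absolute window
        if any(abs(mass - t) <= tolerance for t in theoretical):
            hits += 1
    return hits

def rank_proteins(observed, database, tol_ppm=50.0):
    """Rank database proteins by number of matched peaks, best first.

    `database` maps a protein name to its list of theoretical peptide masses
    (for example from an in-silico digest, which is not shown here).
    """
    scores = {
        name: count_matches(observed, masses, tol_ppm)
        for name, masses in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A usage example with made-up masses (the values below are hypothetical and do not correspond to real proteins):

```python
database = {
    "protA": [1000.50, 1500.75, 2000.10],
    "protB": [1100.40, 1500.75, 2500.90],
}
observed = [1000.52, 1500.76, 2000.11, 1800.00]
print(rank_proteins(observed, database))
```

A plain hit count favors large proteins with many peptides, so a practical scorer would normalize for that; the abstract does not state which scoring the authors use.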


Subject(s)
Peptide Mapping/methods, Proteins/chemistry, Sequence Alignment/methods, Protein Sequence Analysis/methods, Software, Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry/methods, User-Computer Interface, Algorithms, Database Management Systems, Documentation/methods, Information Storage and Retrieval/methods, Programming Languages, Proteins/analysis