Search | State Department of Health

1.

Dutch genome diagnostic laboratories accelerated and improved variant interpretation and increased accuracy by sharing data.

Fokkema, Ivo F A C; van der Velde, Kasper J; Slofstra, Mariska K; Ruivenkamp, Claudia A L; Vogel, Maartje J; Pfundt, Rolph; Blok, Marinus J; Lekanne Deprez, Ronald H; Waisfisz, Quinten; Abbott, Kristin M; Sinke, Richard J; Rahman, Rubayte; Nijman, Isaäc J; de Koning, Bart; Thijs, Gert; Wieskamp, Nienke; Moritz, Ruben J G; Charbon, Bart; Saris, Jasper J; den Dunnen, Johan T; Laros, Jeroen F J; Swertz, Morris A; van Gijn, Marielle E.

Hum Mutat ; 40(12): 2230-2238, 2019 Dec.

Article in English | MEDLINE | ID: mdl-31433103

ABSTRACT

Each year diagnostic laboratories in the Netherlands profile thousands of individuals for heritable disease using next-generation sequencing (NGS). This requires pathogenicity classification of millions of DNA variants on the standard 5-tier scale. To reduce time spent on data interpretation and increase data quality and reliability, the nine Dutch labs decided to publicly share their classifications. Variant classifications of nearly 100,000 unique variants were catalogued and compared in a centralized MOLGENIS database. Variants classified by more than one center were labeled as "consensus" when classifications agreed, and shared internationally with LOVD and ClinVar. When classifications opposed (LB/B vs. LP/P), they were labeled "conflicting", while other nonconsensus observations were labeled "no consensus". We assessed our classifications using the InterVar software to compare to ACMG 2015 guidelines, showing 99.7% overall consistency with only 0.3% discrepancies. Differences in classifications between Dutch labs or between Dutch labs and ACMG were mainly present in genes with low penetrance or for late onset disorders and highlight limitations of the current 5-tier classification system. The data sharing boosted the quality of DNA diagnostics in Dutch labs, an initiative we hope will be followed internationally. Recently, a positive match with a case from outside our consortium resulted in a more definite disease diagnosis.

Subjects

Genetic Diseases, Inborn/diagnosis; Genetic Variation; High-Throughput Nucleotide Sequencing/methods; Information Dissemination/methods; Data Accuracy; Databases, Genetic; Genetic Diseases, Inborn/genetics; Guidelines as Topic; Humans; Laboratories; Netherlands; Sequence Analysis, DNA
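
The labeling rule described in the abstract of entry 1 can be sketched in a few lines of Python. This is a minimal illustration, not the consortium's MOLGENIS implementation: the function name `label_variant`, the tier codes (`B`, `LB`, `VUS`, `LP`, `P`), the choice to count only identical tiers as agreement, and the `single lab` label for variants seen by one center are all assumptions made here.

```python
# Hypothetical tier codes for the 5-tier scale: B, LB, VUS, LP, P.
BENIGN = {"B", "LB"}
PATHOGENIC = {"LP", "P"}


def label_variant(classifications):
    """Label one variant from the list of tiers assigned by different labs."""
    if len(classifications) < 2:
        # The abstract only labels variants classified by more than one center.
        return "single lab"
    tiers = set(classifications)
    if len(tiers) == 1:
        # Assumption: "agreed" means every lab reported the same tier.
        return "consensus"
    if tiers & BENIGN and tiers & PATHOGENIC:
        # Opposed classifications: LB/B versus LP/P.
        return "conflicting"
    return "no consensus"
```

For example, `label_variant(["LB", "VUS"])` gives `"no consensus"`, because the two labs disagree without landing on opposite sides of the scale.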

2.

Somatic Tumor Variant Filtration Strategies to Optimize Tumor-Only Molecular Profiling Using Targeted Next-Generation Sequencing Panels.

Sukhai, Mahadeo A; Misyura, Maksym; Thomas, Mariam; Garg, Swati; Zhang, Tong; Stickle, Natalie; Virtanen, Carl; Bedard, Philippe L; Siu, Lillian L; Smets, Tina; Thijs, Gert; Van Vooren, Steven; Kamel-Reid, Suzanne; Stockley, Tracy L.

J Mol Diagn ; 21(2): 261-273, 2019 Mar.

Article in English | MEDLINE | ID: mdl-30576869

ABSTRACT

A common approach in clinical diagnostic laboratories to variant assessment from tumor molecular profiling is sequencing of genomic DNA extracted from both tumor (somatic) and normal (germline) tissue, with subsequent variant comparison to identify true somatic variants with potential impact on patient treatment or prognosis. However, challenges exist in paired tumor-normal testing, including increased cost of dual sample testing and identification of germline cancer predisposing variants. Alternatively, somatic variants can be identified by in silico tumor-only variant filtration precluding the need for matched normal testing. The barrier to tumor-only variant filtration is defining a reliable approach, with high sensitivity and specificity to identify somatic variants. In this study, we used retrospective data sets from paired tumor-normal samples tested on small (48 gene) and large (555 gene) targeted next-generation sequencing panels, to model algorithms for tumor-only variants classification. The optimal algorithm required an ordinal filtering approach using information from variant population databases (1000 Genomes Phase 3, ESP6500, ExAC), clinical mutation databases (ClinVar), and information on recurring clinically relevant somatic variants. Overall the tumor-only variant filtration strategy described in this study can define clinically relevant somatic variants from tumor-only analysis with sensitivity of 97% to 99% and specificity of 87% to 94%, and with significant potential utility for clinical laboratories implementing tumor-only molecular profiling.

Subjects

High-Throughput Nucleotide Sequencing/methods; Algorithms; Computational Biology/methods; Humans; Mutation/genetics; Neoplasms/genetics; Retrospective Studies
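
The abstract of entry 2 names the three information sources used by the ordinal filter but not the order or the thresholds. The sketch below shows what an ordered filter of this kind could look like. Everything specific in it is an assumption for illustration: the function name `classify_tumor_only`, the dictionary keys, the order of the steps, and the 1% allele-frequency cutoff are not taken from the paper.

```python
def classify_tumor_only(variant, hotspots, af_cutoff=0.01):
    """Return 'somatic' or 'germline' for one variant (illustrative only).

    variant: dict with hypothetical keys
        'id'               - variant identifier
        'population_af'    - highest allele frequency seen in population
                             databases, or None if absent from all of them
        'clinvar_germline' - True if a clinical database lists it as germline
    hotspots: set of identifiers of recurring, clinically relevant
              somatic variants
    """
    # Step 1: known recurring somatic variants are kept as somatic.
    if variant["id"] in hotspots:
        return "somatic"
    # Step 2: variants common in the population are treated as germline.
    af = variant.get("population_af")
    if af is not None and af >= af_cutoff:
        return "germline"
    # Step 3: variants with a germline clinical record are treated as germline.
    if variant.get("clinvar_germline", False):
        return "germline"
    # Anything that passes every filter is presumed somatic.
    return "somatic"
```

The steps are applied in a fixed order, so an earlier rule always wins: a hotspot variant stays somatic even if it also appears in a population database.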

3.

Assessing computational tools for the discovery of transcription factor binding sites.

Tompa, Martin; Li, Nan; Bailey, Timothy L; Church, George M; De Moor, Bart; Eskin, Eleazar; Favorov, Alexander V; Frith, Martin C; Fu, Yutao; Kent, W James; Makeev, Vsevolod J; Mironov, Andrei A; Noble, William Stafford; Pavesi, Giulio; Pesole, Graziano; Régnier, Mireille; Simonis, Nicolas; Sinha, Saurabh; Thijs, Gert; van Helden, Jacques; Vandenbogaert, Mathias; Weng, Zhiping; Workman, Christopher; Ye, Chun; Zhu, Zhou.

Nat Biotechnol ; 23(1): 137-44, 2005 Jan.

Article in English | MEDLINE | ID: mdl-15637633

ABSTRACT

The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.

Subjects

Computational Biology/methods; Gene Expression; Transcription, Genetic; Amino Acid Motifs; Animals; Binding Sites; Databases, Protein; Drosophila; Fungal Proteins/chemistry; Humans; Internet; Mice; Reproducibility of Results; Software

4.

Spectrophores as one-dimensional descriptors calculated from three-dimensional atomic properties: applications ranging from scaffold hopping to multi-target virtual screening.

Gladysz, Rafaela; Dos Santos, Fabio Mendes; Langenaeker, Wilfried; Thijs, Gert; Augustyns, Koen; De Winter, Hans.

J Cheminform ; 10(1): 9, 2018 Mar 07.

Article in English | MEDLINE | ID: mdl-29516311

ABSTRACT

Spectrophores are novel descriptors that are calculated from the three-dimensional atomic properties of molecules. In our current implementation, the atomic properties that were used to calculate spectrophores include atomic partial charges, atomic lipophilicity indices, atomic shape deviations and atomic softness properties. This approach can easily be widened to also include additional atomic properties. Our novel methodology finds its roots in the experimental affinity fingerprinting technology developed in the 1990's by Terrapin Technologies. Here we have translated it into a purely virtual approach using artificial affinity cages and a simplified metric to calculate the interaction between these cages and the atomic properties. A typical spectrophore consists of a vector of 48 real numbers. This makes it highly suitable for the calculation of a wide range of similarity measures for use in virtual screening and for the investigation of quantitative structure-activity relationships in combination with advanced statistical approaches such as self-organizing maps, support vector machines and neural networks. In our present report we demonstrate the applicability of our novel methodology for scaffold hopping as well as virtual screening.

5.

TOUCAN 2: the all-inclusive open source workbench for regulatory sequence analysis.

Aerts, Stein; Van Loo, Peter; Thijs, Gert; Mayer, Herbert; de Martin, Rainer; Moreau, Yves; De Moor, Bart.

Nucleic Acids Res ; 33(Web Server issue): W393-6, 2005 Jul 01.

Article in English | MEDLINE | ID: mdl-15980497

ABSTRACT

We present the second and improved release of the TOUCAN workbench for cis-regulatory sequence analysis. TOUCAN implements and integrates fast state-of-the-art methods and strategies in gene regulation bioinformatics, including algorithms for comparative genomics and for the detection of cis-regulatory modules. This second release of TOUCAN has become open source and thereby carries the potential to evolve rapidly. The main goal of TOUCAN is to allow a user to come to testable hypotheses regarding the regulation of a gene or of a set of co-regulated genes. TOUCAN can be launched from this location: http://www.esat.kuleuven.ac.be/~saerts/software/toucan.php.

Subjects

Gene Expression Regulation; Regulatory Sequences, Nucleic Acid; Sequence Analysis, DNA/methods; Software; Algorithms; Genomics; Internet; User-Computer Interface

6.

More robust detection of motifs in coexpressed genes by using phylogenetic information.

Monsieurs, Pieter; Thijs, Gert; Fadda, Abeer A; De Keersmaecker, Sigrid C J; Vanderleyden, Jozef; De Moor, Bart; Marchal, Kathleen.

BMC Bioinformatics ; 7: 160, 2006 Mar 20.

Article in English | MEDLINE | ID: mdl-16549017

ABSTRACT

BACKGROUND: Several motif detection algorithms have been developed to discover overrepresented motifs in sets of coexpressed genes. However, in a noisy gene list, the number of genes containing the motif versus the number lacking the motif might not be sufficiently high to allow detection by classical motif detection tools. To still recover motifs which are not significantly enriched but still present, we developed a procedure in which we use phylogenetic footprinting to first delineate all potential motifs in each gene. Then we mutually compare all detected motifs and identify the ones that are shared by at least a few genes in the data set as potential candidates. RESULTS: We applied our methodology to a compiled test data set containing known regulatory motifs and to two biological data sets derived from genome wide expression studies. By executing four consecutive steps of 1) identifying conserved regions in orthologous intergenic regions, 2) aligning these conserved regions, 3) clustering the conserved regions containing similar regulatory regions followed by extraction of the regulatory motifs and 4) screening the input intergenic sequences with detected regulatory motif models, our methodology proves to be a powerful tool for detecting regulatory motifs when a low signal to noise ratio is present in the input data set. Comparing our results with two other motif detection algorithms points out the robustness of our algorithm. CONCLUSION: We developed an approach that can reliably identify multiple regulatory motifs lacking a high degree of overrepresentation in a set of coexpressed genes (motifs belonging to sparsely connected hubs in the regulatory network) by exploiting the advantages of using both coexpression and phylogenetic information.

Subjects

Algorithms; DNA, Bacterial/genetics; Gene Expression Regulation, Bacterial; Phylogeny; Regulatory Sequences, Nucleic Acid/genetics; Sequence Analysis, DNA; Yersinia pestis/genetics; Base Sequence; Cluster Analysis; Consensus Sequence/genetics; DNA Footprinting; Gene Expression Profiling; Molecular Sequence Data; Oligonucleotide Array Sequence Analysis; Reproducibility of Results; Sequence Analysis, DNA/methods

7.

Toucan: deciphering the cis-regulatory logic of coregulated genes.

Aerts, Stein; Thijs, Gert; Coessens, Bert; Staes, Mik; Moreau, Yves; De Moor, Bart.

Nucleic Acids Res ; 31(6): 1753-64, 2003 Mar 15.

Article in English | MEDLINE | ID: mdl-12626717

ABSTRACT

TOUCAN is a Java application for the rapid discovery of significant cis-regulatory elements from sets of coexpressed or coregulated genes. Biologists can automatically (i) retrieve genes and intergenic regions, (ii) identify putative regulatory regions, (iii) score sequences for known transcription factor binding sites, (iv) identify candidate motifs for unknown binding sites, and (v) detect those statistically over-represented sites that are characteristic for a gene set. Genes or intergenic regions are retrieved from Ensembl or EMBL, together with orthologs and supporting information. Orthologs are aligned and syntenic regions are selected as candidate regulatory regions. Putative sites for known transcription factors are detected using our MotifScanner, which scores position weight matrices using a probabilistic model. New motifs are detected using our MotifSampler based on Gibbs sampling. Binding sites characteristic for a gene set--and thus statistically over-represented with respect to a reference sequence set--are found using a binomial test. We have validated Toucan by analyzing muscle-specific genes, liver-specific genes and E2F target genes; we have easily detected many known binding sites within intergenic DNA and identified new biologically plausible sites for known and unknown transcription factors. Software available at http://www.esat.kuleuven.ac.be/~dna/BioI/Software.html.

Subjects

Cell Cycle Proteins; DNA-Binding Proteins; Gene Expression Regulation/genetics; Software; Algorithms; Binding Sites/genetics; Computational Biology/methods; E2F Transcription Factors; Genome, Human; Humans; Liver/metabolism; Muscles/metabolism; Promoter Regions, Genetic/genetics; Transcription Factors/metabolism
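
The abstract of entry 7 says over-represented sites are found with a binomial test against a reference sequence set. A minimal sketch of such a test follows. It assumes a one-sided upper-tail test in which each sequence either contains the site or does not; the abstract does not say whether Toucan counts sequences or individual sites, and the function name `binomial_tail` is introduced here for illustration.

```python
from math import comb


def binomial_tail(k, n, p):
    """Upper-tail probability P(X >= k) for X ~ Binomial(n, p).

    n: number of sequences in the gene set
    k: number of those sequences that contain the site
    p: fraction of reference sequences that contain the site
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

A small tail probability means the site occurs in the gene set more often than the reference frequency would predict. When many candidate sites are tested, some correction for multiple testing would normally be applied on top of this.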

8.

INCLUSive: A web portal and service registry for microarray and regulatory sequence analysis.

Coessens, Bert; Thijs, Gert; Aerts, Stein; Marchal, Kathleen; De Smet, Frank; Engelen, Kristof; Glenisson, Patrick; Moreau, Yves; Mathys, Janick; De Moor, Bart.

Nucleic Acids Res ; 31(13): 3468-70, 2003 Jul 01.

Article in English | MEDLINE | ID: mdl-12824346

ABSTRACT

INCLUSive is a suite of algorithms and tools for the analysis of gene expression data and the discovery of cis-regulatory sequence elements. The tools allow normalization, filtering and clustering of microarray data, functional scoring of gene clusters, sequence retrieval, and detection of known and unknown regulatory elements using probabilistic sequence models and Gibbs sampling. All tools are available via different web pages and as web services. The web pages are connected and integrated to reflect a methodology and facilitate complex analysis using different tools. The web services can be invoked using standard SOAP messaging. Example clients are available for download to invoke the services from a remote computer or to be integrated with other applications. All services are catalogued and described in a web service registry. The INCLUSive web portal is available for academic purposes at http://www.esat.kuleuven.ac.be/inclusive.

Subjects

Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis/methods; Regulatory Sequences, Nucleic Acid; Software; Algorithms; Cluster Analysis; Internet; Registries; Sequence Analysis/methods; Systems Integration

9.

PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences.

Lescot, Magali; Déhais, Patrice; Thijs, Gert; Marchal, Kathleen; Moreau, Yves; Van de Peer, Yves; Rouzé, Pierre; Rombauts, Stephane.

Nucleic Acids Res ; 30(1): 325-7, 2002 Jan 01.

Article in English | MEDLINE | ID: mdl-11752327

ABSTRACT

PlantCARE is a database of plant cis-acting regulatory elements, enhancers and repressors. Regulatory elements are represented by positional matrices, consensus sequences and individual sites on particular promoter sequences. Links to the EMBL, TRANSFAC and MEDLINE databases are provided when available. Data about the transcription sites are extracted mainly from the literature, supplemented with an increasing number of in silico predicted data. Apart from a general description for specific transcription factor sites, levels of confidence for the experimental evidence, functional information and the position on the promoter are given as well. New features have been implemented to search for plant cis-acting regulatory elements in a query sequence. Furthermore, links are now provided to a new clustering and motif search method to investigate clusters of co-expressed genes. New regulatory elements can be sent automatically and will be added to the database after curation. The PlantCARE relational database is available via the World Wide Web at http://sphinx.rug.ac.be:8080/PlantCARE/.

Subjects

Databases, Nucleic Acid; Gene Expression Regulation, Plant; Genes, Plant; Promoter Regions, Genetic; Consensus Sequence; Enhancer Elements, Genetic; Genome, Plant; Information Storage and Retrieval; Internet; Multigene Family; Regulatory Sequences, Nucleic Acid; Systems Integration; Transcription, Genetic

10.

Genome-specific higher-order background models to improve motif detection.

Marchal, Kathleen; Thijs, Gert; De Keersmaecker, Sigrid; Monsieurs, Pieter; De Moor, Bart; Vanderleyden, Jos.

Trends Microbiol ; 11(2): 61-6, 2003 Feb.

Article in English | MEDLINE | ID: mdl-12598125

ABSTRACT

Motif detection based on Gibbs sampling is a common procedure used to retrieve regulatory motifs in silico. Using a species-specific background model was previously shown to increase the robustness of the algorithm. Here, we demonstrate that selecting a non-species-adapted background model can have an adverse effect on the results of motif detection. The large differences in the average nucleotide composition of prokaryotic sequences exacerbate the problem of exchanging background models. Therefore, we have developed complex background models for all prokaryotic species with available genome sequences.

Subjects

Genomics; Models, Genetic; Regulatory Sequences, Nucleic Acid; Sequence Analysis, DNA; Algorithms; DNA, Intergenic/analysis; Escherichia coli/genetics; Genome; Promoter Regions, Genetic; Pseudomonas aeruginosa/genetics; Sigma Factor/genetics; Species Specificity
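
A higher-order background model of the kind discussed in entry 10 is usually a Markov chain in which the probability of each base depends on the preceding few bases. The sketch below estimates one from training sequences. The default order of 3, the pseudocount, the uniform fallback for unseen contexts, and the function names are assumptions for illustration; the abstract does not give the authors' settings.

```python
from collections import defaultdict
from math import log

ALPHABET = "ACGT"


def train_background(sequences, order=3, pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) from uppercase DNA."""
    counts = defaultdict(lambda: {b: pseudocount for b in ALPHABET})
    for seq in sequences:
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip windows that contain ambiguity codes such as N.
            if base in ALPHABET and all(c in ALPHABET for c in context):
                counts[context][base] += 1
    model = {}
    for context, row in counts.items():
        total = sum(row.values())
        model[context] = {b: c / total for b, c in row.items()}
    return model


def log_likelihood(seq, model, order=3):
    """Log-probability of seq[order:] given its first `order` bases.

    Expects uppercase A/C/G/T only. Contexts never seen in training
    fall back to a uniform distribution (an assumption of this sketch).
    """
    uniform = {b: 0.25 for b in ALPHABET}
    total = 0.0
    for i in range(order, len(seq)):
        row = model.get(seq[i - order:i], uniform)
        total += log(row[seq[i]])
    return total
```

Training the model on intergenic sequence from one genome and scoring sequence from another makes the point of the abstract concrete: when base composition differs, the same sequence gets a very different background likelihood.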

11.

Computational detection of cis-regulatory modules.

Aerts, Stein; Van Loo, Peter; Thijs, Gert; Moreau, Yves; De Moor, Bart.

Bioinformatics ; 19 Suppl 2: ii5-14, 2003 Oct.

Article in English | MEDLINE | ID: mdl-14534164

ABSTRACT

MOTIVATION: The transcriptional regulation of a metazoan gene depends on the cooperative action of multiple transcription factors that bind to cis-regulatory modules (CRMs) located in the neighborhood of the gene. By integrating multiple signals, CRMs confer an organism specific spatial and temporal rate of transcription. RESULTS: Based on the hypothesis that genes that are needed in exactly the same conditions might share similar regulatory switches, we have developed a novel methodology to find CRMs in a set of coexpressed or coregulated genes. The ModuleSearcher algorithm finds for a given gene set the best scoring combination of transcription factor binding sites within a sequence window using an A* procedure for tree searching. To keep the level of noise low, we use DNA sequences that are most likely to contain functional cis-regulatory information, namely conserved regions between human and mouse orthologous genes. The ModuleScanner performs genomic searches with a predicted CRM or with a user-defined CRM known from the literature to find possible target genes. The validity of a set of putative targets is checked using Gene Ontology annotations. We demonstrate the use and effectiveness of the ModuleSearcher and ModuleScanner algorithms and test their specificity and sensitivity on semi-artificial data. Next, we search for a module in a cluster of gene expression profiles of human cell cycle genes. AVAILABILITY: The ModuleSearcher is available as a web service within the TOUCAN workbench for regulatory sequence analysis, which can be downloaded from http://www.esat.kuleuven.ac.be/~dna/BioI.

Subjects

Algorithms; Chromosome Mapping/methods; Regulatory Elements, Transcriptional/genetics; Sequence Analysis, DNA/methods; Software; Transcription Factors/genetics; Transcription, Genetic/genetics; Base Sequence; Binding Sites; Molecular Sequence Data; Protein Binding

12.

Comprehensive analysis of the base composition around the transcription start site in Metazoa.

Aerts, Stein; Thijs, Gert; Dabrowski, Michal; Moreau, Yves; De Moor, Bart.

BMC Genomics ; 5(1): 34, 2004 Jun 01.

Article in English | MEDLINE | ID: mdl-15171795

ABSTRACT

BACKGROUND: The transcription start site of a metazoan gene remains poorly understood, mostly because there is no clear signal present in all genes. Now that several sequenced metazoan genomes have been annotated, we have been able to compare the base composition around the transcription start site for all annotated genes across multiple genomes. RESULTS: The most prominent feature in the base compositions is a significant local variation in G+C content over a large region around the transcription start site. The change is present in all animal phyla but the extent of variation is different between distinct classes of vertebrates, and the shape of the variation is completely different between vertebrates and arthropods. Furthermore, the height of the variation correlates with CpG frequencies in vertebrates but not in invertebrates and it also correlates with gene expression, especially in mammals. We also detect GC and AT skews in all clades (where %G is not equal to %C or %A is not equal to %T respectively) but these occur in a more confined region around the transcription start site and in the coding region. CONCLUSIONS: The dramatic changes in nucleotide composition in humans are a consequence of CpG nucleotide frequencies and of gene expression, the changes in Fugu could point to primordial CpG islands, and the changes in the fly are of a totally different kind and unrelated to dinucleotide frequencies.

Subjects

DNA/genetics; Evolution, Molecular; Transcription Initiation Site; AT Rich Sequence/genetics; Animals; Anopheles/genetics; Base Composition/genetics; Caenorhabditis/genetics; CpG Islands/genetics; DNA, Helminth/genetics; Databases, Genetic/standards; Drosophila melanogaster/genetics; GC Rich Sequence/genetics; Gene Expression/genetics; Genetic Variation/genetics; Humans; Mice; Rats; Takifugu/genetics; Zebrafish/genetics
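
The quantities discussed in entry 12 (G+C content, and GC and AT skew) are simple to compute along a sequence. The sketch below uses the common normalized skew (G - C)/(G + C) and (A - T)/(A + T); the abstract only defines a skew as %G differing from %C (or %A from %T), so the normalization, the window scheme, and the function names are assumptions.

```python
def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def skew(seq, x, y):
    """Normalized skew (x - y) / (x + y); 0.0 when neither base occurs."""
    seq = seq.upper()
    nx, ny = seq.count(x), seq.count(y)
    return (nx - ny) / (nx + ny) if nx + ny else 0.0


def sliding_profile(seq, window, step=1):
    """Yield (start, GC content, GC skew, AT skew) for each window."""
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        yield start, gc_content(w), skew(w, "G", "C"), skew(w, "A", "T")
```

Averaging such profiles over many genes, each aligned on its annotated transcription start site, gives the kind of composition curve the abstract describes.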

13.

A Gibbs sampling method to detect overrepresented motifs in the upstream regions of coexpressed genes.

Thijs, Gert; Marchal, Kathleen; Lescot, Magali; Rombauts, Stephane; De Moor, Bart; Rouzé, Pierre; Moreau, Yves.

J Comput Biol ; 9(2): 447-64, 2002.

Article in English | MEDLINE | ID: mdl-12015892

ABSTRACT

Microarray experiments can reveal important information about transcriptional regulation. In our case, we look for potential promoter regulatory elements in the upstream region of coexpressed genes. Here we present two modifications of the original Gibbs sampling algorithm for motif finding (Lawrence et al., 1993). First, we introduce the use of a probability distribution to estimate the number of copies of the motif in a sequence. Second, we describe the technical aspects of the incorporation of a higher-order background model whose application we discussed in Thijs et al. (2001). Our implementation is referred to as the Motif Sampler. We successfully validate our algorithm on several data sets. First, we show results for three sets of upstream sequences containing known motifs: 1) the G-box light-response element in plants, 2) elements involved in methionine response in Saccharomyces cerevisiae, and 3) the FNR O2-responsive element in bacteria. We use these data sets to explain the influence of the parameters on the performance of our algorithm. Second, we show results for upstream sequences from four clusters of coexpressed genes identified in a microarray experiment on wounding in Arabidopsis thaliana. Several motifs could be matched to regulatory elements from plant defence pathways in our database of plant cis-acting regulatory elements (PlantCARE). Some other strong motifs do not have corresponding motifs in PlantCARE but are promising candidates for further analysis.

Subjects

Algorithms; Gene Expression Profiling/statistics & numerical data; Arabidopsis/genetics; Bacteria/genetics; Base Sequence; Computational Biology; DNA/genetics; Models, Genetic; Oligonucleotide Array Sequence Analysis/statistics & numerical data; Saccharomyces cerevisiae/genetics
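
Entry 13 builds on the original Gibbs sampling approach to motif finding. The sketch below shows only that baseline idea, in its simplest form: exactly one site per sequence and a uniform zero-order background. It does not implement the two modifications the abstract introduces (a distribution over the number of motif copies and a higher-order background), and the function names, pseudocount and iteration count are assumptions for illustration.

```python
import random

ALPHABET = "ACGT"


def build_pwm(seqs, positions, width, skip, pseudo=1.0):
    """Column-wise base frequencies from all current sites except `skip`."""
    counts = [{b: pseudo for b in ALPHABET} for _ in range(width)]
    for i, (seq, pos) in enumerate(zip(seqs, positions)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][seq[pos + j]] += 1
    pwm = []
    for col in counts:
        total = sum(col.values())
        pwm.append({b: c / total for b, c in col.items()})
    return pwm


def gibbs_site_sampler(seqs, width, iterations=200, seed=0):
    """Return one sampled motif start per sequence (uppercase A/C/G/T input)."""
    rng = random.Random(seed)
    background = 0.25  # uniform zero-order background (simplification)
    # Start from random site positions.
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Rebuild the motif model without sequence i ...
            pwm = build_pwm(seqs, positions, width, skip=i)
            # ... score every window of sequence i as motif vs background ...
            weights = []
            for start in range(len(seq) - width + 1):
                ratio = 1.0
                for j in range(width):
                    ratio *= pwm[j][seq[start + j]] / background
                weights.append(ratio)
            # ... and sample a new site in proportion to those scores.
            positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions
```

Because sites are sampled rather than chosen greedily, different seeds can converge to different motifs; in practice the sampler is run several times and the best-scoring result is kept.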

14.

Pharao: pharmacophore alignment and optimization.

Taminau, Jonatan; Thijs, Gert; De Winter, Hans.

J Mol Graph Model ; 27(2): 161-9, 2008 Sep.

Article in English | MEDLINE | ID: mdl-18485770

ABSTRACT

Within the context of early drug discovery, a new pharmacophore-based tool to score and align small molecules (Pharao) is described. The tool is built on the idea to model pharmacophoric features by Gaussian 3D volumes instead of the more common point or sphere representations. The smooth nature of these continuous functions has a beneficent effect on the optimization problem introduced during alignment. The usefulness of Pharao is illustrated by means of three examples: a virtual screening of trypsin-binding ligands, a virtual screening of phosphodiesterase 5-binding ligands, and an investigation of the biological relevance of an unsupervised clustering of small ligands based on Pharao.

Subjects

Algorithms; Drug Delivery Systems; Drug Design; Cluster Analysis; Hydrogen Bonding; Ligands; Models, Molecular; Molecular Conformation; Software; Structure-Activity Relationship
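
The reason Gaussian volumes make alignment easier, as entry 14 notes, is that the overlap of two spherical Gaussians has a smooth closed form. The sketch below evaluates that standard integral for two functions of the form p * exp(-width * |r - center|^2). The function name and parameter names are assumptions, and the abstract does not give the amplitudes or widths Pharao assigns to its pharmacophore features.

```python
from math import exp, pi


def gaussian_overlap(center_a, alpha, center_b, beta, p_a=1.0, p_b=1.0):
    """Overlap integral over 3D space of two spherical Gaussians.

    Each Gaussian is p * exp(-width * |r - center|^2), with widths
    `alpha` and `beta` and amplitudes `p_a` and `p_b`.
    """
    # Squared distance between the two centers.
    d2 = sum((a - b) ** 2 for a, b in zip(center_a, center_b))
    s = alpha + beta
    # Gaussian product rule: the product is again a Gaussian of width s,
    # scaled by exp(-alpha * beta * d2 / s); its integral is (pi / s)^(3/2).
    return p_a * p_b * exp(-alpha * beta * d2 / s) * (pi / s) ** 1.5
```

The result varies smoothly with the distance between centers, so an optimizer can follow its gradient while rotating and translating one molecule onto another, which hard spheres or points do not allow.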

15.

A novel approach to identifying regulatory motifs in distantly related genomes.

Van Hellemont, Ruth; Monsieurs, Pieter; Thijs, Gert; de Moor, Bart; Van de Peer, Yves; Marchal, Kathleen.

Genome Biol ; 6(13): R113, 2005.

Article in English | MEDLINE | ID: mdl-16420672

ABSTRACT

Although proven successful in the identification of regulatory motifs, phylogenetic footprinting methods still show some shortcomings. To assess these difficulties, most apparent when applying phylogenetic footprinting to distantly related organisms, we developed a two-step procedure that combines the advantages of sequence alignment and motif detection approaches. The results on well-studied benchmark datasets indicate that the presented method outperforms other methods when the sequences become either too long or too heterogeneous in size.

Subjects

Computational Biology/methods; Genome/genetics; Regulatory Sequences, Nucleic Acid/genetics; Amino Acid Sequence; Animals; Base Pairing/genetics; Conserved Sequence; Databases, Genetic; Eye Proteins/chemistry; Homeodomain Proteins/chemistry; Humans; Molecular Sequence Data; PAX6 Transcription Factor; Paired Box Transcription Factors/chemistry; Phylogeny; Repressor Proteins/chemistry

16.

Adaptive quality-based clustering of gene expression profiles.

De Smet, Frank; Mathys, Janick; Marchal, Kathleen; Thijs, Gert; De Moor, Bart; Moreau, Yves.

Bioinformatics ; 18(5): 735-46, 2002 May.

Article in English | MEDLINE | ID: mdl-12050070

ABSTRACT

MOTIVATION: Microarray experiments generate a considerable amount of data, which analyzed properly help us gain a huge amount of biologically relevant information about the global cellular behaviour. Clustering (grouping genes with similar expression profiles) is one of the first steps in data analysis of high-throughput expression measurements. A number of clustering algorithms have proved useful to make sense of such data. These classical algorithms, though useful, suffer from several drawbacks (e.g. they require the predefinition of arbitrary parameters like the number of clusters; they force every gene into a cluster despite a low correlation with other cluster members). In the following we describe a novel adaptive quality-based clustering algorithm that tackles some of these drawbacks. RESULTS: We propose a heuristic iterative two-step algorithm: First, we find in the high-dimensional representation of the data a sphere where the "density" of expression profiles is locally maximal (based on a preliminary estimate of the radius of the cluster-quality-based approach). In a second step, we derive an optimal radius of the cluster (adaptive approach) so that only the significantly coexpressed genes are included in the cluster. This estimation is achieved by fitting a model to the data using an EM-algorithm. By inferring the radius from the data itself, the biologist is freed from finding an optimal value for this radius by trial-and-error. The computational complexity of this method is approximately linear in the number of gene expression profiles in the data set. Finally, our method is successfully validated using existing data sets. AVAILABILITY: http://www.esat.kuleuven.ac.be/~thijs/Work/Clustering.html

Subjects

Algorithms; Cluster Analysis; Databases, Genetic; Gene Expression Profiling/methods; Gene Expression Profiling/statistics & numerical data; Computer Simulation; Genome, Fungal; Mitosis/genetics; Models, Genetic; Models, Statistical; Saccharomyces cerevisiae/genetics; Sensitivity and Specificity

17.

In silico identification and experimental validation of PmrAB targets in Salmonella typhimurium by regulatory motif detection.

Marchal, Kathleen; De Keersmaecker, Sigrid; Monsieurs, Pieter; van Boxel, Nadja; Lemmens, Karen; Thijs, Gert; Vanderleyden, Jos; De Moor, Bart.

Genome Biol ; 5(2): R9, 2004.

Article in English | MEDLINE | ID: mdl-14759259

ABSTRACT

BACKGROUND: The PmrAB (BasSR) two-component regulatory system is required for Salmonella typhimurium virulence. PmrAB-controlled modifications of the lipopolysaccharide (LPS) layer confer resistance to cationic antibiotic polypeptides, which may allow bacteria to survive within macrophages. The PmrAB system also confers resistance to Fe3+-mediated killing. New targets of the system have recently been discovered that seem not to have a role in the well-described functions of PmrAB, suggesting that the PmrAB-dependent regulon might contain additional, unidentified targets. RESULTS: We performed an in silico analysis of possible targets of the PmrAB system. Using a motif model of the PmrA binding site in DNA, genome-wide screening was carried out to detect PmrAB target genes. To increase confidence in the predictions, all putative targets were subjected to a cross-species comparison (phylogenetic footprinting) using a Gibbs sampling-based motif-detection procedure. As well as the known targets, we detected additional targets with unknown functions. Four of these were experimentally validated (yibD, aroQ, mig-13 and sseJ). Site-directed mutagenesis of the PmrA-binding site (PmrA box) in yibD revealed specific sequence requirements. CONCLUSIONS: We demonstrated the efficiency of our procedure by recovering most of the known PmrAB-dependent targets and by identifying unknown targets that we were able to validate experimentally. We also pinpointed directions for further research that could help elucidate the S. typhimurium virulence pathway.

Subjects

Bacterial Proteins/metabolism; DNA, Bacterial/analysis; Regulatory Sequences, Nucleic Acid; Salmonella typhimurium/genetics; Transcription Factors/metabolism; Base Sequence; Binding Sites; DNA Footprinting; DNA, Bacterial/metabolism; Genes, Reporter; Genome, Bacterial; Molecular Sequence Data; Mutagenesis, Site-Directed; Phylogeny; Salmonella typhimurium/pathogenicity; Sequence Alignment; Sequence Analysis, DNA; Sequence Homology, Nucleic Acid

18.

INCLUSive: integrated clustering, upstream sequence retrieval and motif sampling.

Thijs, Gert; Moreau, Yves; De Smet, Frank; Mathys, Janick; Lescot, Magali; Rombauts, Stephane; Rouze, Pierre; De Moor, Bart; Marchal, Kathleen.

Bioinformatics ; 18(2): 331-2, 2002 Feb.

Article in English | MEDLINE | ID: mdl-11847086

ABSTRACT

INCLUSive allows automatic multistep analysis of microarray data (clustering and motif finding). The clustering algorithm (adaptive quality-based clustering) groups together genes with highly similar expression profiles. The upstream sequences of the genes belonging to a cluster are automatically retrieved from GenBank and can be fed directly into Motif Sampler, a Gibbs sampling algorithm that retrieves statistically over-represented motifs in sets of sequences, in this case upstream regions of co-expressed genes.

Subjects

Multigene Family; Oligonucleotide Array Sequence Analysis/statistics & numerical data; Software; Algorithms; Cluster Analysis; Computational Biology; Databases, Genetic; Gene Expression Profiling/statistics & numerical data
