Search | Secretaria de Estado da Saúde

1.

Sequencing and de novo assembly of 150 genomes from Denmark as a population reference.

Maretty, Lasse; Jensen, Jacob Malte; Petersen, Bent; Sibbesen, Jonas Andreas; Liu, Siyang; Villesen, Palle; Skov, Laurits; Belling, Kirstine; Theil Have, Christian; Izarzugaza, Jose M G; Grosjean, Marie; Bork-Jensen, Jette; Grove, Jakob; Als, Thomas D; Huang, Shujia; Chang, Yuqi; Xu, Ruiqi; Ye, Weijian; Rao, Junhua; Guo, Xiaosen; Sun, Jihua; Cao, Hongzhi; Ye, Chen; van Beusekom, Johan; Espeseth, Thomas; Flindt, Esben; Friborg, Rune M; Halager, Anders E; Le Hellard, Stephanie; Hultman, Christina M; Lescai, Francesco; Li, Shengting; Lund, Ole; Løngren, Peter; Mailund, Thomas; Matey-Hernandez, Maria Luisa; Mors, Ole; Pedersen, Christian N S; Sicheritz-Pontén, Thomas; Sullivan, Patrick; Syed, Ali; Westergaard, David; Yadav, Rachita; Li, Ning; Xu, Xun; Hansen, Torben; Krogh, Anders; Bolund, Lars; Sørensen, Thorkild I A; Pedersen, Oluf.

Nature ; 548(7665): 87-91, 2017 08 03.

Article in English | MEDLINE | ID: mdl-28746312

ABSTRACT

Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set of structural variants including many novel insertions and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the way for precision medicine initiatives, which now are being launched in many countries including Denmark.

Subjects

Genetic Variation/genetics , Population Genetics/standards , Human Genome/genetics , Genomics/standards , DNA Sequence Analysis/standards , Adult , Alleles , Child , Human Y Chromosomes/genetics , Denmark , Female , Haplotypes/genetics , Humans , Major Histocompatibility Complex/genetics , Male , Maternal Age , Mutation Rate , Paternal Age , Point Mutation/genetics , Reference Standards

2.

Toxin inhibition in C. crescentus VapBC1 is mediated by a flexible pseudo-palindromic protein motif and modulated by DNA binding.

Bendtsen, Kirstine L; Xu, Kehan; Luckmann, Majbritt; Winther, Kristoffer S; Shah, Shiraz A; Pedersen, Christian N S; Brodersen, Ditlev E.

Nucleic Acids Res ; 45(5): 2875-2886, 2017 03 17.

Article in English | MEDLINE | ID: mdl-27998932

ABSTRACT

Expression of bacterial type II toxin-antitoxin (TA) systems is regulated at the transcriptional level through direct binding of the antitoxin to pseudo-palindromic sequences on operator DNA. In this context, the toxin functions as a co-repressor by stimulating DNA binding through direct interaction with the antitoxin. Here, we determine crystal structures of the complete 90 kDa heterooctameric VapBC1 complex from Caulobacter crescentus CB15 both in isolation and bound to its cognate DNA operator sequence at 1.6 and 2.7 Å resolution, respectively. DNA binding is associated with a dramatic architectural rearrangement of conserved TA interactions in which C-terminal extended structures of the antitoxin VapB1 swap positions to interlock the complex in the DNA-bound state. We further show that a pseudo-palindromic protein sequence in the antitoxin is responsible for this interaction and required for binding and inactivation of the VapC1 toxin dimer. Sequence analysis of 4127 orthologous VapB sequences reveals that such palindromic protein sequences are widespread and unique to bacterial and archaeal VapB antitoxins suggesting a general principle governing regulation of VapBC TA systems. Finally, a structure of C-terminally truncated VapB1 bound to VapC1 reveals discrete states of the TA interaction that suggest a structural basis for toxin activation in vivo.

Subjects

Bacterial Proteins/chemistry , Bacterial Toxins/chemistry , Caulobacter crescentus/genetics , Bacterial DNA/chemistry , DNA-Binding Proteins/chemistry , Membrane Glycoproteins/chemistry , Genetic Operator Regions , Amino Acid Motifs , Bacterial Proteins/metabolism , Bacterial Toxins/antagonists & inhibitors , Bacterial Toxins/metabolism , Bacterial DNA/metabolism , DNA-Binding Proteins/metabolism , Membrane Glycoproteins/metabolism , Molecular Models , Nucleic Acid Conformation , Protein Binding , Protein Domains

3.

Computational discovery of specificity-conferring sites in non-ribosomal peptide synthetases.

Knudsen, Michael; Søndergaard, Dan; Tofting-Olesen, Claus; Hansen, Frederik T; Brodersen, Ditlev Egeskov; Pedersen, Christian N S.

Bioinformatics ; 32(3): 325-9, 2016 Feb 01.

Article in English | MEDLINE | ID: mdl-26471456

ABSTRACT

MOTIVATION: By using a class of large modular enzymes known as Non-Ribosomal Peptide Synthetases (NRPS), bacteria and fungi are capable of synthesizing a large variety of secondary metabolites, many of which are bioactive and have potential pharmaceutical applications, e.g. as antibiotics. There is thus an interest in predicting the compound synthesized by an NRPS from its primary structure (amino acid sequence) alone, as this would enable an in silico search of whole genomes for NRPS enzymes capable of synthesizing potentially useful compounds. RESULTS: NRPS synthesis happens in a conveyor belt-like fashion where each individual NRPS module is responsible for incorporating a specific substrate (typically an amino acid) into the final product. Here, we present a new method for predicting substrate specificities of individual NRPS modules based on occurrences of motifs in their primary structures. We compare our classifier with existing methods and discuss possible biological explanations of how the motifs might relate to substrate specificity. AVAILABILITY AND IMPLEMENTATION: SEQL-NRPS is available as a web service implemented in Python with Flask at http://services.birc.au.dk/seql-nrps and source code available at https://bitbucket.org/dansondergaard/seql-nrps/. CONTACT: micknudsen@gmail.com or cstorm@birc.au.dk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Bacteria/enzymology , Fungi/enzymology , Peptide Synthases/chemistry , Protein Sequence Analysis/methods , Amino Acid Motifs , Computer Simulation , Peptide Synthases/metabolism , Substrate Specificity

4.

tqDist: a library for computing the quartet and triplet distances between binary or general trees.

Sand, Andreas; Holt, Morten K; Johansen, Jens; Brodal, Gerth Stølting; Mailund, Thomas; Pedersen, Christian N S.

Bioinformatics ; 30(14): 2079-80, 2014 Jul 15.

Article in English | MEDLINE | ID: mdl-24651968

ABSTRACT

UNLABELLED: tqDist is a software package for computing the triplet and quartet distances between general rooted or unrooted trees, respectively. The program is based on algorithms with running time O(n log n) for the triplet distance calculation and O(dn log n) for the quartet distance calculation, where n is the number of leaves in the trees and d is the degree of the tree with minimum degree. These are currently the fastest algorithms both in theory and in practice. AVAILABILITY AND IMPLEMENTATION: tqDist can be installed on Windows, Linux and Mac OS X. Doing this will install a set of command-line tools together with a Python module and an R package for scripting in Python or R. The software package is freely available under the GNU LGPL licence at http://birc.au.dk/software/tqDist.

Subjects

Phylogeny , Software , Algorithms , Classification/methods

5.

zipHMMlib: a highly optimised HMM library exploiting repetitions in the input to speed up the forward algorithm.

Sand, Andreas; Kristiansen, Martin; Pedersen, Christian N S; Mailund, Thomas.

BMC Bioinformatics ; 14: 339, 2013 Nov 22.

Article in English | MEDLINE | ID: mdl-24266924

ABSTRACT

BACKGROUND: Hidden Markov models are widely used for genome analysis as they combine ease of modelling with efficient analysis algorithms. Calculating the likelihood of a model using the forward algorithm has worst case time complexity linear in the length of the sequence and quadratic in the number of states in the model. For genome analysis, however, the length runs to millions or billions of observations, and when maximising the likelihood hundreds of evaluations are often needed. A time efficient forward algorithm is therefore a key ingredient in an efficient hidden Markov model library. RESULTS: We have built a software library for efficiently computing the likelihood of a hidden Markov model. The library exploits commonly occurring substrings in the input to reuse computations in the forward algorithm. In a pre-processing step our library identifies common substrings and builds a structure over the computations in the forward algorithm which can be reused. This analysis can be saved between uses of the library and is independent of concrete hidden Markov models so one preprocessing can be used to run a number of different models. Using this library, we achieve up to 78 times shorter wall-clock time for realistic whole-genome analyses with a real and reasonably complex hidden Markov model. In one particular case the analysis was performed in less than 8 minutes compared to 9.6 hours for the previously fastest library. CONCLUSIONS: We have implemented the preprocessing procedure and forward algorithm as a C++ library, zipHMM, with Python bindings for use in scripts. The library is available at http://birc.au.dk/software/ziphmm/.

Subjects

Markov Chains , Peptide Library , Software , Algorithms , Animals , Computer Simulation , Gorilla gorilla/genetics , Humans , Likelihood Functions , Observational Studies as Topic , Pan troglodytes/genetics , Pongo/genetics , Probability , Time Factors
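
The forward algorithm that the zipHMMlib abstract refers to can be illustrated with a minimal sketch. This is the textbook scaled forward recursion, linear in the sequence length and quadratic in the number of states; it is not the zipHMM implementation and does not include the substring-reuse preprocessing. The function name and the list-based model representation are assumptions made for illustration.

```python
import math


def forward_log_likelihood(obs, init, trans, emit):
    """Return log P(obs | model) using the scaled forward recursion.

    obs   -- sequence of observation indices
    init  -- init[s] is the probability of starting in state s
    trans -- trans[i][j] is the probability of moving from state i to j
    emit  -- emit[s][o] is the probability that state s emits symbol o
    """
    n_states = len(init)
    # Forward values for the first observation.
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # One step costs O(n_states^2): every state sums over all predecessors.
            alpha = [
                emit[j][obs[t]] * sum(alpha[i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        # Rescale to avoid underflow on long sequences; the log of the
        # scaling factors accumulates to the log-likelihood.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik
```

The quadratic inner step is repeated once per observation, which is why repeated substrings in a genome-length input are an attractive target for reuse.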

6.

A practical O(n log^2 n) time algorithm for computing the triplet distance on binary trees.

Sand, Andreas; Brodal, Gerth Stølting; Fagerberg, Rolf; Pedersen, Christian N S; Mailund, Thomas.

BMC Bioinformatics ; 14 Suppl 2: S18, 2013.

Article in English | MEDLINE | ID: mdl-23368759

ABSTRACT

The triplet distance is a distance measure that compares two rooted trees on the same set of leaves by enumerating all subsets of three leaves and counting how often the induced topologies of the tree are equal or different. We present an algorithm that computes the triplet distance between two rooted binary trees in time O(n log^2 n). The algorithm is related to an algorithm for computing the quartet distance between two unrooted binary trees in time O(n log n). While the quartet distance algorithm has a very severe overhead in the asymptotic time complexity that makes it impractical compared to O(n^2) time algorithms, we show through experiments that the triplet distance algorithm can be implemented to give a competitive wall-clock running time.

Subjects

Algorithms , Computational Biology/methods , Computer Simulation , Phylogeny , Software
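
The definition of the triplet distance given in the abstract above can be made concrete with a brute-force sketch that enumerates every subset of three leaves. It runs in roughly cubic time and is only meant to show what is being counted; it is not the O(n log^2 n) algorithm of the paper. The nested-tuple tree encoding and all function names are assumptions made for illustration.

```python
from itertools import combinations


def leaf_paths(tree, path=()):
    """Map each leaf label to the tuple of child indices leading to it from the root."""
    if not isinstance(tree, tuple):
        return {tree: path}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (index,)))
    return paths


def lca_depth(p, q):
    """Depth of the lowest common ancestor, i.e. the length of the shared path prefix."""
    depth = 0
    for x, y in zip(p, q):
        if x != y:
            break
        depth += 1
    return depth


def triplet_topology(paths, a, b, c):
    """Return the pair of leaves grouped together, or None if the triplet is unresolved."""
    depths = {
        frozenset((a, b)): lca_depth(paths[a], paths[b]),
        frozenset((a, c)): lca_depth(paths[a], paths[c]),
        frozenset((b, c)): lca_depth(paths[b], paths[c]),
    }
    deepest = max(depths.values())
    winners = [pair for pair, depth in depths.items() if depth == deepest]
    return winners[0] if len(winners) == 1 else None


def triplet_distance(tree1, tree2):
    """Count the leaf triplets whose induced topology differs between two rooted trees."""
    paths1, paths2 = leaf_paths(tree1), leaf_paths(tree2)
    if set(paths1) != set(paths2):
        raise ValueError("trees must have the same leaf set")
    return sum(
        1
        for a, b, c in combinations(sorted(paths1), 3)
        if triplet_topology(paths1, a, b, c) != triplet_topology(paths2, a, b, c)
    )
```

For example, `((("a", "b"), "c"), "d")` and `((("a", "c"), "b"), "d")` disagree only on the triplet {a, b, c}.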

7.

Characterising RNA secondary structure space using information entropy.

Sükösd, Zsuzsanna; Knudsen, Bjarne; Anderson, James W J; Novák, Adám; Kjems, Jørgen; Pedersen, Christian N S.

BMC Bioinformatics ; 14 Suppl 2: S22, 2013.

Article in English | MEDLINE | ID: mdl-23368905

ABSTRACT

Comparative methods for RNA secondary structure prediction use evolutionary information from RNA alignments to increase prediction accuracy. The model is often described in terms of stochastic context-free grammars (SCFGs), which generate a probability distribution over secondary structures. It is, however, unclear how this probability distribution changes as a function of the input alignment. As prediction programs typically only return a single secondary structure, better characterisation of the underlying probability space of RNA secondary structures is of great interest. In this work, we show how to efficiently compute the information entropy of the probability distribution over RNA secondary structures produced for RNA alignments by a phylo-SCFG, and implement it for the PPfold model. We also discuss interpretations and applications of this quantity, including how it can clarify reasons for low prediction reliability scores. PPfold and its source code are available from http://birc.au.dk/software/ppfold/.

Subjects

Algorithms , Theoretical Models , Nucleic Acid Conformation , RNA/chemistry , Base Sequence , Computational Biology/methods , Entropy , Probability , Software
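
The quantity discussed in the abstract above is the Shannon entropy of a probability distribution over secondary structures. The sketch below computes it for a small, explicitly enumerated toy distribution. It is not the method of the paper, which computes the entropy from the grammar without enumerating structures, and the function name and dot-bracket keys are assumptions made for illustration.

```python
import math


def structure_entropy(distribution):
    """Shannon entropy in bits of a dict mapping structures to probabilities."""
    total = sum(distribution.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)
```

A distribution concentrated on one structure has entropy 0, whereas a uniform distribution over four structures has entropy 2 bits, so higher values indicate that the single reported structure represents less of the probability mass.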

8.

UniMoG--a unifying framework for genomic distance calculation and sorting based on DCJ.

Hilker, Rolf; Sickinger, Corinna; Pedersen, Christian N S; Stoye, Jens.

Bioinformatics ; 28(19): 2509-11, 2012 Oct 01.

Article in English | MEDLINE | ID: mdl-22815356

ABSTRACT

SUMMARY: UniMoG is a software combining five genome rearrangement models: double cut and join (DCJ), restricted DCJ, Hannenhalli and Pevzner (HP), inversion and translocation. It can compute the pairwise genomic distances and a corresponding optimal sorting scenario for an arbitrary number of genomes. All five models can be unified through the DCJ model, thus the implementation is based on DCJ and, where reasonable, uses the most efficient existing algorithms for each distance and sorting problem. Both textual and graphical output is possible for visualizing the operations. AVAILABILITY AND IMPLEMENTATION: The software is available through the Bielefeld University Bioinformatics Web Server at http://bibiserv.techfak.uni-bielefeld.de/dcj with instructions and example data. CONTACT: rhilker@cebitec.uni-bielefeld.de.

Subjects

Algorithms , Computational Biology/methods , Genomics/methods , Software , Internet , Genetic Models , User-Computer Interface

9.

PPfold 3.0: fast RNA secondary structure prediction using phylogeny and auxiliary data.

Sükösd, Zsuzsanna; Knudsen, Bjarne; Kjems, Jørgen; Pedersen, Christian N S.

Bioinformatics ; 28(20): 2691-2, 2012 Oct 15.

Article in English | MEDLINE | ID: mdl-22877864

ABSTRACT

UNLABELLED: PPfold is a multi-threaded implementation of the Pfold algorithm for RNA secondary structure prediction. Here we present a new version of PPfold, which extends the evolutionary analysis with a flexible probabilistic model for incorporating auxiliary data, such as data from structure probing experiments. Our tests show that the accuracy of single-sequence secondary structure prediction using experimental data in PPfold 3.0 is comparable to RNAstructure. Furthermore, alignment structure prediction quality is improved even further by the addition of experimental data. PPfold 3.0 therefore has the potential to produce more accurate predictions than was previously possible. AVAILABILITY AND IMPLEMENTATION: PPfold 3.0 is available as a platform-independent Java application and can be downloaded from http://birc.au.dk/software/ppfold.

Subjects

RNA/chemistry , Software , Algorithms , Statistical Models , Nucleic Acid Conformation , Phylogeny

10.

shortran: a pipeline for small RNA-seq data analysis.

Gupta, Vikas; Markmann, Katharina; Pedersen, Christian N S; Stougaard, Jens; Andersen, Stig U.

Bioinformatics ; 28(20): 2698-700, 2012 Oct 15.

Article in English | MEDLINE | ID: mdl-22914220

ABSTRACT

UNLABELLED: High-throughput sequencing currently generates a wealth of small RNA (sRNA) data, making data mining a topical issue. Processing of these large data sets is inherently multidimensional as length, abundance, sequence composition, and genomic location all hold clues to sRNA function. Analysis can be challenging because the formulation and testing of complex hypotheses requires combined use of visualization, annotation and abundance profiling. To allow flexible generation and querying of these disparate types of information, we have developed the shortran pipeline for analysis of plant or animal short RNA sequencing data. It comprises nine modules and produces both graphical and MySQL format output. AVAILABILITY: shortran is freely available and can be downloaded from http://users-mb.au.dk/pmgrp/shortran/.

Subjects

Small Untranslated RNA/chemistry , Software , Arabidopsis/genetics , Data Mining , Genomics , Molecular Sequence Annotation , RNA Sequence Analysis/methods

11.

Using inverted indices for accelerating LINGO calculations.

Kristensen, Thomas G; Nielsen, Jesper; Pedersen, Christian N S.

J Chem Inf Model ; 51(3): 597-600, 2011 Mar 28.

Article in English | MEDLINE | ID: mdl-21332133

ABSTRACT

The ever growing size of chemical databases calls for the development of novel methods for representing and comparing molecules. One such method called LINGO is based on fragmenting the SMILES string representation of molecules. Comparison of molecules can then be performed by calculating the Tanimoto coefficient, which is called LINGOsim when used on LINGO multisets. This paper introduces a verbose representation for storing LINGO multisets, which makes it possible to transform them into sparse fingerprints such that fingerprint data structures and algorithms can be used to accelerate queries. The previous best method for rapidly calculating the LINGOsim similarity matrix required specialized hardware to yield a significant speedup over existing methods. By representing LINGO multisets in the verbose representation and using inverted indices, it is possible to calculate LINGOsim similarity matrices roughly 2.6 times faster than existing methods without relying on specialized hardware.

Subjects

Computational Biology , Chemical Databases , Algorithms
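
The comparison described in the LINGO abstract above, a Tanimoto coefficient on multisets of SMILES fragments, can be sketched as follows. The sketch assumes that fragments are overlapping substrings of length 4 and skips any SMILES preprocessing; both are simplifications not stated in the abstract. It also compares two molecules directly and does not implement the inverted-index acceleration. The function names are hypothetical.

```python
from collections import Counter


def lingos(smiles, q=4):
    """Multiset of all overlapping length-q substrings of a SMILES string."""
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingosim(smiles_a, smiles_b, q=4):
    """Multiset Tanimoto coefficient: |A intersect B| / |A union B|."""
    a, b = lingos(smiles_a, q), lingos(smiles_b, q)
    # Counter '&' keeps the minimum count per fragment, '|' keeps the maximum.
    intersection = sum((a & b).values())
    union = sum((a | b).values())
    return intersection / union if union else 0.0
```

For instance, `"CCCCC"` yields the fragment `"CCCC"` twice, while `"CCCCO"` yields `"CCCC"` and `"CCCO"` once each, so the multiset intersection has size 1, the union has size 3, and the similarity is 1/3.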

12.

Molecular docking with ligand attached water molecules.

Lie, Mette A; Thomsen, René; Pedersen, Christian N S; Schiøtt, Birgit; Christensen, Mikael H.

J Chem Inf Model ; 51(4): 909-17, 2011 Apr 25.

Article in English | MEDLINE | ID: mdl-21452852

ABSTRACT

A novel approach to incorporate water molecules in protein-ligand docking is proposed. In this method, the water molecules display the same flexibility during the docking simulation as the ligand. The method solvates the ligand with the maximum number of water molecules, and these are then retained or displaced depending on energy contributions during the docking simulation. Instead of being a static part of the receptor, each water molecule is a flexible on/off part of the ligand and is treated with the same flexibility as the ligand itself. To favor exclusion of the water molecules, a constant entropy penalty is added for each included water molecule. The method was evaluated using 12 structurally diverse protein-ligand complexes from the PDB, where several water molecules bridge the ligand and the protein. A considerable improvement in successful docking simulations was found when including flexible water molecules solvating hydrogen bonding groups of the ligand. The method has been implemented in the docking program Molegro Virtual Docker (MVD).

Subjects

Binding Sites , Computer Simulation , Ligands , Proteins/chemistry , Water/chemistry , Algorithms , Carrier Proteins , Hydrogen Bonding , Molecular Models , Protein Binding , Protein Conformation , Thermodynamics

13.

A fast algorithm for genome-wide haplotype pattern mining.

Besenbacher, Søren; Pedersen, Christian N S; Mailund, Thomas.

BMC Bioinformatics ; 10 Suppl 1: S74, 2009 Jan 30.

Article in English | MEDLINE | ID: mdl-19208179

ABSTRACT

BACKGROUND: Identifying the genetic components of common diseases has long been an important area of research. Recently, genotyping technology has reached the level where it is cost effective to genotype single nucleotide polymorphism (SNP) markers covering the entire genome, in thousands of individuals, and analyse such data for markers associated with a disease. The statistical power to detect association, however, is limited when markers are analysed one at a time. This can be alleviated by considering multiple markers simultaneously. The Haplotype Pattern Mining (HPM) method is a machine learning approach to do exactly this. RESULTS: We present a new, faster algorithm for the HPM method. The new approach uses patterns of haplotype diversity in the genome: locally in the genome, the number of observed haplotypes is much smaller than the total number of possible haplotypes. We show that the new approach speeds up the HPM method by a factor of 2 on a genome-wide dataset with 5009 individuals typed in 491208 markers using default parameters, and by more if the pattern length is increased. CONCLUSION: The new algorithm speeds up the HPM method and we show that it is feasible to apply HPM to whole genome association mapping with thousands of individuals and hundreds of thousands of markers.

Subjects

Algorithms , Computational Biology/methods , Human Genome , Haplotypes/genetics , Genetic Databases , Genetic Markers , Genetic Predisposition to Disease , Genetic Variation , Humans , Single Nucleotide Polymorphism

14.

BS-virus-finder: virus integration calling using bisulfite sequencing data.

Gao, Shengjie; Hu, Xuesong; Xu, Fengping; Gao, Changduo; Xiong, Kai; Zhao, Xiao; Chen, Haixiao; Zhao, Shancen; Wang, Mengyao; Fu, Dongke; Zhao, Xiaohui; Bai, Jie; Mao, Likai; Li, Bo; Wu, Song; Wang, Jian; Li, Shengbin; Yang, Huangming; Bolund, Lars; Pedersen, Christian N S.

Gigascience ; 7(1): 1-7, 2018 01 01.

Article in English | MEDLINE | ID: mdl-29267855

ABSTRACT

Background: DNA methylation plays a key role in the regulation of gene expression and carcinogenesis. Bisulfite sequencing studies mainly focus on calling single nucleotide polymorphisms, identifying differentially methylated regions, and finding allele-specific DNA methylation. Until now, only a few software tools have focused on virus integration using bisulfite sequencing data. Findings: We have developed a new and easy-to-use software tool, named BS-virus-finder (BSVF, RRID:SCR_015727), to detect viral integration breakpoints in whole human genomes. The tool is hosted at https://github.com/BGI-SZ/BSVF. Conclusions: BS-virus-finder demonstrates high sensitivity and specificity. It is useful in epigenetic studies and for revealing the relationship between viral integration and DNA methylation. BS-virus-finder is the first software tool to detect virus integration loci by using bisulfite sequencing data.

Subjects

Viral DNA/genetics , Human Genome , Hepatitis B virus/genetics , Hepatocytes/virology , Software , Virus Integration , Base Pairing , Base Sequence , Tumor Cell Line , DNA Methylation , Genetic Epigenesis , Hepatocytes/metabolism , Hepatocytes/pathology , High-Throughput Nucleotide Sequencing , Humans , Sensitivity and Specificity , Sulfites/chemistry , Whole Genome Sequencing

15.

GeneRecon--a coalescent based tool for fine-scale association mapping.

Mailund, Thomas; Schierup, Mikkel H; Pedersen, Christian N S; Madsen, Jesper N; Hein, Jotun; Schauser, Leif.

Bioinformatics ; 22(18): 2317-8, 2006 Sep 15.

Article in English | MEDLINE | ID: mdl-16632491

ABSTRACT

UNLABELLED: GeneRecon is a tool for fine-scale association mapping using a coalescence model. GeneRecon takes as input case-control data from phased or unphased SNP and microsatellite genotypes. The posterior distribution of disease locus position is obtained by Metropolis-Hastings sampling in the state space of genealogies. Input format, search strategy and the sampled statistics can be configured through the Guile Scheme programming language embedded in GeneRecon, making GeneRecon highly configurable. AVAILABILITY: The source code for GeneRecon, written in C++ and Scheme, is available under the GNU General Public License (GPL) at http://www.birc.au.dk/~mailund/GeneRecon CONTACT: mailund@birc.au.dk.

Subjects

Chromosome Mapping/methods , DNA Mutational Analysis/methods , Genetic Predisposition to Disease/genetics , Population Genetics , Linkage Disequilibrium/genetics , Genetic Models , Software , Algorithms , Animals , Genetic Linkage/genetics , Humans , Statistical Models , DNA Sequence Analysis/methods

16.

Prediction of Primary Tumors in Cancers of Unknown Primary.

Søndergaard, Dan; Nielsen, Svend; Pedersen, Christian N S; Besenbacher, Søren.

J Integr Bioinform ; 14(2), 2017 Jul 07.

Article in English | MEDLINE | ID: mdl-28686574

ABSTRACT

A cancer of unknown primary (CUP) is a metastatic cancer for which standard diagnostic tests fail to identify the location of the primary tumor. CUPs account for 3-5% of cancer cases. Using molecular data to determine the location of the primary tumor in such cases can help doctors make the right treatment choice and thus improve the clinical outcome. In this paper, we present a new method for predicting the location of the primary tumor using gene expression data: locating cancers of unknown primary (LoCUP). The method models the data as a mixture of normal and tumor cells and thus allows correct classification even in impure samples, where the tumor biopsy is contaminated by a large fraction of normal cells. We find that our method provides a significant increase in classification accuracy (95.8% over 90.8%) on simulated low-purity metastatic samples and shows potential on a small dataset of real metastasis samples with known origin.

Subjects

Gene Expression Profiling , Neoplastic Gene Expression Regulation , Unknown Primary Neoplasms/genetics , Unknown Primary Neoplasms/therapy , Biopsy , Humans

17.

Recrafting the neighbor-joining method.

Mailund, Thomas; Brodal, Gerth S; Fagerberg, Rolf; Pedersen, Christian N S; Phillips, Derek.

BMC Bioinformatics ; 7: 29, 2006 Jan 19.

Article in English | MEDLINE | ID: mdl-16423304

ABSTRACT

BACKGROUND: The neighbor-joining method by Saitou and Nei is a widely used method for constructing phylogenetic trees. The formulation of the method gives rise to a canonical Theta(n^3) algorithm upon which all existing implementations are based. RESULTS: In this paper we present techniques for speeding up the canonical neighbor-joining method. Our algorithms construct the same phylogenetic trees as the canonical neighbor-joining method. The best-case running time of our algorithms is O(n^2) but the worst case remains O(n^3). We empirically evaluate the performance of our algorithms on distance matrices obtained from the Pfam collection of alignments. The experiments indicate that the running time of our algorithms grows as Theta(n^2) on the examined instance collection. We also compare the running time with that of the QuickTree tool, a widely used efficient implementation of the canonical neighbor-joining method. CONCLUSION: The experiments show that our algorithms also yield a significant speed-up, already for medium sized instances.

Subjects

Computational Biology/methods , Algorithms , Animals , Cluster Analysis , Computer Simulation , Molecular Evolution , Humans , Likelihood Functions , Statistical Models , Phylogeny , Sequence Alignment , Protein Sequence Analysis , Software
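
The canonical cubic-time neighbor-joining procedure that the abstract above takes as its baseline can be sketched as follows. This is the textbook formulation, in which each of the n - 2 joins scans all pairs for the minimum of the Q criterion; it does not include the search-pruning techniques of the paper, and it returns only the topology without branch lengths. The dictionary-based distance matrix and nested-tuple output are assumptions made for illustration.

```python
def neighbor_joining(names, dist):
    """Canonical neighbor-joining; returns the tree topology as nested tuples.

    names -- list of taxon labels
    dist  -- dist[a][b] is the symmetric distance between taxa a and b
    """
    nodes = list(names)
    d = {a: dict(dist[a]) for a in names}  # working copy of the matrix
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums used by the Q criterion.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        # Scanning all pairs is the quadratic step repeated for every join.
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        new = (a, b)  # the joined pair becomes a new internal node
        d[new] = {}
        for c in nodes:
            if c != a and c != b:
                # Distance from the new node to every remaining node.
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c != a and c != b] + [new]
    return (nodes[0], nodes[1])
```

With four taxa whose distances are additive on the tree grouping A with B and C with D (2 within each pair, 3 between pairs), the sketch joins A and B first and then C and D.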

18.

HydDB: A web tool for hydrogenase classification and analysis.

Søndergaard, Dan; Pedersen, Christian N S; Greening, Chris.

Sci Rep ; 6: 34212, 2016 Sep 27.

Article in English | MEDLINE | ID: mdl-27670643

ABSTRACT

H2 metabolism is proposed to be the most ancient and diverse mechanism of energy-conservation. The metalloenzymes mediating this metabolism, hydrogenases, are encoded by over 60 microbial phyla and are present in all major ecosystems. We developed a classification system and web tool, HydDB, for the structural and functional analysis of these enzymes. We show that hydrogenase function can be predicted by primary sequence alone using an expanded classification scheme (comprising 29 [NiFe], 8 [FeFe], and 1 [Fe] hydrogenase classes) that defines 11 new classes with distinct biological functions. Using this scheme, we built a web tool that rapidly and reliably classifies hydrogenase primary sequences using a combination of k-nearest neighbors' algorithms and CDD referencing. Demonstrating its capacity, the tool reliably predicted hydrogenase content and function in 12 newly-sequenced bacteria, archaea, and eukaryotes. HydDB provides the capacity to browse the amino acid sequences of 3248 annotated hydrogenase catalytic subunits and also contains a detailed repository of physiological, biochemical, and structural information about the 38 hydrogenase classes defined here. The database and classifier are freely and publicly available at http://services.birc.au.dk/hyddb/.

19.

CoaSim: a flexible environment for simulating genetic data under coalescent models.

Mailund, Thomas; Schierup, Mikkel H; Pedersen, Christian N S; Mechlenborg, Peter J M; Madsen, Jesper N; Schauser, Leif.

BMC Bioinformatics ; 6: 252, 2005 Oct 14.

Article in English | MEDLINE | ID: mdl-16225674

ABSTRACT

BACKGROUND: Coalescent simulations are playing a large role in interpreting large scale intra-specific sequence or polymorphism surveys and for planning and evaluating association studies. Coalescent simulations of data sets under different models can be compared to the actual data to test the importance of different evolutionary factors and thus get insight into these. RESULTS: We have created the CoaSim application as a flexible environment for Monte Carlo simulation of various types of genetic data under equilibrium and non-equilibrium coalescent processes for a variety of applications. Interaction with the tool is through the Guile version of the Scheme scripting language. Scheme scripts for many standard and advanced applications are provided and these can easily be modified by the user for a much wider range of applications. A graphical user interface with less functionality and flexibility is also included. It is primarily intended as an exploratory and educational tool. CONCLUSION: CoaSim is a powerful tool because of its flexibility and ease of use. This is illustrated through very varied uses of the application, e.g. evaluation of association mapping methods, parametric bootstrapping, and design and choice of markers for specific questions.

Subjects

Computer Simulation , Genetic Markers , Genetic Predisposition to Disease/genetics , Software , Case-Control Studies , Statistical Data Interpretation , Disease Susceptibility , Humans , Genetic Models , Monte Carlo Method , Genetic Polymorphism , Single Nucleotide Polymorphism

20.

Novel variation and de novo mutation rates in population-wide de novo assembled Danish trios.

Besenbacher, Søren; Liu, Siyang; Izarzugaza, José M G; Grove, Jakob; Belling, Kirstine; Bork-Jensen, Jette; Huang, Shujia; Als, Thomas D; Li, Shengting; Yadav, Rachita; Rubio-García, Arcadio; Lescai, Francesco; Demontis, Ditte; Rao, Junhua; Ye, Weijian; Mailund, Thomas; Friborg, Rune M; Pedersen, Christian N S; Xu, Ruiqi; Sun, Jihua; Liu, Hao; Wang, Ou; Cheng, Xiaofang; Flores, David; Rydza, Emil; Rapacki, Kristoffer; Damm Sørensen, John; Chmura, Piotr; Westergaard, David; Dworzynski, Piotr; Sørensen, Thorkild I A; Lund, Ole; Hansen, Torben; Xu, Xun; Li, Ning; Bolund, Lars; Pedersen, Oluf; Eiberg, Hans; Krogh, Anders; Børglum, Anders D; Brunak, Søren; Kristiansen, Karsten; Schierup, Mikkel H; Wang, Jun; Gupta, Ramneek; Villesen, Palle; Rasmussen, Simon.

Nat Commun ; 6: 5969, 2015 Jan 19.

Article in English | MEDLINE | ID: mdl-25597990

ABSTRACT

Building a population-specific catalogue of single nucleotide variants (SNVs), indels and structural variants (SVs) with frequencies, termed a national pan-genome, is critical for further advancing clinical and public health genetics in large cohorts. Here we report a Danish pan-genome obtained from sequencing 10 trios to high depth (50×). We report 536k novel SNVs and 283k novel short indels from mapping approaches and develop a population-wide de novo assembly approach to identify 132k novel indels larger than 10 nucleotides with low false discovery rates. We identify a higher proportion of indels and SVs than previous efforts showing the merits of high coverage and de novo assembly approaches. In addition, we use trio information to identify de novo mutations and use a probabilistic method to provide direct estimates of 1.27e-8 and 1.5e-9 per nucleotide per generation for SNVs and indels, respectively.

Subjects

Human Genome/genetics , Algorithms , Humans , Mutation Rate , Single Nucleotide Polymorphism/genetics , DNA Sequence Analysis/methods
