Search | VHL Regional Portal

1.

Loss of copy numbers of retrotransposons (HERVK) on chromosome 7p11.2 impacts EGFR (Epidermal Growth Factor Receptor)-induced phenotypes for platinum sensitivity and long-term survival in ovarian cancer: A study from the OVCAD consortium.

Fromhage, Gesa; Obermayr, Eva; Bednarz-Knoll, Natalia; Van Gorp, Toon; Welsch, Eva; Polterauer, Stephan; Braicu, Elena Ioana; Mahner, Sven; Sehouli, Jalid; Vergote, Ignace; Concin, Nicole; Kurtz, Stefan; Steinbiss, Sascha; Torge, Antje; Zeillinger, Robert; Wölber, Linn; Brandt, Burkhard.

Int J Cancer ; 2024 May 06.

Article in English | MEDLINE | ID: mdl-38709956

ABSTRACT

We analyzed variations in the epidermal growth factor receptor (EGFR) gene and its 5'-upstream region to identify potential molecular predictors of treatment response in primary epithelial ovarian cancer. Tumor tissues collected during debulking surgery from the prospective multicenter OVCAD study were investigated. Copy number variations in the human endogenous retrovirus K9 (HERVK9) sequence and in EGFR Exons 7 and 9, as well as repeat length and loss of heterozygosity of the polymorphic CA-SSR I and relative EGFR mRNA expression, were determined quantitatively. At least one EGFR variation was observed in 94% of the patients. Among the 30 combinations of variations discovered, enhanced platinum sensitivity (n = 151) was found dominantly with HERVK9 haploidy and Exon 7 tetraploidy, overrepresented among patients with survival ≥120 months (24/29, p = .0212). EGFR overexpression (≥80th percentile) was significantly less likely in the responders (17% vs. 32%, p = .044). Multivariate Cox regression analysis, including age, FIGO stage, and grade, indicated that a CA-SSR I repeat length <18 CA for both alleles defined a prognostically significant patient subgroup (HR 0.276, 95% confidence interval 0.109-0.655, p = .001). Although EGFR variations occur in ovarian cancer, the mRNA levels remain low compared to other EGFR-mutated cancers. Notably, the inherited length of the CA-SSR I repeat, HERVK9 haploidy, and Exon 7 tetraploidy conferred threefold higher odds of surviving for more than 10 years under therapy. This may add value in guiding therapies if determined during follow-up in circulating tumor cells or circulating tumor DNA and offers HERVK9 as a potential therapeutic target.

2.

Genome Assemblies across the Diverse Evolutionary Spectrum of Leishmania Protozoan Parasites.

Warren, Wesley C; Akopyants, Natalia S; Dobson, Deborah E; Hertz-Fowler, Christiane; Lye, Lon-Fye; Myler, Peter J; Ramasamy, Gowthaman; Shanmugasundram, Achchuthan; Silva-Franco, Fatima; Steinbiss, Sascha; Tomlinson, Chad; Wilson, Richard K; Beverley, Stephen M.

Microbiol Resour Announc ; 10(35): e0054521, 2021 Sep 02.

Article in English | MEDLINE | ID: mdl-34472979

ABSTRACT

We report the high-quality draft assemblies and gene annotations for 13 species and/or strains of the protozoan parasite genera Leishmania, Endotrypanum, and Crithidia, which span the phylogenetic diversity of the subfamily Leishmaniinae within the kinetoplastid order of the phylum Euglenozoa. These resources will support studies on the origins of parasitism.

3.

Multilocus Analysis Resolves the European Finch Epidemic Strain of Trichomonas gallinae and Suggests Introgression from Divergent Trichomonads.

Alrefaei, Abdulwahed Fahad; Low, Ross; Hall, Neil; Jardim, Rodrigo; Dávila, Alberto; Gerhold, Rick; John, Shinto; Steinbiss, Sascha; Cunningham, Andrew A; Lawson, Becki; Bell, Diana; Tyler, Kevin.

Genome Biol Evol ; 11(8): 2391-2402, 2019 08 01.

Article in English | MEDLINE | ID: mdl-31364699

ABSTRACT

In Europe, Trichomonas gallinae recently emerged as a cause of epidemic disease in songbirds. A clonal strain of the parasite, first found in the United Kingdom, has become the predominant strain there and spread to continental Europe. Discriminating this epidemic strain of T. gallinae from other strains necessitated development of multilocus sequence typing (MLST). Development of the MLST was facilitated by the assembly and annotation of a 54.7 Mb draft genome of a cloned stabilate of the A1 European finch epidemic strain (isolated from Greenfinch, Chloris chloris, XT-1081/07 in 2007) containing 21,924 protein-coding genes. This enabled construction of a robust 19-locus MLST based on existing typing loci for Trichomonas vaginalis and T. gallinae. Our MLST has the sensitivity to discriminate strains within existing genotypes confidently, and resolves the American finch A1 genotype from the European finch epidemic A1 genotype. Interestingly, one isolate we obtained from a captive black-naped fruit dove, Ptilinopus melanospilus, was not truly T. gallinae but a hybrid of T. gallinae with a distant trichomonad lineage. Phylogenetic analysis of the individual loci in this fruit dove isolate provides evidence of gene flow between distant trichomonad lineages at 2 of the 19 loci examined and may provide a precedent for the emergence of other hybrid trichomonad genomes, including T. vaginalis.

Subjects

Bird Diseases/parasitology, Molecular Evolution, Finches/parasitology, Protozoan Genome, Protozoan Proteins/genetics, Trichomonas Infections/veterinary, Trichomonas/genetics, Animals, Bird Diseases/epidemiology, Protozoan DNA/genetics, Gene Expression Regulation, Multilocus Sequence Typing, Phylogeny, Transcriptome, Trichomonas/isolation & purification, Trichomonas Infections/epidemiology, Trichomonas Infections/parasitology

4.

Genome organization and DNA accessibility control antigenic variation in trypanosomes.

Müller, Laura S M; Cosentino, Raúl O; Förstner, Konrad U; Guizetti, Julien; Wedel, Carolin; Kaplan, Noam; Janzen, Christian J; Arampatzi, Panagiota; Vogel, Jörg; Steinbiss, Sascha; Otto, Thomas D; Saliba, Antoine-Emmanuel; Sebra, Robert P; Siegel, T Nicolai.

Nature ; 563(7729): 121-125, 2018 11.

Article in English | MEDLINE | ID: mdl-30333624

ABSTRACT

Many evolutionarily distant pathogenic organisms have evolved similar survival strategies to evade the immune responses of their hosts. These include antigenic variation, through which an infecting organism prevents clearance by periodically altering the identity of proteins that are visible to the immune system of the host [1]. Antigenic variation requires large reservoirs of immunologically diverse antigen genes, which are often generated through homologous recombination, as well as mechanisms to ensure the expression of one or very few antigens at any given time. Both homologous recombination and gene expression are affected by three-dimensional genome architecture and local DNA accessibility [2,3]. Factors that link three-dimensional genome architecture, local chromatin conformation and antigenic variation have, to our knowledge, not yet been identified in any organism. One of the major obstacles to studying the role of genome architecture in antigenic variation has been the highly repetitive nature and heterozygosity of antigen-gene arrays, which has precluded complete genome assembly in many pathogens. Here we report the de novo haplotype-specific assembly and scaffolding of the long antigen-gene arrays of the model protozoan parasite Trypanosoma brucei, using long-read sequencing technology and conserved features of chromosome folding [4]. Genome-wide chromosome conformation capture (Hi-C) reveals a distinct partitioning of the genome, with antigen-encoding subtelomeric regions that are folded into distinct, highly compact compartments. In addition, we performed a range of analyses (Hi-C, fluorescence in situ hybridization, assays for transposase-accessible chromatin using sequencing and single-cell RNA sequencing) that showed that deletion of the histone variants H3.V and H4.V increases antigen-gene clustering, DNA accessibility across sites of antigen expression and switching of the expressed antigen isoform, via homologous recombination. Our analyses identify histone variants as a molecular link between global genome architecture, local chromatin conformation and antigenic variation.

Subjects

Antigenic Variation/genetics, Chromatin/genetics, Chromatin/metabolism, Protozoan DNA/metabolism, Genome/genetics, Trypanosoma brucei brucei/genetics, Trypanosoma brucei brucei/immunology, Protozoan DNA/genetics, Haplotypes/genetics, Histones/deficiency, Histones/genetics, Multigene Family/genetics, Protein Isoforms/biosynthesis, Protein Isoforms/genetics, Trypanosoma Variant Surface Glycoproteins/biosynthesis, Trypanosoma Variant Surface Glycoproteins/genetics

5.

Complete avian malaria parasite genomes reveal features associated with lineage-specific evolution in birds and mammals.

Böhme, Ulrike; Otto, Thomas D; Cotton, James A; Steinbiss, Sascha; Sanders, Mandy; Oyola, Samuel O; Nicot, Antoine; Gandon, Sylvain; Patra, Kailash P; Herd, Colin; Bushell, Ellen; Modrzynska, Katarzyna K; Billker, Oliver; Vinetz, Joseph M; Rivero, Ana; Newbold, Chris I; Berriman, Matthew.

Genome Res ; 28(4): 547-560, 2018 04.

Article in English | MEDLINE | ID: mdl-29500236

ABSTRACT

Avian malaria parasites are prevalent around the world and infect a wide diversity of bird species. Here, we report the sequencing and analysis of high-quality draft genome sequences for two avian malaria species, Plasmodium relictum and Plasmodium gallinaceum. We identify 50 genes that are specific to avian malaria, located in an otherwise conserved core of the genome that shares gene synteny with all other sequenced malaria genomes. Phylogenetic analysis suggests that the avian malaria species form an outgroup to the mammalian Plasmodium species, and using amino acid divergence between species, we estimate that the avian- and mammalian-infective lineages diverged on the order of 10 million years ago. Consistent with their phylogenetic position, we identify orthologs of genes that had previously appeared to be restricted to the clades of parasites containing Plasmodium falciparum and Plasmodium vivax, the species with the greatest impact on human health. From these orthologs, we explore differential diversifying selection across the genus and show that the avian lineage is remarkable in the extent to which invasion-related genes are evolving. The subtelomeres of the P. relictum and P. gallinaceum genomes contain several novel gene families, including an expanded surf multigene family. We also identify an expansion of reticulocyte binding protein homologs in P. relictum, and within these proteins, we detect distinct regions that are specific to nonhuman primate, human, rodent, and avian hosts. For the first time in the Plasmodium lineage, we find evidence of transposable elements, including several hundred fragments of LTR-retrotransposons in both species and an apparently complete LTR-retrotransposon in the genome of P. gallinaceum.

Subjects

Avian Malaria/genetics, Plasmodium falciparum/genetics, Plasmodium vivax/genetics, Plasmodium/genetics, Animals, Birds/parasitology, Molecular Evolution, Humans, Avian Malaria/parasitology, Mammals/parasitology, Phylogeny, Plasmodium/pathogenicity, Plasmodium falciparum/pathogenicity, Plasmodium vivax/pathogenicity

6.

First Draft Genome Sequence of the Dourine Causative Agent: Trypanosoma equiperdum Strain OVI.

Hébert, Laurent; Moumen, Bouziane; Madeline, Anthony; Steinbiss, Sascha; Lakhdar, Latifa; Van Reet, Nick; Büscher, Philippe; Laugier, Claire; Cauchard, Julien; Petry, Sandrine.

J Genomics ; 5: 1-3, 2017.

Article in English | MEDLINE | ID: mdl-28138343

ABSTRACT

Trypanosoma equiperdum is the causative agent of dourine, a sexually transmitted infection of horses. This parasite belongs to the subgenus Trypanozoon, which also includes the agents of sleeping sickness (Trypanosoma brucei) and surra (Trypanosoma evansi). We herein report the genome sequence of T. equiperdum strain OVI, isolated from a horse in South Africa in 1976. This is the first genome sequence of the T. equiperdum species, and its availability will provide important insights for future studies on genetic classification of the subgenus Trypanozoon.

7.

EuPathDB: the eukaryotic pathogen genomics database resource.

Aurrecoechea, Cristina; Barreto, Ana; Basenko, Evelina Y; Brestelli, John; Brunk, Brian P; Cade, Shon; Crouch, Kathryn; Doherty, Ryan; Falke, Dave; Fischer, Steve; Gajria, Bindu; Harb, Omar S; Heiges, Mark; Hertz-Fowler, Christiane; Hu, Sufen; Iodice, John; Kissinger, Jessica C; Lawrence, Cris; Li, Wei; Pinney, Deborah F; Pulman, Jane A; Roos, David S; Shanmugasundram, Achchuthan; Silva-Franco, Fatima; Steinbiss, Sascha; Stoeckert, Christian J; Spruill, Drew; Wang, Haiming; Warrenfeltz, Susanne; Zheng, Jie.

Nucleic Acids Res ; 45(D1): D581-D591, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27903906

ABSTRACT

The Eukaryotic Pathogen Genomics Database Resource (EuPathDB, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB is updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host-pathogen interactions.

Subjects

Genetic Databases, Eukaryota, Genomics/methods, Host-Pathogen Interactions/genetics, Metagenome, Metagenomics/methods, Software, Computational Biology/methods, DNA Copy Number Variations, Gene Expression Profiling, Proteomics, Web Browser

8.

A new Plasmodium vivax reference sequence with improved assembly of the subtelomeres reveals an abundance of pir genes.

Auburn, Sarah; Böhme, Ulrike; Steinbiss, Sascha; Trimarsanto, Hidayat; Hostetler, Jessica; Sanders, Mandy; Gao, Qi; Nosten, Francois; Newbold, Chris I; Berriman, Matthew; Price, Ric N; Otto, Thomas D.

Wellcome Open Res ; 1: 4, 2016 Nov 15.

Article in English | MEDLINE | ID: mdl-28008421

ABSTRACT

Plasmodium vivax is now the predominant cause of malaria in the Asia-Pacific, South America and Horn of Africa. Laboratory studies of this species are constrained by the inability to maintain the parasite in continuous ex vivo culture, but genomic approaches provide an alternative and complementary avenue to investigate the parasite's biology and epidemiology. To date, molecular studies of P. vivax have relied on the Salvador-I reference genome sequence, derived from a monkey-adapted strain from South America. However, the Salvador-I reference remains highly fragmented, with over 2500 unassembled scaffolds. Using high-depth Illumina sequence data, we assembled and annotated a new reference sequence, PvP01, sourced directly from a patient from Papua Indonesia. Draft assemblies of isolates from China (PvC01) and Thailand (PvT01) were also prepared for comparative purposes. The quality of the PvP01 assembly is improved greatly over Salvador-I, with fragmentation reduced to 226 scaffolds. Detailed manual curation has ensured highly comprehensive annotation, with functions attributed to 58% of core genes in PvP01 versus 38% in Salvador-I. The assemblies of PvP01, PvC01 and PvT01 are larger than that of Salvador-I (28-30 versus 27 Mb), owing to improved assembly of the subtelomeres. An extensive repertoire of over 1200 Plasmodium interspersed repeat (pir) genes was identified in PvP01 compared to 346 in Salvador-I, suggesting a vital role in parasite survival or development. The manually curated PvP01 reference and the PvC01 and PvT01 draft assemblies are important new resources to study vivax malaria. PvP01 is maintained at GeneDB and ongoing curation will ensure continual improvements in assembly and annotation quality.

9.

An expressed, endogenous Nodavirus-like element captured by a retrotransposon in the genome of the plant parasitic nematode Bursaphelenchus xylophilus.

Cotton, James A; Steinbiss, Sascha; Yokoi, Toshiro; Tsai, Isheng J; Kikuchi, Taisei.

Sci Rep ; 6: 39749, 2016 12 22.

Article in English | MEDLINE | ID: mdl-28004836

ABSTRACT

Recently, nematode viruses infecting Caenorhabditis elegans have been reported from the family Nodaviridae, the first nematode viruses described. Here, we report the observation of a novel endogenous viral element (EVE) in the genome of Bursaphelenchus xylophilus, a plant parasitic nematode unrelated to other nematodes from which viruses have been characterised. This element derives from a different clade of nodaviruses to the previously reported nematode viruses. This represents the first endogenous nodavirus sequence, the first nematode endogenous viral element, and significantly extends our knowledge of the potential diversity of the Nodaviridae. A search for endogenous elements related to the Nodaviridae did not reveal any elements in other available nematode genomes. Further surveillance for endogenous viral elements is warranted as our knowledge of nematode genome diversity, and in particular of free-living nematodes, expands.

Subjects

Helminth Genome, Nodaviridae, Retroelements, Tylenchida/genetics, Animals

10.

Companion: a web server for annotation and analysis of parasite genomes.

Steinbiss, Sascha; Silva-Franco, Fatima; Brunk, Brian; Foth, Bernardo; Hertz-Fowler, Christiane; Berriman, Matthew; Otto, Thomas D.

Nucleic Acids Res ; 44(W1): W29-34, 2016 07 08.

Article in English | MEDLINE | ID: mdl-27105845

ABSTRACT

Currently available sequencing technologies enable quick and economical sequencing of many new eukaryotic parasite (apicomplexan or kinetoplastid) species or strains. Compared to SNP calling approaches, de novo assembly of these genomes enables researchers to additionally determine insertion, deletion and recombination events as well as to detect complex sequence diversity, such as that seen in variable multigene families. However, there are currently no automated eukaryotic annotation pipelines offering the required range of results to facilitate such analyses. A suitable pipeline needs to perform evidence-supported gene finding as well as functional annotation and pseudogene detection, up to the generation of output ready to be submitted to a public database. Moreover, no current tool includes quick yet informative comparative analyses and a first-pass visualization of both annotation and analysis results. To address those needs we have developed the Companion web server (http://companion.sanger.ac.uk), providing parasite genome annotation as a service using a reference-based approach. We demonstrate the use and performance of Companion by annotating two Leishmania and Plasmodium genomes as typical parasite cases and evaluate the results compared to manually annotated references.

Subjects

Protozoan Genome, Leishmania/genetics, Plasmodium falciparum/genetics, Protozoan Proteins/genetics, Protozoan RNA/genetics, Software, Genetic Databases, Gene Ontology, Internet, Leishmania/classification, Molecular Sequence Annotation, Phylogeny, Plasmodium falciparum/classification, Sensitivity and Specificity

11.

Community-driven development for computational biology at Sprints, Hackathons and Codefests.

Möller, Steffen; Afgan, Enis; Banck, Michael; Bonnal, Raoul J P; Booth, Timothy; Chilton, John; Cock, Peter J A; Gumbel, Markus; Harris, Nomi; Holland, Richard; Kalas, Matús; Kaján, László; Kibukawa, Eri; Powel, David R; Prins, Pjotr; Quinn, Jacqueline; Sallou, Olivier; Strozzi, Francesco; Seemann, Torsten; Sloggett, Clare; Soiland-Reyes, Stian; Spooner, William; Steinbiss, Sascha; Tille, Andreas; Travis, Anthony J; Guimera, Roman; Katayama, Toshiaki; Chapman, Brad A.

BMC Bioinformatics ; 15 Suppl 14: S7, 2014.

Article in English | MEDLINE | ID: mdl-25472764

ABSTRACT

BACKGROUND: Computational biology comprises a wide range of technologies and approaches. Multiple technologies can be combined to create more powerful workflows if the individuals contributing the data or providing tools for its interpretation can find mutual understanding and consensus. Much conversation and joint investigation are required in order to identify and implement the best approaches. Traditionally, scientific conferences feature talks presenting novel technologies or insights, followed up by informal discussions during coffee breaks. In multi-institution collaborations, in order to reach agreement on implementation details or to transfer deeper insights into a technology and practical skills, a representative of one group typically visits the other. However, this does not scale well when the number of technologies or research groups is large. Conferences have responded to this issue by introducing Birds-of-a-Feather (BoF) sessions, which offer an opportunity for individuals with common interests to intensify their interaction. However, parallel BoF sessions often make it hard for participants to join multiple BoFs and find common ground between the different technologies, and BoFs are generally too short to allow time for participants to program together. RESULTS: This report summarises our experience with computational biology Codefests, Hackathons and Sprints, which are interactive developer meetings. They are structured to reduce the limitations of traditional scientific meetings described above by strengthening the interaction among peers and letting the participants determine the schedule and topics. These meetings are commonly run as loosely scheduled "unconferences" (self-organized identification of participants and topics for meetings) over at least two days, with early introductory talks to welcome and organize contributors, followed by intensive collaborative coding sessions. We summarise some prominent achievements of those meetings and describe differences in how these are organised, how their audience is addressed, and their outreach to their respective communities. CONCLUSIONS: Hackathons, Codefests and Sprints share a stimulating atmosphere that encourages participants to jointly brainstorm and tackle problems of shared interest in a self-driven proactive environment, as well as providing an opportunity for new participants to get involved in collaborative projects.

Subjects

Computational Biology, Cooperative Behavior, Software, Communication, Internet

12.

The genome of the sparganosis tapeworm Spirometra erinaceieuropaei isolated from the biopsy of a migrating brain lesion.

Bennett, Hayley M; Mok, Hoi Ping; Gkrania-Klotsas, Effrossyni; Tsai, Isheng J; Stanley, Eleanor J; Antoun, Nagui M; Coghlan, Avril; Harsha, Bhavana; Traini, Alessandra; Ribeiro, Diogo M; Steinbiss, Sascha; Lucas, Sebastian B; Allinson, Kieren S J; Price, Stephen J; Santarius, Thomas S; Carmichael, Andrew J; Chiodini, Peter L; Holroyd, Nancy; Dean, Andrew F; Berriman, Matthew.

Genome Biol ; 15(11): 510, 2014.

Article in English | MEDLINE | ID: mdl-25413302

ABSTRACT

BACKGROUND: Sparganosis is an infection with a larval Diphyllobothriidea tapeworm. From a rare cerebral case presented at a clinic in the UK, DNA was recovered from a biopsy sample and used to determine the causative species as Spirometra erinaceieuropaei through sequencing of the cox1 gene. From the same DNA, we have produced a draft genome, the first of its kind for this species, and used it to perform a comparative genomics analysis and to investigate known and potential tapeworm drug targets in this tapeworm. RESULTS: The 1.26 Gb draft genome of S. erinaceieuropaei is currently the largest reported for any flatworm. Through investigation of β-tubulin genes, we predict that S. erinaceieuropaei larvae are insensitive to the tapeworm drug albendazole. We find that many putative tapeworm drug targets are also present in S. erinaceieuropaei, allowing possible cross application of new drugs. In comparison to other sequenced tapeworm species we observe expansion of protease classes, and of Kunitz-type protease inhibitors. Expanded gene families in this tapeworm also include those that are involved in processes that add post-translational diversity to the protein landscape, intracellular transport, transcriptional regulation and detoxification. CONCLUSIONS: The S. erinaceieuropaei genome begins to give us insight into an order of tapeworms previously uncharacterized at the genome-wide level. From a single clinical case we have begun to sketch a picture of the characteristics of these organisms. Finally, our work represents a significant technological achievement, as we present a draft genome sequence of a rare tapeworm obtained from a small amount of starting material.

Subjects

Diphyllobothrium/genetics, Genome, Sparganosis/genetics, Spirometra/genetics, Animals, Base Sequence, Biopsy, Brain/parasitology, Brain/pathology, High-Throughput Nucleotide Sequencing, Humans, Sparganosis/parasitology, Spirometra/parasitology, United Kingdom

13.

GenomeTools: a comprehensive software library for efficient processing of structured genome annotations.

Gremme, Gordon; Steinbiss, Sascha; Kurtz, Stefan.

IEEE/ACM Trans Comput Biol Bioinform ; 10(3): 645-56, 2013.

Article in English | MEDLINE | ID: mdl-24091398

ABSTRACT

Genome annotations are often published as plain text files describing genomic features and their subcomponents by an implicit annotation graph. In this paper, we present the GenomeTools, a convenient and efficient software library and associated software tools for developing bioinformatics software intended to create, process or convert annotation graphs. The GenomeTools strictly follow the annotation graph approach, offering a unified graph-based representation. This gives the developer intuitive and immediate access to genomic features and tools for their manipulation. To process large annotation sets with low memory overhead, we have designed and implemented an efficient pull-based approach for sequential processing of annotations. This makes it possible to handle even the largest annotation sets, such as a complete catalogue of human variations. Our object-oriented C-based software library enables developers to conveniently implement their own functionality on annotation graphs and to integrate it into larger workflows, simultaneously accessing compressed sequence data if required. The careful C implementation of the GenomeTools not only ensures a lightweight memory footprint while allowing full sequential as well as random access to the annotation graph, but also facilitates the creation of bindings to a variety of scripting languages (like Python and Ruby) sharing the same interface.

Subjects

Genomics/methods, Molecular Sequence Annotation/methods, Software, Human Genome, Humans
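
The pull-based sequential processing mentioned in the GenomeTools abstract above can be illustrated with a minimal sketch. This is not the GenomeTools API: the names `Feature`, `parse_stream`, `type_filter_stream` and `total_length` are hypothetical, the records are flat rather than a full annotation graph, and the input is assumed to be tab-separated GFF3-style lines. The point is only that each stage requests one record at a time from the stage before it, so memory use does not grow with the size of the annotation set.

```python
from typing import Iterable, Iterator, NamedTuple


class Feature(NamedTuple):
    """Hypothetical flat record; a real annotation graph also links parents and children."""
    seqid: str
    ftype: str
    start: int
    end: int


def parse_stream(lines: Iterable[str]) -> Iterator[Feature]:
    """Source stage: yield one feature per GFF3-style line, only when asked."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        yield Feature(cols[0], cols[2], int(cols[3]), int(cols[4]))


def type_filter_stream(upstream: Iterator[Feature], wanted: str) -> Iterator[Feature]:
    """Intermediate stage: pull from upstream and pass on matching features."""
    for feature in upstream:
        if feature.ftype == wanted:
            yield feature


def total_length(upstream: Iterator[Feature]) -> int:
    """Sink stage: drives the whole chain by pulling until it is exhausted."""
    return sum(f.end - f.start + 1 for f in upstream)


example_lines = [
    "##gff-version 3",
    "chr1\t.\tgene\t1\t100\t.\t+\t.\tID=g1",
    "chr1\t.\texon\t1\t50\t.\t+\t.\tParent=g1",
    "chr1\t.\texon\t61\t100\t.\t+\t.\tParent=g1",
]
exon_bases = total_length(type_filter_stream(parse_stream(example_lines), "exon"))
```

Because `parse_stream` accepts any iterable of lines, the same chain could be fed from an open file handle, in which case only one line is held in memory at a time.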

14.

LTRsift: a graphical user interface for semi-automatic classification and postprocessing of de novo detected LTR retrotransposons.

Steinbiss, Sascha; Kastens, Sascha; Kurtz, Stefan.

Mob DNA ; 3(1): 18, 2012 Nov 07.

Article in English | MEDLINE | ID: mdl-23131050

ABSTRACT

BACKGROUND: Long terminal repeat (LTR) retrotransposons are a class of eukaryotic mobile elements characterized by a distinctive sequence similarity-based structure. Hence they are well suited for computational identification. Current software allows for a comprehensive genome-wide de novo detection of such elements. The obvious next step is the classification of newly detected candidates resulting in (super-)families. Such a de novo classification approach based on sequence-based clustering of transposon features has been proposed before, resulting in a preliminary assignment of candidates to families as a basis for subsequent manual refinement. However, such a classification workflow is typically split across a heterogeneous set of glue scripts and generic software (for example, spreadsheets), making it tedious for a human expert to inspect, curate and export the putative families produced by the workflow. RESULTS: We have developed LTRsift, an interactive graphical software tool for semi-automatic postprocessing of de novo predicted LTR retrotransposon annotations. Its user-friendly interface offers customizable filtering and classification functionality, displaying the putative candidate groups, their members and their internal structure in a hierarchical fashion. To ease manual work, it also supports graphical user interface-driven reassignment, splitting and further annotation of candidates. Export of grouped candidate sets in standard formats is possible. In two case studies, we demonstrate how LTRsift can be employed in the context of a genome-wide LTR retrotransposon survey effort. CONCLUSIONS: LTRsift is a useful and convenient tool for semi-automated classification of newly detected LTR retrotransposons based on their internal features. Its efficient implementation allows for convenient and seamless filtering and classification in an integrated environment. Developed for life scientists, it is helpful in postprocessing and refining the output of software for predicting LTR retrotransposons up to the stage of preparing full-length reference sequence libraries. The LTRsift software is freely available at http://www.zbh.uni-hamburg.de/LTRsift under an open-source license.

15.

A new efficient data structure for storage and retrieval of multiple biosequences.

Steinbiss, Sascha; Kurtz, Stefan.

IEEE/ACM Trans Comput Biol Bioinform ; 9(2): 330-44, 2012.

Article in English | MEDLINE | ID: mdl-22084150

ABSTRACT

Today's genome analysis applications require sequence representations allowing for fast access to their contents while also being memory-efficient enough to facilitate analyses of large-scale data. While a wide variety of sequence representations exist, lack of a generic implementation of efficient sequence storage has led to a plethora of poorly reusable or programming language-specific implementations. We present a novel, space-efficient data structure (GtEncseq) for storing multiple biological sequences of variable alphabet size, with customizable character transformations, wildcard support and an assortment of internal representations optimized for different distributions of wildcards and sequence lengths. For the human genome (3.1 gigabases, including 237 million wildcard characters) our representation requires only 2 + 8 × 10^-6 bits per character. Implemented in C, our portable software implementation provides a variety of methods for random and sequential access to characters and substrings (including different reading directions) using an object-oriented interface. In addition, it includes access to metadata like sequence descriptions or character distributions. The library is extensible to be used from various scripting languages. GtEncseq is much more versatile than previous solutions, adding features that were previously unavailable. Benchmarks show that it is competitive with respect to space and time requirements.

Subjects

Computational Biology/methods; Databases, Genetic; Information Storage and Retrieval/methods; Sequence Analysis; Algorithms; Models, Genetic; Multigene Family

16.

FISH Oracle: a web server for flexible visualization of DNA copy number data in a genomic context.

Mader, Malte; Simon, Ronald; Steinbiss, Sascha; Kurtz, Stefan.

J Clin Bioinforma ; 1(1): 20, 2011 Jul 28.

Article in English | MEDLINE | ID: mdl-21884636

ABSTRACT

BACKGROUND: The rapidly growing amount of array CGH data requires improved visualization software supporting the process of identifying candidate cancer genes. Optimally, such software should work across multiple microarray platforms, should be able to cope with data from different sources and should be easy to operate. RESULTS: We have developed a web-based software, FISH Oracle, to visualize data from multiple array CGH experiments in a genomic context. Its fast visualization engine and advanced web and database technology support highly interactive use. FISH Oracle comes with a convenient data import mechanism, powerful search options for genomic elements (e.g. gene names or karyobands), quick navigation and zooming into interesting regions, and mechanisms to export the visualization into different high quality formats. These features make the software especially suitable for the needs of life scientists. CONCLUSIONS: FISH Oracle offers a fast and easy to use visualization tool for array CGH and SNP array data. It allows for the identification of genomic regions representing minimal common changes based on data from one or more experiments. FISH Oracle will be instrumental in identifying candidate oncogenes and tumor suppressor genes based on the frequency and genomic position of DNA copy number changes. The FISH Oracle application and an installed demo web server are available at http://www.zbh.uni-hamburg.de/fishoracle.
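The notion of a region of minimal common change across experiments can be sketched as a sweep over segment boundaries. This is only an illustration of the concept, not FISH Oracle's implementation: the function name `most_frequent_region`, the half-open coordinate convention, and the example coordinates are all assumptions.

```python
def most_frequent_region(segments):
    """Return (start, end, count) for the leftmost region covered by the
    largest number of altered segments on one chromosome.

    segments: list of (start, end) half-open intervals with start < end,
    e.g. one gained or lost segment per experiment (hypothetical input format).
    """
    events = []
    for start, end in segments:
        if start >= end:
            raise ValueError("each segment needs start < end")
        events.append((start, 1))   # segment opens
        events.append((end, -1))    # segment closes
    # At equal positions, closings (-1) sort before openings (+1),
    # which matches half-open intervals that merely touch.
    events.sort()
    best = (None, None, 0)
    depth = 0
    for idx, (pos, delta) in enumerate(events):
        depth += delta
        if delta == 1 and depth > best[2]:
            # The region with this coverage lasts until the next boundary.
            best = (pos, events[idx + 1][0], depth)
    return best


# Hypothetical copy-number segments from four experiments.
print(most_frequent_region([(100, 500), (300, 800), (400, 600), (900, 1000)]))
```

With the example input, three of the four segments overlap between positions 400 and 500, so that interval is reported as the most frequently altered region.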

17.

Fine-grained annotation and classification of de novo predicted LTR retrotransposons.

Steinbiss, Sascha; Willhoeft, Ute; Gremme, Gordon; Kurtz, Stefan.

Nucleic Acids Res ; 37(21): 7002-13, 2009 Nov.

Article in English | MEDLINE | ID: mdl-19786494

ABSTRACT

Long terminal repeat (LTR) retrotransposons and endogenous retroviruses (ERVs) are transposable elements in eukaryotic genomes well suited for computational identification. De novo identification tools determine the position of potential LTR retrotransposon or ERV insertions in genomic sequences. For further analysis, it is desirable to obtain an annotation of the internal structure of such candidates. This article presents LTRdigest, a novel software tool for automated annotation of internal features of putative LTR retrotransposons. It uses local alignment and hidden Markov model-based algorithms to detect retrotransposon-associated protein domains as well as primer binding sites and polypurine tracts. As an example, we used LTRdigest results to identify 88 (near) full-length ERVs in the chromosome 4 sequence of Mus musculus, separating them from truncated insertions and other repeats. Furthermore, we propose a work flow for the use of LTRdigest in de novo LTR retrotransposon classification and perform an exemplary de novo analysis on the Drosophila melanogaster genome as a proof of concept. Using a new method solely based on the annotations generated by LTRdigest, 518 potential LTR retrotransposons were automatically assigned to 62 candidate groups. Representative sequences from 41 of these 62 groups were matched to reference sequences with >80% global sequence similarity.
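The abstract mentions local alignment as one of the techniques used to detect features such as primer binding sites. A minimal sketch of local alignment scoring (Smith-Waterman with a linear gap penalty) is given below. It is not LTRdigest's code; the function name, the scoring parameters, and the toy sequences are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, end_in_a, end_in_b) of the best local alignment.

    End positions are 1-based indices of the last aligned character in each
    sequence; (0, 0, 0) means no positive-scoring alignment exists.
    Only the scoring pass is shown, without traceback.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = (0, 0, 0)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            if score > best[0]:
                best = (score, i, j)
        prev = cur
    return best


# Toy example: find a short motif inside a longer candidate region.
print(smith_waterman("AAACGTAAA", "CGT"))
```

In a primer binding site search, the second sequence would be derived from a tRNA 3' end and the first from the region downstream of the 5' LTR; how LTRdigest prepares and scores these sequences is not described in the abstract.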

Subjects

Retroelements; Software; Terminal Repeat Sequences; Animals; Chromosomes, Mammalian; Classification/methods; Drosophila melanogaster/genetics; Endogenous Retroviruses/genetics; Genome, Insect; Genomics; Mice

18.

AnnotationSketch: a genome annotation drawing library.

Steinbiss, Sascha; Gremme, Gordon; Schärfer, Christin; Mader, Malte; Kurtz, Stefan.

Bioinformatics ; 25(4): 533-4, 2009 Feb 15.

Article in English | MEDLINE | ID: mdl-19106120

ABSTRACT

SUMMARY: To analyse the vast amount of genome annotation data available today, a visual representation of genomic features in a given sequence range is required. We developed a C library which provides layout and drawing capabilities for annotation features. It supports several common input and output formats and can easily be integrated into custom C applications. To exemplify the use of AnnotationSketch in other languages, we provide bindings to the scripting languages Ruby, Python and Lua. AVAILABILITY: The software is available under an open-source license as part of GenomeTools (http://genometools.org/annotationsketch.html).

Subjects

Genome; Software; Computer Graphics; Databases, Factual; Gene Expression Profiling/methods; Programming Languages; User-Computer Interface
