Results 1 - 18 of 18

1.
Int J Cancer ; 2024 May 06.
Article in English | MEDLINE | ID: mdl-38709956

ABSTRACT

We analyzed variations in the epidermal growth factor receptor (EGFR) gene and its 5'-upstream region to identify potential molecular predictors of treatment response in primary epithelial ovarian cancer. Tumor tissues collected during debulking surgery in the prospective multicenter OVCAD study were investigated. Copy number variations in the human endogenous retrovirus K9 (HERVK9) sequence and in EGFR Exons 7 and 9, as well as the repeat length and loss of heterozygosity of the polymorphic CA-SSR I and the relative EGFR mRNA expression, were determined quantitatively. At least one EGFR variation was observed in 94% of the patients. Among the 30 combinations of variations discovered, enhanced platinum sensitivity (n = 151) was found predominantly with HERVK9 haploidy and Exon 7 tetraploidy, overrepresented among patients with survival ≥120 months (24/29, p = .0212). EGFR overexpression (≥80th percentile) was significantly less likely in the responders (17% vs. 32%, p = .044). Multivariate Cox regression analysis, including age, FIGO stage, and grade, indicated that the subgroup of patients with a CA-SSR I repeat length <18 CA on both alleles was prognostically significant (HR 0.276, 95% confidence interval 0.109-0.655, p = .001). Although EGFR variations occur in ovarian cancer, the mRNA levels remain low compared with other EGFR-mutated cancers. Notably, the inherited length of the CA-SSR I repeat, HERVK9 haploidy, and Exon 7 tetraploidy conferred threefold higher odds of surviving for more than 10 years under therapy. If determined during follow-up in circulating tumor cells or circulating tumor DNA, these markers may add value in guiding therapy, and HERVK9 emerges as a potential therapeutic target.

2.
Microbiol Resour Announc ; 10(35): e0054521, 2021 Sep 02.
Article in English | MEDLINE | ID: mdl-34472979

ABSTRACT

We report the high-quality draft assemblies and gene annotations for 13 species and/or strains of the protozoan parasite genera Leishmania, Endotrypanum, and Crithidia, which span the phylogenetic diversity of the subfamily Leishmaniinae within the kinetoplastid order of the phylum Euglenozoa. These resources will support studies on the origins of parasitism.

3.
Genome Biol Evol ; 11(8): 2391-2402, 2019 08 01.
Article in English | MEDLINE | ID: mdl-31364699

ABSTRACT

In Europe, Trichomonas gallinae recently emerged as a cause of epidemic disease in songbirds. A clonal strain of the parasite, first found in the United Kingdom, has become the predominant strain there and has spread to continental Europe. Discriminating this epidemic strain of T. gallinae from other strains required the development of multilocus sequence typing (MLST). Development of the MLST was facilitated by the assembly and annotation of a 54.7 Mb draft genome of a cloned stabilate of the A1 European finch epidemic strain (isolated from a greenfinch, Chloris chloris, XT-1081/07 in 2007) containing 21,924 protein-coding genes. This enabled construction of a robust 19-locus MLST based on existing typing loci for Trichomonas vaginalis and T. gallinae. Our MLST has the sensitivity to discriminate strains within existing genotypes confidently, and resolves the American finch A1 genotype from the European finch epidemic A1 genotype. Interestingly, one isolate, obtained from a captive black-naped fruit dove (Ptilinopus melanospilus), was not truly T. gallinae but a hybrid of T. gallinae with a distant trichomonad lineage. Phylogenetic analysis of the individual loci in this fruit dove isolate provides evidence of gene flow between distant trichomonad lineages at 2 of the 19 loci examined and may provide a precedent for the emergence of other hybrid trichomonad genomes, including T. vaginalis.
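The typing step described above can be illustrated with a minimal sketch of how an MLST scheme assigns a sequence type (ST): each locus sequence is looked up in an allele table, and the resulting tuple of allele numbers is looked up in a profile table. The locus names, allele tables and profile numbers below are hypothetical placeholders, not the 19-locus scheme from this study, and real schemes must also handle novel alleles and imperfect matches.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical allele tables: locus name -> {observed sequence: allele number}.
AlleleTable = Dict[str, Dict[str, int]]
# Hypothetical profile table: allele numbers in locus order -> sequence type (ST).
ProfileTable = Dict[Tuple[int, ...], int]


def allele_profile(isolate: Dict[str, str], loci: List[str],
                   alleles: AlleleTable) -> Optional[Tuple[int, ...]]:
    """Return the isolate's allele-number profile, or None if any locus carries an unknown allele."""
    profile = []
    for locus in loci:
        number = alleles[locus].get(isolate[locus])
        if number is None:
            return None  # novel allele: a real scheme would assign a new allele number here
        profile.append(number)
    return tuple(profile)


def sequence_type(isolate: Dict[str, str], loci: List[str],
                  alleles: AlleleTable, profiles: ProfileTable) -> Optional[int]:
    """Map an isolate to a known sequence type, or None if its profile is new or incomplete."""
    profile = allele_profile(isolate, loci, alleles)
    return profiles.get(profile) if profile is not None else None


if __name__ == "__main__":
    loci = ["locus_a", "locus_b"]  # hypothetical locus names
    alleles = {"locus_a": {"ACGT": 1, "ACGA": 2}, "locus_b": {"TTGA": 1, "TTGC": 2}}
    profiles = {(1, 1): 1, (2, 2): 5}
    print(sequence_type({"locus_a": "ACGA", "locus_b": "TTGC"}, loci, alleles, profiles))  # 5
```

Using exact-match lookups keeps the sketch short; it is the step where any single-nucleotide difference yields a new allele, which is what gives MLST its discriminatory power within genotypes.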


Subject(s)
Bird Diseases/parasitology , Evolution, Molecular , Finches/parasitology , Genome, Protozoan , Protozoan Proteins/genetics , Trichomonas Infections/veterinary , Trichomonas/genetics , Animals , Bird Diseases/epidemiology , DNA, Protozoan/genetics , Gene Expression Regulation , Multilocus Sequence Typing , Phylogeny , Transcriptome , Trichomonas/isolation & purification , Trichomonas Infections/epidemiology , Trichomonas Infections/parasitology
4.
Nature ; 563(7729): 121-125, 2018 11.
Article in English | MEDLINE | ID: mdl-30333624

ABSTRACT

Many evolutionarily distant pathogenic organisms have evolved similar survival strategies to evade the immune responses of their hosts. These include antigenic variation, through which an infecting organism prevents clearance by periodically altering the identity of proteins that are visible to the immune system of the host[1]. Antigenic variation requires large reservoirs of immunologically diverse antigen genes, which are often generated through homologous recombination, as well as mechanisms to ensure the expression of one or very few antigens at any given time. Both homologous recombination and gene expression are affected by three-dimensional genome architecture and local DNA accessibility[2,3]. Factors that link three-dimensional genome architecture, local chromatin conformation and antigenic variation have, to our knowledge, not yet been identified in any organism. One of the major obstacles to studying the role of genome architecture in antigenic variation has been the highly repetitive nature and heterozygosity of antigen-gene arrays, which has precluded complete genome assembly in many pathogens. Here we report the de novo haplotype-specific assembly and scaffolding of the long antigen-gene arrays of the model protozoan parasite Trypanosoma brucei, using long-read sequencing technology and conserved features of chromosome folding[4]. Genome-wide chromosome conformation capture (Hi-C) reveals a distinct partitioning of the genome, with antigen-encoding subtelomeric regions that are folded into distinct, highly compact compartments. In addition, we performed a range of analyses (Hi-C, fluorescence in situ hybridization, assays for transposase-accessible chromatin using sequencing, and single-cell RNA sequencing) that showed that deletion of the histone variants H3.V and H4.V increases antigen-gene clustering, DNA accessibility across sites of antigen expression and switching of the expressed antigen isoform via homologous recombination. Our analyses identify histone variants as a molecular link between global genome architecture, local chromatin conformation and antigenic variation.


Subject(s)
Antigenic Variation/genetics , Chromatin/genetics , Chromatin/metabolism , DNA, Protozoan/metabolism , Genome/genetics , Trypanosoma brucei brucei/genetics , Trypanosoma brucei brucei/immunology , DNA, Protozoan/genetics , Haplotypes/genetics , Histones/deficiency , Histones/genetics , Multigene Family/genetics , Protein Isoforms/biosynthesis , Protein Isoforms/genetics , Variant Surface Glycoproteins, Trypanosoma/biosynthesis , Variant Surface Glycoproteins, Trypanosoma/genetics
5.
Genome Res ; 28(4): 547-560, 2018 04.
Article in English | MEDLINE | ID: mdl-29500236

ABSTRACT

Avian malaria parasites are prevalent around the world and infect a wide diversity of bird species. Here, we report the sequencing and analysis of high-quality draft genome sequences for two avian malaria species, Plasmodium relictum and Plasmodium gallinaceum. We identify 50 genes that are specific to avian malaria, located in an otherwise conserved core of the genome that shares gene synteny with all other sequenced malaria genomes. Phylogenetic analysis suggests that the avian malaria species form an outgroup to the mammalian Plasmodium species, and using amino acid divergence between species, we estimate that the avian- and mammalian-infective lineages diverged on the order of 10 million years ago. Consistent with their phylogenetic position, we identify orthologs of genes that had previously appeared to be restricted to the clades of parasites containing Plasmodium falciparum and Plasmodium vivax, the species with the greatest impact on human health. From these orthologs, we explore differential diversifying selection across the genus and show that the avian lineage is remarkable in the extent to which invasion-related genes are evolving. The subtelomeres of the P. relictum and P. gallinaceum genomes contain several novel gene families, including an expanded surf multigene family. We also identify an expansion of reticulocyte binding protein homologs in P. relictum, and within these proteins, we detect distinct regions that are specific to nonhuman primate, human, rodent, and avian hosts. For the first time in the Plasmodium lineage, we find evidence of transposable elements, including several hundred fragments of LTR-retrotransposons in both species and an apparently complete LTR-retrotransposon in the genome of P. gallinaceum.


Subject(s)
Malaria, Avian/genetics , Plasmodium falciparum/genetics , Plasmodium vivax/genetics , Plasmodium/genetics , Animals , Birds/parasitology , Evolution, Molecular , Humans , Malaria, Avian/parasitology , Mammals/parasitology , Phylogeny , Plasmodium/pathogenicity , Plasmodium falciparum/pathogenicity , Plasmodium vivax/pathogenicity
6.
J Genomics ; 5: 1-3, 2017.
Article in English | MEDLINE | ID: mdl-28138343

ABSTRACT

Trypanosoma equiperdum is the causative agent of dourine, a sexually transmitted infection of horses. This parasite belongs to the subgenus Trypanozoon, which also includes the agents of sleeping sickness (Trypanosoma brucei) and surra (Trypanosoma evansi). We herein report the genome sequence of T. equiperdum strain OVI, isolated from a horse in South Africa in 1976. This is the first genome sequence of the T. equiperdum species, and its availability will provide important insights for future studies on the genetic classification of the subgenus Trypanozoon.

7.
Nucleic Acids Res ; 45(D1): D581-D591, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27903906

ABSTRACT

The Eukaryotic Pathogen Genomics Database Resource (EuPathDB, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB has been updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses, plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data, while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host-pathogen interactions.


Subject(s)
Databases, Genetic , Eukaryota , Genomics/methods , Host-Pathogen Interactions/genetics , Metagenome , Metagenomics/methods , Software , Computational Biology/methods , DNA Copy Number Variations , Gene Expression Profiling , Proteomics , Web Browser
8.
Sci Rep ; 6: 39749, 2016 12 22.
Article in English | MEDLINE | ID: mdl-28004836

ABSTRACT

Recently, nematode viruses infecting Caenorhabditis elegans have been reported from the family Nodaviridae, the first nematode viruses described. Here, we report the observation of a novel endogenous viral element (EVE) in the genome of Bursaphelenchus xylophilus, a plant parasitic nematode unrelated to other nematodes from which viruses have been characterised. This element derives from a different clade of nodaviruses to the previously reported nematode viruses. This represents the first endogenous nodavirus sequence, the first nematode endogenous viral element, and significantly extends our knowledge of the potential diversity of the Nodaviridae. A search for endogenous elements related to the Nodaviridae did not reveal any elements in other available nematode genomes. Further surveillance for endogenous viral elements is warranted as our knowledge of nematode genome diversity, and in particular of free-living nematodes, expands.


Subject(s)
Genome, Helminth , Nodaviridae , Retroelements , Tylenchida/genetics , Animals
9.
Wellcome Open Res ; 1: 4, 2016 Nov 15.
Article in English | MEDLINE | ID: mdl-28008421

ABSTRACT

Plasmodium vivax is now the predominant cause of malaria in the Asia-Pacific, South America and the Horn of Africa. Laboratory studies of this species are constrained by the inability to maintain the parasite in continuous ex vivo culture, but genomic approaches provide an alternative and complementary avenue to investigate the parasite's biology and epidemiology. To date, molecular studies of P. vivax have relied on the Salvador-I reference genome sequence, derived from a monkey-adapted strain from South America. However, the Salvador-I reference remains highly fragmented, with over 2500 unassembled scaffolds. Using high-depth Illumina sequence data, we assembled and annotated a new reference sequence, PvP01, sourced directly from a patient from Papua, Indonesia. Draft assemblies of isolates from China (PvC01) and Thailand (PvT01) were also prepared for comparative purposes. The quality of the PvP01 assembly is greatly improved over Salvador-I, with fragmentation reduced to 226 scaffolds. Detailed manual curation has ensured highly comprehensive annotation, with functions attributed to 58% of core genes in PvP01 versus 38% in Salvador-I. The assemblies of PvP01, PvC01 and PvT01 are larger than that of Salvador-I (28-30 versus 27 Mb), owing to improved assembly of the subtelomeres. An extensive repertoire of over 1200 Plasmodium interspersed repeat (pir) genes was identified in PvP01, compared with 346 in Salvador-I, suggesting a vital role in parasite survival or development. The manually curated PvP01 reference and the PvC01 and PvT01 draft assemblies are important new resources for studying vivax malaria. PvP01 is maintained at GeneDB, and ongoing curation will ensure continual improvements in assembly and annotation quality.

10.
Nucleic Acids Res ; 44(W1): W29-34, 2016 07 08.
Article in English | MEDLINE | ID: mdl-27105845

ABSTRACT

Currently available sequencing technologies enable quick and economical sequencing of many new eukaryotic parasite (apicomplexan or kinetoplastid) species or strains. Compared with SNP calling approaches, de novo assembly of these genomes enables researchers to additionally determine insertion, deletion and recombination events as well as to detect complex sequence diversity, such as that seen in variable multigene families. However, there are currently no automated eukaryotic annotation pipelines offering the range of results required to facilitate such analyses. A suitable pipeline needs to perform evidence-supported gene finding as well as functional annotation and pseudogene detection, up to the generation of output ready for submission to a public database. Moreover, no current tool includes quick yet informative comparative analyses and a first-pass visualization of both annotation and analysis results. To meet these needs, we have developed the Companion web server (http://companion.sanger.ac.uk), which provides parasite genome annotation as a service using a reference-based approach. We demonstrate the use and performance of Companion by annotating two Leishmania and Plasmodium genomes as typical parasite cases and evaluate the results against manually annotated references.


Subject(s)
Genome, Protozoan , Leishmania/genetics , Plasmodium falciparum/genetics , Protozoan Proteins/genetics , RNA, Protozoan/genetics , Software , Databases, Genetic , Gene Ontology , Internet , Leishmania/classification , Molecular Sequence Annotation , Phylogeny , Plasmodium falciparum/classification , Sensitivity and Specificity
11.
BMC Bioinformatics ; 15 Suppl 14: S7, 2014.
Article in English | MEDLINE | ID: mdl-25472764

ABSTRACT

BACKGROUND: Computational biology comprises a wide range of technologies and approaches. Multiple technologies can be combined to create more powerful workflows if the individuals contributing the data or providing tools for its interpretation can find mutual understanding and consensus. Much conversation and joint investigation are required to identify and implement the best approaches. Traditionally, scientific conferences feature talks presenting novel technologies or insights, followed up by informal discussions during coffee breaks. In multi-institution collaborations, to reach agreement on implementation details or to transfer deeper insights into a technology and practical skills, a representative of one group typically visits the other. However, this does not scale well when the number of technologies or research groups is large. Conferences have responded to this issue by introducing Birds-of-a-Feather (BoF) sessions, which offer an opportunity for individuals with common interests to intensify their interaction. However, parallel BoF sessions often make it hard for participants to join multiple BoFs and find common ground between the different technologies, and BoFs are generally too short to allow time for participants to program together. RESULTS: This report summarises our experience with computational biology Codefests, Hackathons and Sprints, which are interactive developer meetings. They are structured to reduce the limitations of traditional scientific meetings described above by strengthening the interaction among peers and letting the participants determine the schedule and topics. These meetings are commonly run as loosely scheduled "unconferences" (self-organized identification of participants and topics for meetings) over at least two days, with early introductory talks to welcome and organize contributors, followed by intensive collaborative coding sessions. We summarise some prominent achievements of those meetings and describe differences in how these are organised, how their audience is addressed, and their outreach to their respective communities. CONCLUSIONS: Hackathons, Codefests and Sprints share a stimulating atmosphere that encourages participants to jointly brainstorm and tackle problems of shared interest in a self-driven, proactive environment, and they provide an opportunity for new participants to get involved in collaborative projects.


Subject(s)
Computational Biology , Cooperative Behavior , Software , Communication , Internet
12.
Genome Biol ; 15(11): 510, 2014.
Article in English | MEDLINE | ID: mdl-25413302

ABSTRACT

BACKGROUND: Sparganosis is an infection with a larval Diphyllobothriidea tapeworm. From a rare cerebral case presented at a clinic in the UK, DNA was recovered from a biopsy sample and used to identify the causative species as Spirometra erinaceieuropaei through sequencing of the cox1 gene. From the same DNA, we have produced a draft genome, the first of its kind for this species, and used it to perform a comparative genomics analysis and to investigate known and potential drug targets in this tapeworm. RESULTS: The 1.26 Gb draft genome of S. erinaceieuropaei is currently the largest reported for any flatworm. Through investigation of β-tubulin genes, we predict that S. erinaceieuropaei larvae are insensitive to the tapeworm drug albendazole. We find that many putative tapeworm drug targets are also present in S. erinaceieuropaei, allowing possible cross-application of new drugs. In comparison with other sequenced tapeworm species, we observe expansion of protease classes and of Kunitz-type protease inhibitors. Expanded gene families in this tapeworm also include those involved in processes that add post-translational diversity to the protein landscape, intracellular transport, transcriptional regulation and detoxification. CONCLUSIONS: The S. erinaceieuropaei genome begins to give us insight into an order of tapeworms previously uncharacterized at the genome-wide level. From a single clinical case, we have begun to sketch a picture of the characteristics of these organisms. Finally, our work represents a significant technological achievement, as we present a draft genome sequence of a rare tapeworm obtained from a small amount of starting material.


Subject(s)
Diphyllobothrium/genetics , Genome , Sparganosis/genetics , Spirometra/genetics , Animals , Base Sequence , Biopsy , Brain/parasitology , Brain/pathology , High-Throughput Nucleotide Sequencing , Humans , Sparganosis/parasitology , Spirometra/parasitology , United Kingdom
13.
Article in English | MEDLINE | ID: mdl-24091398

ABSTRACT

Genome annotations are often published as plain text files describing genomic features and their subcomponents by an implicit annotation graph. In this paper, we present GenomeTools, a convenient and efficient software library and associated software tools for developing bioinformatics software intended to create, process or convert annotation graphs. GenomeTools strictly follows the annotation graph approach, offering a unified graph-based representation. This gives the developer intuitive and immediate access to genomic features and tools for their manipulation. To process large annotation sets with low memory overhead, we have designed and implemented an efficient pull-based approach for sequential processing of annotations. This makes it possible to handle even the largest annotation sets, such as a complete catalogue of human variations. Our object-oriented C-based software library enables developers to conveniently implement their own functionality on annotation graphs and to integrate it into larger workflows, simultaneously accessing compressed sequence data if required. The careful C implementation of GenomeTools not only ensures a lightweight memory footprint while allowing full sequential as well as random access to the annotation graph, but also facilitates the creation of bindings to a variety of scripting languages (such as Python and Ruby) sharing the same interface.
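The pull-based processing described above can be pictured with Python generators: each stage asks its upstream stage for the next annotation graph only when it needs one, so a pipeline can run over a large annotation set while holding roughly one graph in memory at a time (provided the source itself is lazy). This is a conceptual sketch only; `FeatureNode`, `source_stream`, `type_filter_stream` and `count_exons` are hypothetical names and do not reproduce the GenomeTools C API or its bindings.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class FeatureNode:
    """Hypothetical stand-in for one node of an annotation graph (not a GenomeTools type)."""
    seqid: str
    type: str
    start: int
    end: int
    children: List["FeatureNode"] = field(default_factory=list)

    def walk(self) -> Iterator["FeatureNode"]:
        """Depth-first traversal of this node and all of its subcomponents."""
        yield self
        for child in self.children:
            yield from child.walk()


def source_stream(graphs: Iterable[FeatureNode]) -> Iterator[FeatureNode]:
    """Source stage: hands out one top-level annotation graph each time it is pulled."""
    for graph in graphs:
        yield graph


def type_filter_stream(upstream: Iterator[FeatureNode], wanted: str) -> Iterator[FeatureNode]:
    """Filter stage: pulls from upstream on demand and passes on graphs whose root has the wanted type."""
    for graph in upstream:
        if graph.type == wanted:
            yield graph


def count_exons(upstream: Iterator[FeatureNode]) -> int:
    """Sink stage: drives the whole pipeline by pulling graphs until upstream is exhausted."""
    return sum(1 for graph in upstream for node in graph.walk() if node.type == "exon")


if __name__ == "__main__":
    def toy_graphs(n: int) -> Iterator[FeatureNode]:
        # Lazily generated toy input: each gene has one mRNA with two exons, followed by a repeat.
        for i in range(n):
            base = i * 100
            exons = [FeatureNode("chr1", "exon", base, base + 10),
                     FeatureNode("chr1", "exon", base + 20, base + 30)]
            mrna = FeatureNode("chr1", "mRNA", base, base + 30, exons)
            yield FeatureNode("chr1", "gene", base, base + 30, [mrna])
            yield FeatureNode("chr1", "repeat_region", base + 40, base + 60)

    print(count_exons(type_filter_stream(source_stream(toy_graphs(1000)), "gene")))  # 2000
```

Because nothing is computed until the sink pulls, stages can be chained freely without any of them materialising the full annotation set.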


Subject(s)
Genomics/methods , Molecular Sequence Annotation/methods , Software , Genome, Human , Humans
14.
Mob DNA ; 3(1): 18, 2012 Nov 07.
Article in English | MEDLINE | ID: mdl-23131050

ABSTRACT

BACKGROUND: Long terminal repeat (LTR) retrotransposons are a class of eukaryotic mobile elements characterized by a distinctive sequence similarity-based structure. Hence they are well suited for computational identification. Current software allows for comprehensive genome-wide de novo detection of such elements. The obvious next step is the classification of newly detected candidates into (super-)families. Such a de novo classification approach based on sequence-based clustering of transposon features has been proposed before, resulting in a preliminary assignment of candidates to families as a basis for subsequent manual refinement. However, such a classification workflow is typically split across a heterogeneous set of glue scripts and generic software (for example, spreadsheets), making it tedious for a human expert to inspect, curate and export the putative families produced by the workflow. RESULTS: We have developed LTRsift, an interactive graphical software tool for semi-automatic postprocessing of de novo predicted LTR retrotransposon annotations. Its user-friendly interface offers customizable filtering and classification functionality, displaying the putative candidate groups, their members and their internal structure in a hierarchical fashion. To ease manual work, it also supports graphical user interface-driven reassignment, splitting and further annotation of candidates. Grouped candidate sets can be exported in standard formats. In two case studies, we demonstrate how LTRsift can be employed in the context of a genome-wide LTR retrotransposon survey effort. CONCLUSIONS: LTRsift is a useful and convenient tool for semi-automated classification of newly detected LTR retrotransposons based on their internal features. Its efficient implementation allows for convenient and seamless filtering and classification in an integrated environment. Developed for life scientists, it is helpful in postprocessing and refining the output of software for predicting LTR retrotransposons up to the stage of preparing full-length reference sequence libraries. The LTRsift software is freely available at http://www.zbh.uni-hamburg.de/LTRsift under an open-source license.
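To make the idea of grouping candidates into putative families concrete, the sketch below uses single-linkage clustering with a union-find structure. As a simplifying assumption, it compares candidates by the Jaccard similarity of their sets of annotated internal features (for example, protein domain names) rather than by the sequence-based clustering of features described in the abstract; the candidate names, feature labels and threshold are hypothetical.

```python
from typing import Dict, FrozenSet, List


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Fraction of shared features; two empty feature sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_candidates(features: Dict[str, FrozenSet[str]], threshold: float) -> List[List[str]]:
    """Single-linkage grouping: link any two candidates whose similarity is >= threshold."""
    names = list(features)
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Follow parent pointers to the group representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(features[a], features[b]) >= threshold:
                parent[find(b)] = find(a)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    candidates = {  # hypothetical candidates and feature labels
        "cand1": frozenset({"RT", "RH", "INT"}),
        "cand2": frozenset({"RT", "RH", "INT", "PR"}),
        "cand3": frozenset({"GAG"}),
    }
    print(cluster_candidates(candidates, threshold=0.5))  # [['cand1', 'cand2'], ['cand3']]
```

Single linkage means one sufficiently similar pair is enough to merge two groups, so chains of moderately similar candidates can end up together; this is one reason such automatic groupings benefit from the manual inspection, reassignment and splitting that the abstract describes.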

15.
Article in English | MEDLINE | ID: mdl-22084150

ABSTRACT

Today's genome analysis applications require sequence representations allowing for fast access to their contents while also being memory-efficient enough to facilitate analyses of large-scale data. While a wide variety of sequence representations exist, the lack of a generic implementation of efficient sequence storage has led to a plethora of poorly reusable or programming language-specific implementations. We present a novel, space-efficient data structure (GtEncseq) for storing multiple biological sequences of variable alphabet size, with customizable character transformations, wildcard support and an assortment of internal representations optimized for different distributions of wildcards and sequence lengths. For the human genome (3.1 gigabases, including 237 million wildcard characters), our representation requires only 2 + 8 × 10⁻⁶ bits per character. Implemented in C, our portable software provides a variety of methods for random and sequential access to characters and substrings (including different reading directions) using an object-oriented interface. In addition, it provides access to metadata such as sequence descriptions or character distributions. The library can be extended for use from various scripting languages. GtEncseq is much more versatile than previous solutions, adding features that were previously unavailable. Benchmarks show that it is competitive with respect to space and time requirements.
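The basic idea behind a cost close to 2 bits per character can be sketched as follows: the four nucleotides are packed at two bits each, while wildcard positions are recorded separately as (start, length) runs, so the extra space grows with the number of wildcard runs rather than with the number of wildcard characters. This is a simplified illustration, not GtEncseq's implementation, which offers several internal representations; here every wildcard is assumed to be `N`, input is assumed to be uppercase, and the function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(seq: str) -> Tuple[bytearray, List[Tuple[int, int]]]:
    """Pack seq at 2 bits per character; return the packed bytes and the wildcard runs."""
    packed = bytearray((len(seq) + 3) // 4)  # four 2-bit codes per byte
    runs: List[Tuple[int, int]] = []  # (start, length) of each maximal wildcard run
    for i, ch in enumerate(seq):
        code = CODE.get(ch)
        if code is None:
            if runs and runs[-1][0] + runs[-1][1] == i:
                start, length = runs[-1]
                runs[-1] = (start, length + 1)  # extend the current run
            else:
                runs.append((i, 1))  # start a new run
            code = 0  # placeholder bits; the run table marks this position as a wildcard
        packed[i // 4] |= code << (2 * (i % 4))
    return packed, runs


def char_at(packed: bytearray, runs: List[Tuple[int, int]], i: int) -> str:
    """Random access to position i: binary-search the run table, otherwise unpack 2 bits."""
    k = bisect_left(runs, (i + 1, 0)) - 1  # index of the last run starting at or before i
    if k >= 0 and i < runs[k][0] + runs[k][1]:
        return "N"  # assumption: every wildcard is N
    return BASES[(packed[i // 4] >> (2 * (i % 4))) & 0b11]


if __name__ == "__main__":
    seq = "ACGTNNNGGA"
    packed, runs = encode(seq)
    print(len(packed), runs)  # 3 [(4, 3)]
    print("".join(char_at(packed, runs, i) for i in range(len(seq))))  # ACGTNNNGGA
```

Storing runs instead of per-position flags is what keeps the overhead tiny for genomes such as the human one, where wildcards occur mostly in a limited number of long stretches.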


Subject(s)
Computational Biology/methods , Databases, Genetic , Information Storage and Retrieval/methods , Sequence Analysis , Algorithms , Models, Genetic , Multigene Family
16.
J Clin Bioinforma ; 1(1): 20, 2011 Jul 28.
Article in English | MEDLINE | ID: mdl-21884636

ABSTRACT

BACKGROUND: The rapidly growing amount of array CGH data requires improved visualization software supporting the process of identifying candidate cancer genes. Optimally, such software should work across multiple microarray platforms, should be able to cope with data from different sources and should be easy to operate. RESULTS: We have developed FISH Oracle, a web-based software tool to visualize data from multiple array CGH experiments in a genomic context. Its fast visualization engine and advanced web and database technology support highly interactive use. FISH Oracle comes with a convenient data import mechanism, powerful search options for genomic elements (e.g. gene names or karyobands), quick navigation and zooming into interesting regions, and mechanisms to export the visualization into different high-quality formats. These features make the software especially suitable for the needs of life scientists. CONCLUSIONS: FISH Oracle offers a fast and easy-to-use visualization tool for array CGH and SNP array data. It allows for the identification of genomic regions representing minimal common changes based on data from one or more experiments. FISH Oracle will be instrumental in identifying candidate oncogenes and tumor suppressor genes based on the frequency and genomic position of DNA copy number changes. The FISH Oracle application and an installed demo web server are available at http://www.zbh.uni-hamburg.de/fishoracle.
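One way to locate the regions shared by the largest number of copy-number segments, in the spirit of the minimal common changes mentioned above, is a sweep over segment start and end points. The sketch assumes half-open `[start, end)` coordinates on a single chromosome and treats all segments as the same kind of change; `max_overlap_regions` is a hypothetical name, and this is an illustration rather than FISH Oracle's own algorithm.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int]  # half-open [start, end) on one chromosome


def max_overlap_regions(segments: List[Segment]) -> Tuple[int, List[Segment]]:
    """Return the maximum number of overlapping segments and the regions reaching it."""
    events = []
    for start, end in segments:
        events.append((start, 1))   # a segment opens
        events.append((end, -1))    # a segment closes
    # At equal positions, -1 sorts before +1, so segments that merely touch do not overlap.
    events.sort()
    depth, best = 0, 0
    regions: List[Segment] = []
    prev: Optional[int] = None
    for pos, delta in events:
        # The interval [prev, pos) is covered by exactly `depth` segments.
        if prev is not None and pos > prev and depth > 0:
            if depth > best:
                best, regions = depth, [(prev, pos)]
            elif depth == best:
                if regions[-1][1] == prev:
                    regions[-1] = (regions[-1][0], pos)  # extend an adjacent region
                else:
                    regions.append((prev, pos))
        depth += delta
        prev = pos
    return best, regions


if __name__ == "__main__":
    # Hypothetical segments with a gain, e.g. from three experiments.
    print(max_overlap_regions([(0, 10), (5, 15), (8, 20)]))  # (3, [(8, 10)])
```

The sweep runs in O(n log n) time for n segments, dominated by the sort, which suits interactive use over many experiments.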

17.
Nucleic Acids Res ; 37(21): 7002-13, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19786494

ABSTRACT

Long terminal repeat (LTR) retrotransposons and endogenous retroviruses (ERVs) are transposable elements in eukaryotic genomes well suited for computational identification. De novo identification tools determine the position of potential LTR retrotransposon or ERV insertions in genomic sequences. For further analysis, it is desirable to obtain an annotation of the internal structure of such candidates. This article presents LTRdigest, a novel software tool for automated annotation of internal features of putative LTR retrotransposons. It uses local alignment and hidden Markov model-based algorithms to detect retrotransposon-associated protein domains as well as primer binding sites and polypurine tracts. As an example, we used LTRdigest results to identify 88 (near) full-length ERVs in the chromosome 4 sequence of Mus musculus, separating them from truncated insertions and other repeats. Furthermore, we propose a workflow for the use of LTRdigest in de novo LTR retrotransposon classification and perform an exemplary de novo analysis on the Drosophila melanogaster genome as a proof of concept. Using a new method based solely on the annotations generated by LTRdigest, 518 potential LTR retrotransposons were automatically assigned to 62 candidate groups. Representative sequences from 41 of these 62 groups were matched to reference sequences with >80% global sequence similarity.
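The polypurine tract (PPT) search mentioned above can be illustrated by a deliberately simplified scan: look in a fixed window immediately upstream of the 3' LTR and report the longest uninterrupted run of purines (A or G). LTRdigest's actual detection logic and default parameters are not reproduced here; the window size, minimum length and the function name `find_ppt` are hypothetical.

```python
from typing import Optional, Tuple

PURINES = set("AG")


def find_ppt(seq: str, ltr3_start: int, window: int = 30,
             min_len: int = 10) -> Optional[Tuple[int, int]]:
    """Return the half-open (start, end) of the longest purine run in the window
    just upstream of the 3' LTR, or None if no run reaches min_len.
    window and min_len are hypothetical defaults, not LTRdigest's parameters."""
    lo = max(0, ltr3_start - window)
    best: Optional[Tuple[int, int]] = None
    run_start: Optional[int] = None
    # Iterate one step past the window so that a run ending at the LTR boundary is closed.
    for i in range(lo, ltr3_start + 1):
        is_purine = i < ltr3_start and seq[i].upper() in PURINES
        if is_purine:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if best is None or i - run_start > best[1] - best[0]:
                best = (run_start, i)  # keep the first longest run
            run_start = None
    if best is not None and best[1] - best[0] >= min_len:
        return best
    return None


if __name__ == "__main__":
    # Hypothetical element: internal region, a 12-nt purine run, two spacer bases, then the 3' LTR.
    seq = "C" * 20 + "AAGGAGAGGGAA" + "TC" + "TGTAAAAA"
    print(find_ppt(seq, ltr3_start=34))  # (20, 32)
```

A real tool would additionally tolerate occasional pyrimidines inside the tract and weigh its distance to the LTR; the exact-run rule here only shows where in the element the search takes place.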


Subject(s)
Retroelements , Software , Terminal Repeat Sequences , Animals , Chromosomes, Mammalian , Classification/methods , Drosophila melanogaster/genetics , Endogenous Retroviruses/genetics , Genome, Insect , Genomics , Mice
18.
Bioinformatics ; 25(4): 533-4, 2009 Feb 15.
Article in English | MEDLINE | ID: mdl-19106120

ABSTRACT

SUMMARY: To analyse the vast amount of genome annotation data available today, a visual representation of genomic features in a given sequence range is required. We developed AnnotationSketch, a C library that provides layout and drawing capabilities for annotation features. It supports several common input and output formats and can easily be integrated into custom C applications. To illustrate the use of AnnotationSketch in other languages, we provide bindings to the scripting languages Ruby, Python and Lua. AVAILABILITY: The software is available under an open-source license as part of GenomeTools (http://genometools.org/annotationsketch.html).
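Part of the layout task mentioned above is deciding which features can share a line in the drawing without overlapping. A common greedy approach, shown here as a hypothetical sketch rather than AnnotationSketch's actual layout code, sorts features by start coordinate and places each on the first line whose last feature ends before it begins (plus an optional margin). For intervals with `margin=0`, this first-fit-by-start strategy needs no more lines than the maximum number of features overlapping at any single position.

```python
from typing import List, Tuple

Feature = Tuple[str, int, int]  # hypothetical (name, start, end) with inclusive end


def assign_lines(features: List[Feature], margin: int = 0) -> List[List[str]]:
    """Greedy first-fit layout: features sharing a line never overlap (or come within margin)."""
    lines: List[List[str]] = []
    line_ends: List[int] = []  # end coordinate of the last feature placed on each line
    for name, start, end in sorted(features, key=lambda f: (f[1], f[2])):
        for idx, last_end in enumerate(line_ends):
            if start > last_end + margin:
                lines[idx].append(name)
                line_ends[idx] = end
                break
        else:
            # No existing line has room: open a new one below the others.
            lines.append([name])
            line_ends.append(end)
    return lines


if __name__ == "__main__":
    genes = [("gene1", 1, 100), ("gene2", 50, 150), ("gene3", 120, 200), ("gene4", 160, 180)]
    print(assign_lines(genes))  # [['gene1', 'gene3'], ['gene2', 'gene4']]
```

A nonzero margin leaves room for labels or visual spacing between neighbouring features at the cost of occasionally needing an extra line.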


Subject(s)
Genome , Software , Computer Graphics , Databases, Factual , Gene Expression Profiling/methods , Programming Languages , User-Computer Interface