Results 1 - 15 of 15
1.
Cell ; 187(18): 4926-4945.e22, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-38986619

ABSTRACT

Posterior fossa group A (PFA) ependymoma is a lethal brain cancer diagnosed in infants and young children. The lack of driver events in the PFA linear genome led us to search its 3D genome for characteristic features. Here, we reconstructed 3D genomes from diverse childhood tumor types and uncovered a global topology in PFA that is highly reminiscent of stem and progenitor cells in a variety of human tissues. A remarkable feature exclusively present in PFA are type B ultra long-range interactions in PFAs (TULIPs), regions separated by great distances along the linear genome that interact with each other in the 3D nuclear space with surprising strength. TULIPs occur in all PFA samples and recur at predictable genomic coordinates, and their formation is induced by expression of EZHIP. The universality of TULIPs across PFA samples suggests a conservation of molecular principles that could be exploited therapeutically.


Subject(s)
Ependymoma; Ependymoma/genetics; Humans; Infratentorial Neoplasms/genetics; Infratentorial Neoplasms/pathology; Genome, Human; Infant; Brain Neoplasms/genetics; Brain Neoplasms/pathology; Child; Male; Female
2.
Cell ; 183(6): 1617-1633.e22, 2020 Dec 10.
Article in English | MEDLINE | ID: mdl-33259802

ABSTRACT

Histone H3.3 glycine 34 to arginine/valine (G34R/V) mutations drive deadly gliomas and show exquisite regional and temporal specificity, suggesting a developmental context permissive to their effects. Here we show that 50% of G34R/V tumors (n = 95) bear activating PDGFRA mutations that display strong selection pressure at recurrence. Although considered gliomas, G34R/V tumors actually arise in GSX2/DLX-expressing interneuron progenitors, where G34R/V mutations impair neuronal differentiation. The lineage of origin may facilitate PDGFRA co-option through a chromatin loop connecting PDGFRA to GSX2 regulatory elements, promoting PDGFRA overexpression and mutation. At the single-cell level, G34R/V tumors harbor dual neuronal/astroglial identity and lack oligodendroglial programs, actively repressed by GSX2/DLX-mediated cell fate specification. G34R/V may become dispensable for tumor maintenance, whereas mutant-PDGFRA is potently oncogenic. Collectively, our results open novel research avenues in deadly tumors. G34R/V gliomas are neuronal malignancies where interneuron progenitors are stalled in differentiation by G34R/V mutations and malignant gliogenesis is promoted by co-option of a potentially targetable pathway, PDGFRA signaling.


Subject(s)
Brain Neoplasms/genetics; Carcinogenesis/genetics; Glioma/genetics; Histones/genetics; Interneurons/metabolism; Mutation/genetics; Neural Stem Cells/metabolism; Receptor, Platelet-Derived Growth Factor alpha/genetics; Animals; Astrocytes/metabolism; Astrocytes/pathology; Brain Neoplasms/pathology; Carcinogenesis/pathology; Cell Lineage; Cellular Reprogramming/genetics; Chromatin/metabolism; Embryo, Mammalian/metabolism; Epigenesis, Genetic; Gene Expression Regulation, Neoplastic; Gene Silencing; Glioma/pathology; Histones/metabolism; Lysine/metabolism; Mice, Inbred C57BL; Models, Biological; Neoplasm Grading; Oligodendroglia/metabolism; Promoter Regions, Genetic/genetics; Prosencephalon/embryology; Receptor, Platelet-Derived Growth Factor alpha/metabolism; Transcription, Genetic; Transcriptome/genetics
3.
Mol Cell ; 80(4): 726-735.e7, 2020 Nov 19.
Article in English | MEDLINE | ID: mdl-33049227

ABSTRACT

Diffuse midline gliomas and posterior fossa type A ependymomas contain the recurrent histone H3 lysine 27 (H3 K27M) mutation and express the H3 K27M-mimic EZHIP (CXorf67), respectively. H3 K27M and EZHIP are competitive inhibitors of Polycomb Repressive Complex 2 (PRC2) lysine methyltransferase activity. In vivo, these proteins reduce overall H3 lysine 27 trimethylation (H3K27me3) levels; however, residual peaks of H3K27me3 remain at CpG islands (CGIs) through an unknown mechanism. Here, we report that EZHIP and H3 K27M preferentially interact with PRC2 that is allosterically activated by H3K27me3 at CGIs and impede its spreading. Moreover, H3 K27M oncohistones reduce H3K27me3 in trans, independent of their incorporation into the chromatin. Although EZHIP is not found outside placental mammals, expression of human EZHIP reduces H3K27me3 in Drosophila melanogaster through a conserved mechanism. Our results provide mechanistic insights for the retention of residual H3K27me3 in tumors driven by H3 K27M and EZHIP.


Subject(s)
Chromatin/genetics; DNA Methylation; Gene Expression Regulation, Neoplastic; Histones/genetics; Mutation; Oncogene Proteins/metabolism; Polycomb Repressive Complex 2/metabolism; Allosteric Regulation; Animals; CpG Islands; Drosophila melanogaster; Humans; Mice; Oncogene Proteins/genetics; Polycomb Repressive Complex 2/genetics
4.
Am J Hum Genet ; 2024 Oct 22.
Article in English | MEDLINE | ID: mdl-39481374

ABSTRACT

Four main medulloblastoma (MB) molecular subtypes have been identified based on transcriptional, DNA methylation, and genetic profiles. However, it is currently not known whether 3D genome architecture differs between MB subtypes. To address this question, we performed in situ Hi-C to reconstruct the 3D genome architecture of MB subtypes. In total, we generated Hi-C and matching transcriptome data for 28 surgical specimens and Hi-C data for one patient-derived xenograft. The average resolution of the Hi-C maps was 6,833 bp. Using these data, we found that insulation scores of topologically associating domains (TADs) were effective at distinguishing MB molecular subgroups. TAD insulation score differences between subtypes were globally not associated with differential gene expression, although we identified few exceptions near genes expressed in the lineages of origin of specific MB subtypes. Our study therefore supports the notion that TAD insulation scores can distinguish MB subtypes independently of their transcriptional differences.
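
The insulation score mentioned in this abstract can be illustrated with a minimal sketch. It assumes a simplified definition (the mean contact frequency in a square window spanning each bin), a hypothetical function name, and a toy contact matrix; it is not the pipeline used in the study, which the abstract does not describe.

```python
def insulation_scores(contacts, window):
    """Simplified insulation score for each bin of a symmetric contact matrix.

    For bin i, average the contacts between the `window` bins upstream
    (i-window .. i-1) and the `window` bins downstream (i+1 .. i+window).
    Low values suggest a boundary between domains. Bins too close to the
    matrix edge get None. This is an illustrative definition only.
    """
    n = len(contacts)
    scores = []
    for i in range(n):
        if i - window < 0 or i + window >= n:
            scores.append(None)  # window would run off the matrix
            continue
        total = 0.0
        for a in range(i - window, i):
            for b in range(i + 1, i + window + 1):
                total += contacts[a][b]
        scores.append(total / (window * window))
    return scores


# Toy matrix (assumed data): two domains, bins 0-2 and bins 3-5,
# with strong contacts inside a domain and weak contacts across domains.
toy = [[10 if (r < 3) == (c < 3) else 1 for c in range(6)] for r in range(6)]
scores = insulation_scores(toy, window=1)
```

In the toy matrix the score drops at the bins flanking the domain boundary, which is the kind of signal a per-boundary comparison between subgroups would rely on.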

5.
Proc Natl Acad Sci U S A ; 117(44): 27354-27364, 2020 Nov 03.
Article in English | MEDLINE | ID: mdl-33067396

ABSTRACT

A high percentage of pediatric gliomas and bone tumors reportedly harbor missense mutations at glycine 34 in genes encoding histone variant H3.3. We find that these H3.3 G34 mutations directly alter the enhancer chromatin landscape of mesenchymal stem cells by impeding methylation at lysine 36 on histone H3 (H3K36) by SETD2, but not by the NSD1/2 enzymes. The reduction of H3K36 methylation by G34 mutations promotes an aberrant gain of PRC2-mediated H3K27me2/3 and loss of H3K27ac at active enhancers containing SETD2 activity. This altered histone modification profile promotes a unique gene expression profile that supports enhanced tumor development in vivo. Our findings are mirrored in G34W-containing giant cell tumors of bone where patient-derived stromal cells exhibit gene expression profiles associated with early osteoblastic differentiation. Overall, we demonstrate that H3.3 G34 oncohistones selectively promote PRC2 activity by interfering with SETD2-mediated H3K36 methylation. We propose that PRC2-mediated silencing of enhancers involved in cell differentiation represents a potential mechanism by which H3.3 G34 mutations drive these tumors.


Subject(s)
Histones/genetics; Polycomb Repressive Complex 2/metabolism; Chromatin/genetics; Chromatin/metabolism; Gene Expression/genetics; Gene Expression Regulation/genetics; Glioma/pathology; HEK293 Cells; Histone-Lysine N-Methyltransferase/metabolism; Histone-Lysine N-Methyltransferase/physiology; Histones/metabolism; Humans; Lysine/metabolism; Mesenchymal Stem Cells/metabolism; Methylation; Mutation/genetics; Neoplastic Processes; Polycomb Repressive Complex 1/genetics; Polycomb Repressive Complex 1/metabolism; Polycomb Repressive Complex 2/genetics; Protein Processing, Post-Translational
6.
BMC Cancer ; 22(1): 1297, 2022 Dec 12.
Article in English | MEDLINE | ID: mdl-36503484

ABSTRACT

BACKGROUND: Juvenile Pilocytic Astrocytomas (JPAs) are one of the most common pediatric brain tumors, and they are driven by aberrant activation of the mitogen-activated protein kinase (MAPK) signaling pathway. RAF-fusions are the most common genetic alterations identified in JPAs, with the prototypical KIAA1549-BRAF fusion leading to loss of BRAF's auto-inhibitory domain and subsequent constitutive kinase activation. JPAs are highly vascular and show pervasive immune infiltration, which can lead to low tumor cell purity in clinical samples. This can result in gene fusions that are difficult to detect with conventional omics approaches including RNA-Seq. METHODS: To this effect, we applied RNA-Seq as well as linked-read whole-genome sequencing and in situ Hi-C as new approaches to detect and characterize low-frequency gene fusions at the genomic, transcriptomic and spatial level. RESULTS: Integration of these datasets allowed the identification and detailed characterization of two novel BRAF fusion partners, PTPRZ1 and TOP2B, in addition to the canonical fusion with partner KIAA1549. Additionally, our Hi-C datasets enabled investigations of 3D genome architecture in JPAs which showed a high level of correlation in 3D compartment annotations between JPAs compared to other pediatric tumors, and high similarity to normal adult astrocytes. We detected interactions between BRAF and its fusion partners exclusively in tumor samples containing BRAF fusions. CONCLUSIONS: We demonstrate the power of integrating multi-omic datasets to identify low frequency fusions and characterize the JPA genome at high resolution. We suggest that linked-reads and Hi-C could be used in clinic for the detection and characterization of JPAs.


Subject(s)
Astrocytoma; Brain Neoplasms; Child; Adult; Humans; Multiomics; Proto-Oncogene Proteins B-raf/genetics; Oncogene Proteins, Fusion/genetics; Astrocytoma/pathology; Brain Neoplasms/pathology; Receptor-Like Protein Tyrosine Phosphatases, Class 5
7.
Genome Res ; 25(12): 1921-33, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26377836

ABSTRACT

We describe a genome reference of the African green monkey or vervet (Chlorocebus aethiops). This member of the Old World monkey (OWM) superfamily is uniquely valuable for genetic investigations of simian immunodeficiency virus (SIV), for which it is the most abundant natural host species, and of a wide range of health-related phenotypes assessed in Caribbean vervets (C. a. sabaeus), whose numbers have expanded dramatically since Europeans introduced small numbers of their ancestors from West Africa during the colonial era. We use the reference to characterize the genomic relationship between vervets and other primates, the intra-generic phylogeny of vervet subspecies, and genome-wide structural variations of a pedigreed C. a. sabaeus population. Through comparative analyses with human and rhesus macaque, we characterize at high resolution the unique chromosomal fission events that differentiate the vervets and their close relatives from most other catarrhine primates, in whom karyotype is highly conserved. We also provide a summary of transposable elements and contrast these with the rhesus macaque and human. Analysis of sequenced genomes representing each of the main vervet subspecies supports previously hypothesized relationships between these populations, which range across most of sub-Saharan Africa, while uncovering high levels of genetic diversity within each. Sequence-based analyses of major histocompatibility complex (MHC) polymorphisms reveal extremely low diversity in Caribbean C. a. sabaeus vervets, compared to vervets from putatively ancestral West African regions. In the C. a. sabaeus research population, we discover the first structural variations that are, in some cases, predicted to have a deleterious effect; future studies will determine the phenotypic impact of these variations.


Subject(s)
Chlorocebus aethiops/genetics; Genome; Genomics; Animals; Chlorocebus aethiops/classification; Chromosome Painting; Computational Biology/methods; Evolution, Molecular; Gene Rearrangement; Genetic Variation; Genomics/methods; Karyotype; Major Histocompatibility Complex/genetics; Molecular Sequence Annotation; Phylogeny; Phylogeography
8.
BMC Biol ; 13: 41, 2015 Jun 20.
Article in English | MEDLINE | ID: mdl-26092298

ABSTRACT

BACKGROUND: We report here the first genome-wide high-resolution polymorphism resource for non-human primate (NHP) association and linkage studies, constructed for the Caribbean-origin vervet monkey, or African green monkey (Chlorocebus aethiops sabaeus), one of the most widely used NHPs in biomedical research. We generated this resource by whole genome sequencing (WGS) of monkeys from the Vervet Research Colony (VRC), an NIH-supported research resource for which extensive phenotypic data are available. RESULTS: We identified genome-wide single nucleotide polymorphisms (SNPs) by WGS of 721 members of an extended pedigree from the VRC. From high-depth WGS data we identified more than 4 million polymorphic unequivocal segregating sites; by pruning these SNPs based on heterozygosity, quality control filters, and the degree of linkage disequilibrium (LD) between SNPs, we constructed genome-wide panels suitable for genetic association (about 500,000 SNPs) and linkage analysis (about 150,000 SNPs). To further enhance the utility of these resources for linkage analysis, we used a further pruned subset of the linkage panel to generate multipoint identity by descent matrices. CONCLUSIONS: The genetic and phenotypic resources now available for the VRC and other Caribbean-origin vervets enable their use for genetic investigation of traits relevant to human diseases.
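
The LD-based pruning step mentioned in this abstract can be sketched as follows. This is a minimal illustration only: it assumes genotypes coded as 0/1/2 allele counts, uses the squared Pearson correlation of those counts as the LD measure, and applies a greedy pass with a hypothetical threshold. The function names are invented here, and the abstract does not state the authors' actual procedure or cutoffs.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 counts)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic site: correlation undefined, treated as 0 here
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def ld_prune(genotypes, threshold):
    """Greedy pruning: keep a SNP only if its r^2 with every kept SNP is below threshold.

    `genotypes` is a list of per-SNP vectors in genomic order; the return
    value is the list of kept SNP indices.
    """
    kept = []
    for idx, snp in enumerate(genotypes):
        if all(r_squared(snp, genotypes[k]) < threshold for k in kept):
            kept.append(idx)
    return kept


# Assumed toy data: SNP 1 duplicates SNP 0, SNP 2 is only weakly correlated.
example = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 0, 1, 1]]
kept = ld_prune(example, threshold=0.5)
```

A lower threshold gives a sparser panel, which matches the abstract's distinction between a denser association panel and a sparser linkage panel, although the thresholds used there are not given.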


Subject(s)
Chlorocebus aethiops/genetics; Linkage Disequilibrium; Polymorphism, Single Nucleotide; Animals; Chromosome Mapping; Female; Genome-Wide Association Study; Genotype; Humans; Male; Microsatellite Repeats; Phenotype; Quantitative Trait Loci; Sequence Analysis
9.
PLoS Genet ; 8(9): e1002931, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22969437

ABSTRACT

The benefits of ever-growing numbers of sequenced eukaryotic genomes will not be fully realized until we learn to decipher vast stretches of noncoding DNA, largely composed of transposable elements. Transposable elements persist through self-replication, but some genes once encoded by transposable elements have, through a process called molecular domestication, evolved new functions that increase fitness. Although they have conferred numerous adaptations, the number of such domesticated transposable element genes remains unknown, so their evolutionary and functional impact cannot be fully assessed. Systematic searches that exploit genomic signatures of natural selection have been employed to identify potential domesticated genes, but their predictions have yet to be experimentally verified. To this end, we investigated a family of domesticated genes called MUSTANG (MUG), identified in a previous bioinformatic search of plant genomes. We show that MUG genes are functional. Mutants of Arabidopsis thaliana MUG genes yield phenotypes with severely reduced plant fitness through decreased plant size, delayed flowering, abnormal development of floral organs, and markedly reduced fertility. MUG genes are present in all flowering plants, but not in any non-flowering plant lineages, such as gymnosperms, suggesting that the molecular domestication of MUG may have been an integral part of early angiosperm evolution. This study shows that systematic searches can be successful at identifying functional genetic elements in noncoding regions and demonstrates how to combine systematic searches with reverse genetics in a fruitful way to decipher eukaryotic genomes.


Subject(s)
Arabidopsis Proteins/genetics; Arabidopsis/genetics; DNA Transposable Elements; Arabidopsis/physiology; Biological Evolution; Magnoliopsida/genetics; Mutation; Phylogeny; Reproduction
10.
Nat Commun ; 15(1): 7769, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39237515

ABSTRACT

Histone H3-mutant gliomas are deadly brain tumors characterized by a dysregulated epigenome and stalled differentiation. In contrast to the extensive datasets available on tumor cells, limited information exists on their tumor microenvironment (TME), particularly the immune infiltrate. Here, we characterize the immune TME of H3.3K27M and G34R/V-mutant gliomas, and multiple H3.3K27M mouse models, using transcriptomic, proteomic and spatial single-cell approaches. Resolution of immune lineages indicates high infiltration of H3-mutant gliomas with diverse myeloid populations, high-level expression of immune checkpoint markers, and scarce lymphoid cells, findings uniformly reproduced in all H3.3K27M mouse models tested. We show these myeloid populations communicate with H3-mutant cells, mediating immunosuppression and sustaining tumor formation and maintenance. Dual inhibition of myeloid cells and immune checkpoint pathways show significant therapeutic benefits in pre-clinical syngeneic mouse models. Our findings provide a valuable characterization of the TME of oncohistone-mutant gliomas, and insight into the means for modulating the myeloid infiltrate for the benefit of patients.


Subject(s)
Brain Neoplasms; Glioma; Histones; Mutation; Myeloid Cells; Tumor Microenvironment; Animals; Glioma/genetics; Glioma/immunology; Glioma/pathology; Tumor Microenvironment/immunology; Tumor Microenvironment/genetics; Myeloid Cells/metabolism; Myeloid Cells/immunology; Histones/metabolism; Histones/genetics; Mice; Brain Neoplasms/genetics; Brain Neoplasms/immunology; Brain Neoplasms/pathology; Humans; Cell Line, Tumor; Disease Models, Animal; Mice, Inbred C57BL; Gene Expression Regulation, Neoplastic; Single-Cell Analysis
11.
Genome Biol Evol ; 2022 Jun 07.
Article in English | MEDLINE | ID: mdl-35668612

ABSTRACT

Insects have developed various adaptations to survive harsh winter conditions. Among freeze-intolerant species, some produce "antifreeze proteins" (AFPs) that bind to nascent ice crystals and inhibit further ice growth. Such is the case of the spruce budworm, Choristoneura fumiferana (Lepidoptera: Tortricidae), a destructive North American conifer pest that can withstand temperatures below -30°C. Despite the potential importance of AFPs in the adaptive diversification of Choristoneura, genomic tools to explore their origins have until now been limited. Here we present a chromosome-scale genome assembly for C. fumiferana, which we used to conduct comparative genomic analyses aimed at reconstructing the evolutionary history of tortricid AFPs. The budworm genome features 16 genes homologous to previously reported C. fumiferana AFPs (CfAFPs), 15 of which map to a single region on chromosome 18. Fourteen of these were also detected in five congeneric species, indicating Choristoneura AFP diversification occurred before the speciation event that led to C. fumiferana. Although budworm AFPs were previously considered unique to the genus Choristoneura, a search for homologs targeting recently sequenced tortricid genomes identified seven CfAFP-like genes in the distantly related Notocelia uddmanniana. High structural similarity between Notocelia and Choristoneura AFPs suggests a common origin, despite the absence of homologs in three related tortricids. Interestingly, one Notocelia AFP formed the C-terminus of a "zonadhesin-like" protein, possibly representing the ancestral condition from which tortricid AFPs evolved. Future work should clarify the evolutionary path of AFPs between Notocelia and Choristoneura and assess the role of the "zonadhesin-like" protein as precursor of tortricid AFPs.

12.
Nat Commun ; 10(1): 2146, 2019 May 13.
Article in English | MEDLINE | ID: mdl-31086175

ABSTRACT

Posterior fossa type A (PFA) ependymomas exhibit very low H3K27 methylation and express high levels of EZHIP (Enhancer of Zeste Homologs Inhibitory Protein, also termed CXORF67). Here we find that a conserved sequence in EZHIP is necessary and sufficient to inhibit PRC2 catalytic activity in vitro and in vivo. EZHIP directly contacts the active site of the EZH2 subunit in a mechanism similar to the H3 K27M oncohistone. Furthermore, expression of H3 K27M or EZHIP in cells promotes similar chromatin profiles: loss of broad H3K27me3 domains, but retention of H3K27me3 at CpG islands. We find that H3K27me3-mediated allosteric activation of PRC2 substantially increases the inhibition potential of EZHIP and H3 K27M, providing a mechanism to explain the observed loss of H3K27me3 spreading in tumors. Our data indicate that PFA ependymoma and DIPG are driven in part by the action of peptidyl PRC2 inhibitors, the K27M oncohistone and the EZHIP 'oncohistone-mimic', that dysregulate gene silencing to promote tumorigenesis.


Subject(s)
Brain Neoplasms/genetics; Ependymoma/genetics; Glioma/genetics; Oncogene Proteins/metabolism; Polycomb Repressive Complex 2/metabolism; Animals; Brain Neoplasms/pathology; Carcinogenesis/genetics; Cell Line, Tumor; Chromatin/metabolism; CpG Islands; Cranial Fossa, Posterior; Datasets as Topic; Embryo, Mammalian; Ependymoma/pathology; Fibroblasts; Gene Expression Regulation, Neoplastic; Gene Silencing; Glioma/pathology; HEK293 Cells; Histones; Humans; Mice; Oncogene Proteins/genetics; Primary Cell Culture; Recombinant Proteins/genetics; Recombinant Proteins/isolation & purification; Recombinant Proteins/metabolism
13.
Nat Commun ; 7: 12131, 2016 Jul 06.
Article in English | MEDLINE | ID: mdl-27381634

ABSTRACT

African green monkeys (AGMs) are natural primate hosts of simian immunodeficiency virus (SIV). Interestingly, features of the envelope-specific antibody responses in SIV-infected AGMs are distinct from that of HIV-infected humans and SIV-infected rhesus monkeys, including gp120-focused responses and rapid development of autologous neutralization. Yet, the lack of genetic tools to evaluate B-cell lineages hinders potential use of this unique non-human primate model for HIV vaccine development. Here we define features of the AGM Ig loci and compare the proportion of Env-specific memory B-cell populations to that of HIV-infected humans and SIV-infected rhesus monkeys. AGMs appear to have a higher proportion of Env-specific memory B cells that are mainly gp120 directed. Furthermore, AGM gp120-specific monoclonal antibodies display robust antibody-dependent cellular cytotoxicity and CD4-dependent virion capture activity. Our results support the use of AGMs to model induction of functional gp120-specific antibodies by HIV vaccine strategies.


Subject(s)
Antibodies, Neutralizing/biosynthesis; Antibodies, Viral/biosynthesis; B-Lymphocytes/immunology; Immunoglobulins/biosynthesis; Simian Acquired Immunodeficiency Syndrome/immunology; Simian Immunodeficiency Virus/immunology; Animals; Antibodies, Neutralizing/chemistry; Antibodies, Viral/chemistry; B-Lymphocytes/virology; CD4-Positive T-Lymphocytes/immunology; CD4-Positive T-Lymphocytes/virology; Chlorocebus aethiops; Chronic Disease; Cytotoxicity, Immunologic; Genetic Variation; HIV Envelope Protein gp120/immunology; Humans; Immunity, Cellular; Immunoglobulins/classification; Immunologic Memory; Macaca mulatta; Simian Acquired Immunodeficiency Syndrome/virology; Simian Immunodeficiency Virus/pathogenicity; Virion/immunology; Virion/pathogenicity
14.
Genome Res ; 15(9): 1292-7, 2005 Sep.
Article in English | MEDLINE | ID: mdl-16140995

ABSTRACT

DNA transposons are known to frequently capture duplicated fragments of host genes. The evolutionary impact of this phenomenon depends on how frequently the fragments retain protein-coding function as opposed to becoming pseudogenes. Gene fragment duplication by Mutator-like elements (MULEs) has previously been documented in maize, Arabidopsis, and rice. Here we present a rigorous genome-wide analysis of MULEs in the model plant Oryza sativa (domesticated rice). We identify 8274 MULEs with intact termini and target-site duplications (TSDs) and show that 1337 of them contain duplicated host gene fragments. Through a detailed examination of the 5% of duplicated gene fragments that are transcribed, we demonstrate that virtually all cases contain pseudogenic features such as fragmented conserved protein domains, frameshifts, and premature stop codons. In addition, we show that the distribution of the ratio of nonsynonymous to synonymous amino acid substitution rates for the duplications agrees with the expected distribution for pseudogenes. We conclude that MULE-mediated host gene duplication results in the formation of pseudogenes, not novel functional protein-coding genes; however, the transcribed duplications possess characteristics consistent with a potential role in the regulation of host gene expression.


Subject(s)
Gene Duplication; Genes, Plant; Oryza/genetics; DNA Transposable Elements/genetics; DNA, Complementary/genetics; DNA, Plant/genetics; Evolution, Molecular; Models, Genetic
15.
Bioinformatics ; 20(2): 155-60, 2004 Jan 22.
Article in English | MEDLINE | ID: mdl-14734305

ABSTRACT

MOTIVATION: The high content of repetitive sequences in the genomes of many higher eukaryotes renders the task of annotating them computationally intensive. Presently, the only widely accepted method of searching and annotating transposable elements (TEs) in large genomic sequences is the use of the RepeatMasker program, which identifies new copies of TEs by pairwise sequence comparisons with a library of known TEs. Profile hidden Markov models (HMMs) have been used successfully in discovering distant homologs of known proteins in large protein databases, but this approach has only rarely been applied to known model TE families in genomic DNA. RESULTS: We used a combination of computational approaches to annotate the TEs in the finished genome of Oryza sativa ssp. japonica. In this paper, we discuss the strengths and the weaknesses of the annotation methods used. These approaches included: the default configuration of RepeatMasker using cross_match, an implementation of the Smith-Waterman-Gotoh algorithm; RepeatMasker using WU-BLAST for similarity searching; and the HMMER package, used to search for TEs with profile HMMs. All the results were converted into GFF format and post-processed using a set of Perl scripts. RepeatMasker was used in the case of most TE families. The WU-BLAST implementation of RepeatMasker was found to be manifold faster than cross_match with only a slight loss in sensitivity and was thus used to obtain the final set of data. HMMER was used in the annotation of the Mutator-like element (MULE) superfamily and the miniature inverted-repeat transposable element (MITE) polyphyletic group of families, for which large libraries of elements were available and which could be divided into well-defined families. The HMMER search algorithm was extremely slow for models over 1000 bp in length, so MULE families with members over 1000 bp long were processed with RepeatMasker instead. 
The main disadvantage of HMMER in this application is that, since it was developed with protein sequences in mind, it does not search the negative DNA strand. With the exception of TE families with essentially palindromic sequences, reverse complement models had to be created and run to compensate for this shortcoming. We conclude that a modification of RepeatMasker to incorporate libraries of profile HMMs in searches could improve the ability to detect degenerated copies of TEs. AVAILABILITY: The Perl scripts and TE sequences used in construction of the RepeatMasker library and the profile HMMs are available upon request.
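
The strand workaround described in this abstract relies on reverse-complementing sequences so that a second model can cover the negative strand. A minimal sketch of that operation follows, together with a check for the "essentially palindromic" case in which a second model would add nothing. The helper names are hypothetical; the authors' Perl scripts are not reproduced here, and how they built the reverse complement models is not stated in the abstract.

```python
# Translation table for DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, preserving letter case."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the sequence equals its own reverse complement (case-insensitive)."""
    return seq.upper() == reverse_complement(seq).upper()


def both_strand_copies(sequences):
    """Return the input sequences plus reverse complements of the non-palindromic ones."""
    extra = [reverse_complement(s) for s in sequences if not is_palindromic(s)]
    return list(sequences) + extra
```

Under this sketch, each family's member sequences would be reverse-complemented before a second profile is built from them, and palindromic sequences are skipped because their reverse complement is identical to the original.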


Subject(s)
Algorithms; DNA Transposable Elements/genetics; Documentation; Gene Expression Profiling/methods; Genome, Plant; Oryza/genetics; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Databases, Nucleic Acid; Models, Genetic; Models, Statistical; Software