1.
Mol Biol Evol ; 40(5), 2023 05 02.
Article in English | MEDLINE | ID: mdl-37071810

ABSTRACT

Horizontal transfer of transposable elements (TEs) is an important mechanism contributing to genetic diversity and innovation. Bats (order Chiroptera) have repeatedly been shown to experience horizontal transfer of TEs at what appears to be a high rate compared with other mammals. We investigated the occurrence of horizontally transferred (HT) DNA transposons involving bats. We found over 200 putative HT elements within bats; 16 transposons were shared across distantly related mammalian clades, and 2 other elements were shared with a fish and two lizard species. Our results indicate that bats are a hotspot for horizontal transfer of DNA transposons. These events broadly coincide with the diversification of several bat clades, supporting the hypothesis that DNA transposon invasions have contributed to genetic diversification of bats.


Subject(s)
Chiroptera, DNA Transposable Elements, Animals, DNA Transposable Elements/genetics, Chiroptera/genetics, Horizontal Gene Transfer, Molecular Evolution, Mammals/genetics, Phylogeny
2.
Science ; 380(6643): eabn1430, 2023 04 28.
Article in English | MEDLINE | ID: mdl-37104570

ABSTRACT

We examined transposable element (TE) content of 248 placental mammal genome assemblies, the largest de novo TE curation effort in eukaryotes to date. We found that although mammals resemble one another in total TE content and diversity, they show substantial differences with regard to recent TE accumulation. This includes multiple recent expansion and quiescence events across the mammalian tree. Young TEs, particularly long interspersed elements, drive increases in genome size, whereas DNA transposons are associated with smaller genomes. Mammals tend to accumulate only a few types of TEs at any given time, with one TE type dominating. We also found association between dietary habit and the presence of DNA transposon invasions. These detailed annotations will serve as a benchmark for future comparative TE analyses among placental mammals.


Subject(s)
DNA Transposable Elements, Eutheria, Molecular Evolution, Genetic Variation, Animals, Female, Pregnancy, Long Interspersed Nucleotide Elements, Eutheria/genetics, Datasets as Topic, Feeding Behavior
3.
NAR Genom Bioinform ; 4(2): lqac040, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35591887

ABSTRACT

The construction of a high-quality multiple sequence alignment (MSA) from copies of a transposable element (TE) is a critical step in the characterization of a new TE family. Most studies of MSA accuracy have been conducted on protein or RNA sequence families, where structural features and strong signals of selection may assist with alignment. Less attention has been given to the quality of sequence alignments involving neutrally evolving DNA sequences such as those resulting from TE replication. Transposable element sequences are challenging to align due to their wide divergence ranges, fragmentation, and predominantly-neutral mutation patterns. To gain insight into the effects of these properties on MSA accuracy, we developed a simulator of TE sequence evolution, and used it to generate a benchmark with which we evaluated the MSA predictions produced by several popular aligners, along with Refiner, a method we developed in the context of our RepeatModeler software. We find that MAFFT and Refiner generally outperform other aligners for low to medium divergence simulated sequences, while Refiner is uniquely effective when tasked with aligning high-divergent and fragmented instances of a family.

4.
Genes (Basel) ; 13(4), 2022 04 17.
Article in English | MEDLINE | ID: mdl-35456515

ABSTRACT

The discovery and characterization of transposable element (TE) families are crucial tasks in the process of genome annotation. Careful curation of TE libraries for each organism is necessary as each has been exposed to a unique and often complex set of TE families. De novo methods have been developed; however, a fully automated and accurate approach to the development of complete libraries remains elusive. In this review, we cover established methods and recent developments in de novo TE analysis. We also present various methodologies used to assess these tools and discuss opportunities for further advancement of the field.


Subject(s)
DNA Transposable Elements, DNA Transposable Elements/genetics
5.
Science ; 376(6588): eabk3112, 2022 04.
Article in English | MEDLINE | ID: mdl-35357925

ABSTRACT

Mobile elements and repetitive genomic regions are sources of lineage-specific genomic innovation and uniquely fingerprint individual genomes. Comprehensive analyses of such repeat elements, including those found in more complex regions of the genome, require a complete, linear genome assembly. We present a de novo repeat discovery and annotation of the T2T-CHM13 human reference genome. We identified previously unknown satellite arrays, expanded the catalog of variants and families for repeats and mobile elements, characterized classes of complex composite repeats, and located retroelement transduction events. We detected nascent transcription and delineated CpG methylation profiles to define the structure of transcriptionally active retroelements in humans, including those in centromeres. These data expand our insight into the diversity, distribution, and evolution of repetitive regions that have shaped the human genome.


Subject(s)
Genetic Epigenesis, Human Genome, Nucleic Acid Repetitive Sequences, Telomere/genetics, Genetic Transcription, Humans
6.
BMC Biol ; 19(1): 241, 2021 11 09.
Article in English | MEDLINE | ID: mdl-34749730

ABSTRACT

BACKGROUND: The rice weevil Sitophilus oryzae is one of the most important agricultural pests, causing extensive damage to cereal in fields and to stored grains. S. oryzae has an intracellular symbiotic relationship (endosymbiosis) with the Gram-negative bacterium Sodalis pierantonius and is a valuable model to decipher host-symbiont molecular interactions. RESULTS: We sequenced the Sitophilus oryzae genome using a combination of short and long reads to produce the best assembly for a Curculionidae species to date. We show that S. oryzae has undergone successive bursts of transposable element (TE) amplification, representing 72% of the genome. In addition, we show that many TE families are transcriptionally active, and changes in their expression are associated with insect endosymbiotic state. S. oryzae has undergone a high gene expansion rate when compared to other beetles. Reconstruction of host-symbiont metabolic networks revealed that, despite its recent association with cereal weevils (30,000 years), S. pierantonius relies on the host for several amino acids and nucleotides to survive and to produce vitamins and essential amino acids required for insect development and cuticle biosynthesis. CONCLUSIONS: Here we present the genome of an agricultural pest beetle, which may act as a foundation for pest control. In addition, S. oryzae may be a useful model for endosymbiosis, and studying TE evolution and regulation, along with the impact of TEs on eukaryotic genomes.


Subject(s)
Coleoptera, Weevils, Animals, Cell Communication, DNA Transposable Elements/genetics, Edible Grain, Humans, Weevils/genetics
7.
Curr Protoc ; 1(6): e154, 2021 Jun.
Article in English | MEDLINE | ID: mdl-34138525

ABSTRACT

Transposable elements (TEs) have the ability to alter individual genomic landscapes and shape the course of evolution for species in which they reside. Such profound changes can be understood by studying the biology of the organism and the interplay of the TEs it hosts. Characterizing and curating TEs across a wide range of species is a fundamental first step in this endeavor. This protocol employs techniques honed while developing TE libraries for a wide range of organisms and specifically addresses: (1) the extension of truncated de novo results into full-length TE families; (2) the iterative refinement of TE multiple sequence alignments; and (3) the use of alignment visualization to assess model completeness and subfamily structure. © 2021 Wiley Periodicals LLC. Basic Protocol: Extension and edge polishing of consensi and seed alignments derived from de novo repeat finders. Support Protocol: Generating seed alignments using a library of consensi and a genome assembly.


Subject(s)
DNA Transposable Elements, Genomics, DNA Transposable Elements/genetics, Humans, Sequence Alignment
8.
Genome Res ; 26(5): 649-59, 2016 05.
Article in English | MEDLINE | ID: mdl-26916108

ABSTRACT

We identified a novel repeat family, termed Platy-1, in the Callithrix jacchus (common marmoset) genome that arose around the time of the divergence of platyrrhines and catarrhines and established itself as a repeat family in New World monkeys (NWMs). A full-length Platy-1 element is ∼100 bp in length, making it the shortest known short interspersed element (SINE) in primates, and harbors features characteristic of non-LTR retrotransposons. We identified 2268 full-length Platy-1 elements across 62 subfamilies in the common marmoset genome. Our subfamily reconstruction and phylogenetic analyses support Platy-1 propagation throughout the evolution of NWMs in the lineage leading to C. jacchus. Platy-1 appears to have reached its amplification peak in the common ancestor of current day marmosets and has since moderately declined. However, identification of more than 200 Platy-1 elements identical to their respective consensus sequence, and the presence of polymorphic elements within common marmoset populations, suggests ongoing retrotransposition activity. Platy-1, a SINE, appears to have originated from an Alu element, and hence is likely derived from 7SL RNA. Our analyses illustrate the birth of a new repeat family and its propagation dynamics in the lineage leading to the common marmoset over the last 40 million years.


Subject(s)
Alu Elements, Callithrix/genetics, Molecular Evolution, Phylogeny, Retroelements, Animals
9.
Nucleic Acids Res ; 44(D1): D81-9, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26612867

ABSTRACT

Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of many genomes. Dfam is an open access database of families of repetitive DNA elements, in which each family is represented by a multiple sequence alignment and a profile hidden Markov model (HMM). The initial release of Dfam, featured in the 2013 NAR Database Issue, contained 1143 families of repetitive elements found in humans, and was used to produce more than 100 Mb of additional annotation of TE-derived regions in the human genome, with improved speed. Here, we describe recent advances, most notably expansion to 4150 total families including a comprehensive set of known repeat families from four new organisms (mouse, zebrafish, fly and nematode). We describe improvements to coverage, and to our methods for identifying and reducing false annotation. We also describe updates to the website interface. The Dfam website has moved to http://dfam.org. Seed alignments, profile HMMs, hit lists and other underlying data are available for download.


Subject(s)
DNA Transposable Elements, DNA/chemistry, Nucleic Acid Databases, Nucleic Acid Repetitive Sequences, Animals, DNA/classification, Genome, Humans, Internet, Markov Chains, Mice, Molecular Sequence Annotation, Sequence Alignment
10.
Nucleic Acids Res ; 43(Database issue): D670-81, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25428374

ABSTRACT

Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), 'mined the web' for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled.


Subject(s)
Nucleic Acid Databases, Genomics, Animals, Cricetinae, Dogs, Ebolavirus/genetics, Gene Expression, Genome, Internet, Mice, Molecular Sequence Annotation, Phenotype, Rats, Software
11.
Retrovirology ; 11: 71, 2014 Dec 12.
Article in English | MEDLINE | ID: mdl-25499090

ABSTRACT

BACKGROUND: Crocodilians are thought to be hosts to a diverse and divergent complement of endogenous retroviruses (ERVs) but a comprehensive investigation is yet to be performed. The recent sequencing of three crocodilian genomes provides an opportunity for a more detailed and accurate representation of the ERV diversity that is present in these species. Here we investigate the diversity, distribution and evolution of ERVs from the genomes of three key crocodilian species, and outline the key processes driving crocodilian ERV proliferation and evolution. RESULTS: ERVs and ERV related sequences make up less than 2% of crocodilian genomes. We recovered and described 45 ERV groups within the three crocodilian genomes, many of which are species specific. We have also revealed a new class of ERV, ERV4, which appears to be common to crocodilians and turtles, and currently has no characterised exogenous counterpart. For the first time, we formally describe the characteristics of this ERV class and its classification relative to other recognised ERV and retroviral classes. This class shares some sequence similarity and sequence characteristics with ERV3, although it is phylogenetically distinct from the other ERV classes. We have also identified two instances of gene capture by crocodilian ERVs, one of which, the capture of a host KIT-ligand mRNA has occurred without the loss of an ERV domain. CONCLUSIONS: This study indicates that crocodilian ERVs comprise a wide variety of lineages, many of which appear to reflect ancient infections. In particular, ERV4 appears to have a limited host range, with current data suggesting that it is confined to crocodilians and some lineages of turtles. Also of interest are two ERV groups that demonstrate evidence of host gene capture. This study provides a framework to facilitate further studies into non-mammalian vertebrates and highlights the need for further studies into such species.


Subject(s)
Alligators and Crocodiles/genetics, Alligators and Crocodiles/virology, Endogenous Retroviruses/classification, Endogenous Retroviruses/genetics, Molecular Evolution, Genetic Variation, Genome, Animals, Cluster Analysis, Computational Biology, Phylogeny, Genetic Recombination, Sequence Homology, Turtles/virology
12.
Nucleic Acids Res ; 42(12): e99, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24803667

ABSTRACT

A common practice in computational genomic analysis is to use a set of 'background' sequences as negative controls for evaluating the false-positive rates of prediction tools, such as gene identification programs and algorithms for detection of cis-regulatory elements. Such 'background' sequences are generally taken from regions of the genome presumed to be intergenic, or generated synthetically by 'shuffling' real sequences. This last method can lead to underestimation of false-positive rates. We developed a new method for generating artificial sequences that are modeled after real intergenic sequences in terms of composition, complexity and interspersed repeat content. These artificial sequences can serve as an inexhaustible source of high-quality negative controls. We used artificial sequences to evaluate the false-positive rates of a set of programs for detecting interspersed repeats, ab initio prediction of coding genes, transcribed regions and non-coding genes. We found that RepeatMasker is more accurate than PClouds, Augustus has the lowest false-positive rate of the coding gene prediction programs tested, and Infernal has a low false-positive rate for non-coding gene detection. A web service, source code and the models for human and many other species are freely available at http://repeatmasker.org/garlic/.


Subject(s)
Intergenic DNA/chemistry, Genomics/methods, DNA Sequence Analysis/methods, Animals, Cats, Cattle, Dogs, Genes, Guinea Pigs, Humans, Introns, Mice, Statistical Models, Rabbits, Rats, Nucleic Acid Repetitive Sequences
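
The 'shuffling' approach that the abstract of entry 12 contrasts with its own method can be illustrated with a minimal sketch. This is plain single-base shuffling, not the artificial-sequence method the authors developed; the function name `shuffle_sequence`, the `seed` parameter, and the example sequence are hypothetical.

```python
import random
from collections import Counter


def shuffle_sequence(seq: str, seed: int = 0) -> str:
    """Return a random permutation of the bases in seq (hypothetical helper).

    Single-base composition is preserved exactly; the order of the bases,
    and therefore any repeat or low-complexity structure, is not.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


# Made-up example sequence, used only for illustration.
real = "ACGTTTTTTACGCGCGAAAATTTT"
background = shuffle_sequence(real)

# Same length and same base counts as the input.
assert len(background) == len(real)
assert Counter(background) == Counter(real)
```

Because only base counts survive the permutation, a shuffled background of this kind carries none of the complexity or interspersed repeat content that the abstract lists as properties of real intergenic sequence.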
13.
Nat Methods ; 11(6): 689-94, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24727652

ABSTRACT

Genomic information is encoded on a wide range of distance scales, ranging from tens of bases to megabases. We developed a multiscale framework to analyze and visualize the information content of genomic signals. Different types of signals, such as G+C content or DNA methylation, are characterized by distinct patterns of signal enrichment or depletion across scales spanning several orders of magnitude. These patterns are associated with a variety of genomic annotations. By integrating the information across all scales, we demonstrated improved prediction of gene expression from polymerase II chromatin immunoprecipitation sequencing (ChIP-seq) measurements, and we observed that gene expression differences in colorectal cancer are related to methylation patterns that extend beyond the single-gene scale. Our software is available at https://github.com/tknijnen/msr/.


Subject(s)
Genomics/methods, Software, Transcriptome, Animals, DNA/chemistry, DNA Methylation, Humans, DNA Sequence Analysis
14.
Nucleic Acids Res ; 41(Database issue): D70-82, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203985

ABSTRACT

We present a database of repetitive DNA elements, called Dfam (http://dfam.janelia.org). Many genomes contain a large fraction of repetitive DNA, much of which is made up of remnants of transposable elements (TEs). Accurate annotation of TEs enables research into their biology and can shed light on the evolutionary processes that shape genomes. Identification and masking of TEs can also greatly simplify many downstream genome annotation and sequence analysis tasks. The commonly used TE annotation tools RepeatMasker and Censor depend on sequence homology search tools such as cross_match and BLAST variants, as well as Repbase, a collection of known TE families each represented by a single consensus sequence. Dfam contains entries corresponding to all Repbase TE entries for which instances have been found in the human genome. Each Dfam entry is represented by a profile hidden Markov model, built from alignments generated using RepeatMasker and Repbase. When used in conjunction with the hidden Markov model search tool nhmmer, Dfam produces a 2.9% increase in coverage over consensus sequence search methods on a large human benchmark, while maintaining low false discovery rates, and coverage of the full human genome is 54.5%. The website provides a collection of tools and data views to support improved TE curation and annotation efforts. Dfam is also available for download in flat file format or in the form of MySQL table dumps.


Subject(s)
DNA Transposable Elements, Nucleic Acid Databases, Human Genome, Humans, Internet, Markov Chains, Statistical Models, Molecular Sequence Annotation
15.
Am J Hum Genet ; 89(3): 382-97, 2011 Sep 09.
Article in English | MEDLINE | ID: mdl-21855840

ABSTRACT

Assignment of alleles to haplotypes for nearly all the variants on all chromosomes can be performed by genetic analysis of a nuclear family with three or more children. Whole-genome sequence data enable deterministic phasing of nearly all sequenced alleles by permitting assignment of recombinations to precise chromosomal positions and specific meioses. We demonstrate this process of genetic phasing on two families each with four children. We generate haplotypes for all of the children and their parents; these haplotypes span all genotyped positions, including rare variants. Misassignments of phase between variants (switch errors) are nearly absent. Our algorithm can also produce multimegabase haplotypes for nuclear families with just two children and can handle families with missing individuals. We implement our algorithm in a suite of software scripts (Haploscribe). Haplotypes and family genome sequences will become increasingly important for personalized medicine and for fundamental biology.


Subject(s)
Algorithms, Human Chromosomes/genetics, Genetic Variation, Haplotypes/genetics, Inheritance Patterns/genetics, Genetic Models, Software, Humans, Mutation/genetics, Pedigree, DNA Sequence Analysis/methods
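
As a rough illustration of the idea behind family-based phasing in entry 15, the sketch below assigns a child's two alleles at a single site to the paternal and maternal haplotypes using only Mendelian consistency. It is not the Haploscribe algorithm, which the abstract describes as working from recombination positions across whole genomes; the function name `phase_child` and the tuple-based genotype encoding are assumptions made for this example.

```python
from typing import Optional, Tuple

Genotype = Tuple[str, str]  # unordered pair of alleles at one site


def phase_child(
    father: Genotype, mother: Genotype, child: Genotype
) -> Optional[Tuple[str, str]]:
    """Return (paternal_allele, maternal_allele) for the child at one site.

    Returns None when the site is ambiguous (both orderings are consistent
    with the parents) or when no ordering is Mendelian-consistent.
    """
    a, b = child
    # Try both orderings; the set collapses duplicates for homozygous children.
    candidates = {
        (pat, mat)
        for pat, mat in ((a, b), (b, a))
        if pat in father and mat in mother
    }
    return next(iter(candidates)) if len(candidates) == 1 else None


# Father is homozygous A/A, so the child's G must be maternal.
assert phase_child(("A", "A"), ("A", "G"), ("A", "G")) == ("A", "G")
# All three heterozygous: this single-site rule cannot decide.
assert phase_child(("A", "G"), ("A", "G"), ("A", "G")) is None
```

Sites where father, mother, and child are all heterozygous stay unresolved in this sketch, since a single site in a single trio does not carry enough information to order the alleles.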
16.
Science ; 328(5978): 636-9, 2010 Apr 30.
Article in English | MEDLINE | ID: mdl-20220176

ABSTRACT

We analyzed the whole-genome sequences of a family of four, consisting of two siblings and their parents. Family-based sequencing allowed us to delineate recombination sites precisely, identify 70% of the sequencing errors (resulting in >99.999% accuracy), and identify very rare single-nucleotide polymorphisms. We also directly estimated a human intergeneration mutation rate of approximately 1.1 × 10⁻⁸ per position per haploid genome. Both offspring in this family have two recessive disorders: Miller syndrome, for which the gene was concurrently identified, and primary ciliary dyskinesia, for which causative genes have been previously identified. Family-based genome analysis enabled us to narrow the candidate genes for both of these Mendelian disorders to only four. Our results demonstrate the value of complete genome sequencing in families.


Subject(s)
Multiple Abnormalities/genetics, Ciliary Motility Disorders/genetics, Human Genome, Inheritance Patterns, Nuclear Family, DNA Sequence Analysis, Algorithms, Alleles, Axonemal Dyneins/genetics, Genetic Crossing Over, Dihydroorotate Dehydrogenase, Female, Dominant Genes, Recessive Genes, Genetic Association Studies, Humans, Congenital Limb Deformities/genetics, Male, Mandibulofacial Dysostosis/genetics, Mutation, Oxidoreductases Acting on CH-CH Group Donors/genetics, Pedigree, Single Nucleotide Polymorphism, Syndrome
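
To give a sense of scale for the mutation rate quoted in entry 16, the sketch below converts a per-position, per-haploid-genome rate into an expected number of new mutations in one child. The haploid genome size of 3.0 × 10⁹ positions is a round-number assumption and does not come from the abstract; the function name `expected_new_mutations` is hypothetical.

```python
def expected_new_mutations(
    rate_per_position: float, haploid_positions: float, ploidy: int = 2
) -> float:
    """Expected count of new mutations per generation in one individual.

    rate_per_position: mutations per position per haploid genome per generation.
    haploid_positions: number of positions in one haploid genome (assumed).
    ploidy: number of haploid genomes inherited (2 for a diploid child).
    """
    return rate_per_position * haploid_positions * ploidy


# Rate from the abstract; genome size is an assumed round figure.
estimate = expected_new_mutations(1.1e-8, 3.0e9)
print(f"about {estimate:.0f} new mutations per child")
```

The result scales linearly with the assumed genome size, so a different estimate of the number of callable positions changes the count proportionally.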
17.
Nature ; 453(7192): 175-83, 2008 May 08.
Article in English | MEDLINE | ID: mdl-18464734

ABSTRACT

We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.


Subject(s)
Molecular Evolution, Genome/genetics, Platypus/genetics, Animals, Base Composition, Dentition, Female, Genomic Imprinting/genetics, Humans, Immunity/genetics, Male, Mammals/genetics, MicroRNAs/genetics, Milk Proteins/genetics, Phylogeny, Platypus/immunology, Platypus/physiology, Odorant Receptors/genetics, Nucleic Acid Repetitive Sequences/genetics, Reptiles/genetics, DNA Sequence Analysis, Spermatozoa/metabolism, Venoms/genetics, Zona Pellucida/metabolism
18.
Science ; 316(5822): 238-40, 2007 Apr 13.
Article in English | MEDLINE | ID: mdl-17431169

ABSTRACT

The completion of the draft sequence of the rhesus macaque genome allowed us to study the genomic composition and evolution of transposable elements in this representative of the Old World monkey lineage, a group of diverse primates closely related to humans. The L1 family of long interspersed elements appears to have evolved as a single lineage, and Alu elements have evolved into four currently active lineages. We also found evidence of elevated horizontal transmissions of retroviruses and the absence of DNA transposon activity in the Old World monkey lineage. In addition, approximately 100 precursors of composite SVA (short interspersed element, variable number of tandem repeat, and Alu) elements were identified, with the majority being shared by the common ancestor of humans and rhesus macaques. Mobile elements compose roughly 50% of primate genomes, and our findings illustrate their diversity and strong influence on genome evolution between closely related species.


Subject(s)
Cercopithecidae/genetics, DNA Transposable Elements, Macaca mulatta/genetics, Animals, Endogenous Retroviruses/genetics, Molecular Evolution, Horizontal Gene Transfer, Genome, Human Genome, Humans, Nucleic Acid Repetitive Sequences, Retroelements
19.
Genome Res ; 16(7): 864-74, 2006 Jul.
Article in English | MEDLINE | ID: mdl-16717141

ABSTRACT

Recent comparative analyses of mammalian sequences have revealed that a large number of nonprotein-coding genomic regions are under strong selective constraint. Here, we report that some of these loci have been derived from a newly defined family of ancient SINEs (short interspersed repetitive elements). This is a surprising result, as SINEs and other transposable elements are commonly thought to be genomic parasites. We named the ancient SINE family AmnSINE1, for Amniota SINE1, because we found it to be present in mammals as well as in birds, and some copies predate the mammalian-bird split 310 million years ago (Mya). AmnSINE1 has a chimeric structure of a 5S rRNA and a tRNA-derived SINE, and is related to five tRNA-derived SINE families that we characterized here in the coelacanth, dogfish shark, hagfish, and amphioxus genomes. All of the newly described SINE families have a common central domain that is also shared by zebrafish SINE3, and we collectively name them the DeuSINE (Deuterostomia SINE) superfamily. Notably, of the approximately 1000 still identifiable copies of AmnSINE1 in the human genome, 105 correspond to loci phylogenetically highly conserved among mammalian orthologs. The conservation is strongest over the central domain. Thus, AmnSINE1 appears to be the best example of a transposable element of which a significant fraction of the copies have acquired genomic functionality.


Subject(s)
Genome, Mammals/genetics, Short Interspersed Nucleotide Elements/genetics, Algorithms, Animals, Base Pairing, Base Sequence, Consensus Sequence, DNA Transposable Elements/genetics, Human Genome, Humans, Molecular Sequence Data, Nucleic Acid Conformation, Phylogeny, Genetic Promoter Regions, Tertiary Protein Structure, RNA/chemistry, 5S Ribosomal RNA/genetics, Transfer RNA/genetics, Genetic Selection, Nucleic Acid Sequence Homology, Time Factors
20.
PLoS Comput Biol ; 2(3): e18, 2006 Mar.
Article in English | MEDLINE | ID: mdl-16543943

ABSTRACT

The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent "genomic deserts."


Subject(s)
Computational Biology/methods, Gene Expression Profiling/methods, DNA Sequence Analysis/methods, Algorithms, DNA Repair, Molecular Evolution, Humans, Statistical Models, Mutation, Software, Time Factors