1.
Trends Genet ; 38(1): 12-21, 2022 01.
Article in English | MEDLINE | ID: mdl-34340871

ABSTRACT

Human-specific endogenous retrovirus H (HERVH) is highly expressed in both naive and primed stem cells and is essential for pluripotency. Despite the proven relationship between HERVH expression and pluripotency, there is no single definitive model for the function of HERVH. Instead, several hypotheses of a regulatory function have been put forward, including HERVH acting as enhancers, as long noncoding RNAs (lncRNAs), and, most recently, as markers of topologically associating domain (TAD) boundaries. Several enhancer-associated lncRNAs have recently been characterized that bind to Mediator and are necessary for promoter-enhancer folding interactions. We propose a synergistic model of HERVH function that combines these relevant findings, and we discuss the current limitations of this regulatory role, including the lack of evidence for a pluripotency-associated target gene.


Subject(s)
Endogenous Retroviruses , RNA, Long Noncoding , Endogenous Retroviruses/metabolism , Enhancer Elements, Genetic , Humans , RNA, Long Noncoding/metabolism , Stem Cells/metabolism
2.
Bioinformatics ; 40(6)2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38870532

ABSTRACT

MOTIVATION: Understanding the rules that govern enhancer-driven transcription remains a central unsolved problem in genomics. With multiple massively parallel enhancer perturbation assays now published, there are enough data to learn to predict enhancer-promoter (EP) relationships in a data-driven manner. RESULTS: We applied machine learning to one of the largest enhancer perturbation studies, integrated with transcription factor (TF) and histone modification ChIP-seq data. The results uncovered a discrepancy between prediction on genome-wide data and prediction on data from targeted experiments. The relative strength of contact was important for prediction, confirming the basic principle of EP regulation. Novel features, such as the density of enhancers/promoters in the genomic region, were found to be important, highlighting our lack of understanding of how other elements in the region contribute to regulation. Several TF peaks were identified that improved prediction by identifying negatives and reducing false positives. In summary, integrating genomic assays with enhancer perturbation studies increased the accuracy of the model and provided novel insights into enhancer-driven transcription. AVAILABILITY AND IMPLEMENTATION: The trained models, data, and source code are available at http://doi.org/10.5281/zenodo.11290386 and https://github.com/HanLabUNLV/sleps.


Subject(s)
Enhancer Elements, Genetic , Promoter Regions, Genetic , Supervised Machine Learning , Humans , Transcription Factors/metabolism , Transcription Factors/genetics , Genomics/methods , Chromatin Immunoprecipitation Sequencing/methods
3.
BMC Bioinformatics ; 25(1): 181, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38720247

ABSTRACT

BACKGROUND: RNA sequencing combined with machine learning techniques has provided a modern approach to the molecular classification of cancer. Class predictors, reflecting the disease class, can be constructed for known tissue types using gene expression measurements from cancer patients. One challenge of current cancer predictors is that they often have suboptimal performance estimates when integrating molecular datasets generated in different labs. The data are often of variable quality, procured differently, and contain unwanted noise that hampers the ability of a predictive model to extract useful information. Data preprocessing methods can be applied to reduce these systematic variations and harmonize the datasets before they are used to build a machine learning model for resolving tissue of origin. RESULTS: We investigated the impact of data preprocessing steps, focusing on normalization, batch effect correction, and data scaling, through trial and comparison. Our goal was to improve cross-study predictions of tissue of origin for common cancers on large-scale RNA-Seq datasets derived from thousands of patients and over a dozen tumor types. The results showed that the choice of data preprocessing operations affected the performance of the classifier models constructed for tissue-of-origin prediction in cancer. CONCLUSION: Using TCGA as a training set and applying data preprocessing methods, we demonstrated that batch effect correction improved performance, measured by weighted F1-score, in resolving tissue of origin against an independent GTEx test dataset. On the other hand, data preprocessing operations worsened classification performance when the independent test dataset was aggregated from separate studies in ICGC and GEO. Therefore, based on our findings with these publicly available large-scale RNA-Seq datasets, applying data preprocessing techniques in a machine learning pipeline is not always appropriate.


Subject(s)
Machine Learning , Neoplasms , RNA-Seq , Humans , RNA-Seq/methods , Neoplasms/genetics , Transcriptome/genetics , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods , Computational Biology/methods
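The abstract above (entry 3) uses the weighted F1-score as its performance measure. It averages the per-class F1-scores and weights each class by its number of true samples (its support). Below is a minimal pure-Python sketch of that definition. The function name weighted_f1 and the toy tissue labels are illustrative assumptions, not taken from the study. In practice, scikit-learn's f1_score with average="weighted" computes the same quantity.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1-scores (classes taken from y_true)."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += f1 * n_cls / total  # weight each class by its support
    return score


# Hypothetical tissue-of-origin labels, for illustration only.
y_true = ["lung", "lung", "breast", "colon", "colon", "colon"]
y_pred = ["lung", "breast", "breast", "colon", "colon", "lung"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Weighting by support means that large tumor classes dominate the score. This is one reason a weighted F1 can move differently from plain accuracy when class sizes differ between training and test cohorts.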
4.
Genomics ; 114(4): 110439, 2022 07.
Article in English | MEDLINE | ID: mdl-35905834

ABSTRACT

High-throughput assay systems have had a large impact on understanding the mechanisms of basic cell functions. However, high-throughput assays that directly assess molecular functions are limited. Herein, we describe the "GigaAssay", a modular, high-throughput, one-pot assay system for measuring the molecular functions of thousands of genetic variants at once. In this system, each cell was infected with one virus from a library encoding thousands of Tat mutant proteins, with each viral particle encoding a random unique molecular identifier (UMI). We demonstrate proof of concept by measuring transcription of a GFP reporter in an engineered reporter cell line, driven by binding of the HIV Tat transcription factor to the HIV long terminal repeat. Infected cells were flow-sorted into 3 bins based on their GFP fluorescence readout. The transcriptional activity of each Tat mutant was calculated from the ratio of signals from each bin. The use of UMIs in the GigaAssay produced a high average accuracy (95%) and positive predictive value (98%), determined by comparison with literature benchmark data, known C-terminal truncations, and blinded independent mutant tests. Combining substitution tolerance with structure/function analysis shows that restricted substitution types are spatially concentrated in the Cys-rich region. Tat has abundant intragenic epistasis (10%) when single and double mutants are compared.


Subject(s)
HIV-1 , tat Gene Products, Human Immunodeficiency Virus , Cell Line , HIV Long Terminal Repeat , HIV-1/genetics , Mutagenesis , Transcriptional Activation , tat Gene Products, Human Immunodeficiency Virus/genetics , tat Gene Products, Human Immunodeficiency Virus/metabolism
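The GigaAssay abstract above (entry 4) reports accuracy and positive predictive value (PPV) against benchmark data. Both follow from a confusion matrix that compares assay calls with benchmark labels. A minimal sketch is below. The counts are hypothetical and are not the study's data, and the abstract does not state which call is treated as "positive".

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all calls that agree with the benchmark labels."""
    return (tp + tn) / (tp + tn + fp + fn)


def ppv(tp, fp):
    """Positive predictive value: fraction of positive calls that are correct."""
    return tp / (tp + fp)


# Hypothetical confusion-matrix counts, for illustration only.
tp, tn, fp, fn = 40, 50, 2, 8
print(accuracy(tp, tn, fp, fn))  # 0.9
print(round(ppv(tp, fp), 3))     # 0.952
```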
5.
Bioessays ; 41(12): e1900126, 2019 12.
Article in English | MEDLINE | ID: mdl-31693213

ABSTRACT

Genome editing with engineered nucleases (GEENs) introduces site-specific DNA double-strand breaks (DSBs), which are repaired via nonhomologous end-joining (NHEJ) pathways that eventually create indels (insertions/deletions) in the genome. Here we ask whether the features of indels resulting from gene editing can be customized. A review of the literature reveals how NHEJ pathways shape the outcomes of gene editing technologies. The survey consolidates a body of literature suggesting that the type (insertion, deletion, or complex) and approximate length of indel edits can be somewhat customized by using different GEENs and by manipulating the expression of key NHEJ genes. Structural data suggest that binding of GEENs to DNA may interfere with the binding of key components of DNA repair complexes, favoring either classical or alternative NHEJ. These hypotheses have some limitations but, if validated, will enable scientists to better control indel makeup, holding promise for basic science and clinical applications of gene editing. Also see the video abstract here: https://youtu.be/vTkJtUsLi3w.


Subject(s)
Gene Editing/methods , CRISPR-Cas Systems/genetics , DNA/genetics , DNA/metabolism , DNA Breaks, Double-Stranded , Humans , Transcription Activator-Like Effector Nucleases/metabolism , Zinc Finger Nucleases/metabolism
6.
Calcif Tissue Int ; 107(4): 353-361, 2020 10.
Article in English | MEDLINE | ID: mdl-32728911

ABSTRACT

The aims of the study were to develop fracture prediction models using machine learning approaches and genomic data, and to identify the best modeling approach for fracture prediction. Genomic data from the Osteoporotic Fractures in Men cohort study (n = 5130) were analyzed. After comprehensive genotype imputation, a genetic risk score (GRS) was calculated for each participant from 1103 associated single nucleotide polymorphisms. Data were normalized and split into a training set (80%) and a validation set (20%). Random forest, gradient boosting, neural network, and logistic regression were each used to develop prediction models for major osteoporotic fractures, with GRS, bone density, and other risk factors as predictors. In model training, the synthetic minority oversampling technique was used to account for the low fracture rate, and tenfold cross-validation was employed for hyperparameter optimization. In testing, the area under the curve (AUC) and accuracy were used to assess model performance, and the McNemar test was used to examine differences in accuracy between models. Gradient boosting showed the best prediction performance, with an AUC of 0.71 and an accuracy of 0.88, and the GRS ranked as the 7th most important variable in the model. The performance of random forest and neural network was also significantly better than that of logistic regression. This study suggests that fracture prediction in older men can be improved by incorporating genetic profiling and by using the gradient boosting approach. This result should not be extrapolated to women or young individuals.


Subject(s)
Bone Density , Fractures, Bone/diagnosis , Machine Learning , Risk Assessment , Activities of Daily Living , Aged , Aged, 80 and over , Cohort Studies , Genomics , Humans , Male , Phenotype
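The fracture-prediction abstract above (entry 6) uses the synthetic minority oversampling technique (SMOTE) to offset the low fracture rate. SMOTE creates synthetic minority-class samples at random points on the line segments between a minority sample and one of its k nearest minority-class neighbours. The pure-Python sketch below illustrates that idea under simplifying assumptions and is not the study's implementation. The function name smote, the choice k=2, and the toy points are hypothetical. A production version is available in the imbalanced-learn library.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points built from minority-class vectors.

    For each synthetic point: pick a random minority sample, pick one of its
    k nearest minority-class neighbours, and interpolate at a random fraction.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


# Hypothetical minority-class (fracture) samples with two features each.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=5)
print(len(new_points))
```

Oversampling should be applied only to the training folds. Applying it before the train/validation split leaks synthetic copies of training samples into evaluation.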
7.
Mol Biol Evol ; 35(1): 50-65, 2018 01 01.
Article in English | MEDLINE | ID: mdl-29309688

ABSTRACT

Experimental evolution affords the opportunity to investigate adaptation to stressful environments. Studies combining experimental evolution with whole-genome resequencing have provided insight into the dynamics of adaptation and a new tool to uncover genes associated with polygenic traits. Here, we selected for starvation resistance in populations of Drosophila melanogaster for over 80 generations. In response, the starvation-selected lines developed an obese condition, storing nearly twice as much total lipid as their unselected controls. Although these fats provide a ∼3-fold increase in starvation resistance, the imbalance in lipid homeostasis incurs an evolutionary cost. Some of these tradeoffs resemble obesity-associated pathologies in mammals, including metabolic depression, low activity levels, dilated cardiomyopathy, and disrupted sleep patterns. To determine the genetic basis of these traits, we resequenced genomic DNA from the selected lines and their controls. We found 1,046,373 polymorphic sites, many of which diverged between selection treatments. In addition, we found a wide range of genetic heterogeneity among replicates of the selected lines, suggesting multiple mechanisms of adaptation. Genome-wide heterozygosity was low in the selected populations, with many large blocks of SNPs nearing fixation. We identified candidate loci under selection using an algorithm that controls for the effects of genetic drift. These loci mapped to a set of 382 genes associated with many processes, including nutrient response, catabolic metabolism, and lipid droplet function. Our results speak to the evolutionary origins of obesity and provide new targets for understanding the polygenic nature of obesity in a unique model system.


Subject(s)
Drosophila melanogaster/genetics , Obesity/genetics , Starvation/genetics , Acclimatization , Adaptation, Physiological/genetics , Animals , Directed Molecular Evolution/methods , Disease Models, Animal , Evolution, Molecular , Genome, Insect/genetics , Genome-Wide Association Study/methods , Models, Genetic , Multifactorial Inheritance , Selection, Genetic/genetics
8.
J Mol Evol ; 83(3-4): 137-146, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27770175

ABSTRACT

Evolutionary constraint on insertions and deletions (indels) is not necessarily equal to the constraint on nucleotide substitutions in any given region of a genome. Knowing how indel-specific evolutionary rates vary across a sequence will aid our understanding of evolutionary constraints on indels and help us infer how indels have contributed to the evolution of the sequence. However, unlike for nucleotide substitutions, there has been no phylogenetic method that can statistically infer significantly different indel rates across the sequence independently of substitution rates. Here, we developed software that finds sites with accelerated, indel-specific evolutionary rates by introducing a scaling parameter that applies only to indel rates and not to nucleotide substitution rates. Using this software, we show that we can find regions with accelerated indel rates in protein alignments from primate genomes. We also confirm that the sites with high indel rates differ from the sites with high nucleotide substitution rates within the protein sequences. Identifying regions with accelerated indel rates independently of nucleotide substitutions will help us better understand the impact of indel mutations on protein sequence evolution.


Subject(s)
INDEL Mutation , Models, Genetic , Mutation Rate , Animals , Computer Simulation , Evolution, Molecular , Humans , Nucleotides/genetics , Phylogeny , Proteins/genetics , Sequence Deletion , Software , Species Specificity
9.
Mol Biol Evol ; 30(8): 1987-97, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23709260

ABSTRACT

Current sequencing methods produce large amounts of data, but genome assemblies constructed from these data are often fragmented and incomplete. Incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. This means that methods attempting to estimate rates of gene duplication and loss often will be misled by such errors and that rates of gene family evolution will be consistently overestimated. Here, we present a method that takes these errors into account, allowing one to accurately infer rates of gene gain and loss among genomes even with low assembly and annotation quality. The method is implemented in the newest version of the software package CAFE, along with several other novel features. We demonstrate the accuracy of the method with extensive simulations and reanalyze several previously published data sets. Our results show that errors in genome annotation do lead to higher inferred rates of gene gain and loss but that CAFE 3 sufficiently accounts for these errors to provide accurate estimates of important evolutionary parameters.


Subject(s)
Genome , Molecular Sequence Annotation/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Computational Biology/methods , Evolution, Molecular , Genomics/methods , Reproducibility of Results
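CAFE (entry 9) treats changes in gene-family size along each branch of a species tree as a stochastic birth-death process. The function below sketches that core ingredient. It computes the probability of going from s to c gene copies over a branch of length t when gene birth and death share a single rate λ, using the closed-form linear birth-death transition probability with α = λt/(1 + λt). This is our illustration of the underlying model as we understand it, not CAFE's code. It omits the annotation-error model that CAFE 3 adds, and the function name and parameter values are hypothetical.

```python
from math import comb


def birth_death_prob(s, c, lam, t):
    """P(family size c at time t | size s at time 0) for a linear
    birth-death process with equal per-gene birth and death rate lam.

    P(c|s) = sum_{j=0}^{min(s,c)} C(s,j) * C(s+c-j-1, s-1)
             * alpha^(s+c-2j) * (1-2*alpha)^j,   alpha = lam*t / (1 + lam*t)
    """
    if s == 0:
        # A lost family cannot regain genes in this model.
        return 1.0 if c == 0 else 0.0
    alpha = lam * t / (1 + lam * t)
    total = 0.0
    for j in range(min(s, c) + 1):
        total += (comb(s, j) * comb(s + c - j - 1, s - 1)
                  * alpha ** (s + c - 2 * j) * (1 - 2 * alpha) ** j)
    return total


# Hypothetical rate and branch length with lam * t = 0.5, so alpha = 1/3.
print(birth_death_prob(1, 0, 0.5, 1.0))  # single-copy loss probability, equal to alpha
```

Summing these probabilities over all possible end states c gives 1. In CAFE, such per-branch probabilities are multiplied along the tree to obtain the likelihood of the observed family sizes for a given λ.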
10.
Nature ; 450(7167): 219-32, 2007 Nov 08.
Article in English | MEDLINE | ID: mdl-17994088

ABSTRACT

Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or 'evolutionary signatures', dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.


Subject(s)
Drosophila/classification , Drosophila/genetics , Evolution, Molecular , Genome, Insect/genetics , Genomics , Animals , Base Sequence , Binding Sites , Conserved Sequence , Drosophila Proteins/genetics , Exons/genetics , Gene Expression Regulation/genetics , Genes, Insect/genetics , MicroRNAs/genetics , Molecular Sequence Data , Organ Specificity , Phylogeny , Untranslated Regions/genetics
11.
Data Brief ; 45: 108641, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36426049

ABSTRACT

The data in this article are associated with the research paper "GigaAssay - an adaptable high-throughput saturation mutagenesis assay" [1]. The raw data are sequence reads of HIV-1 Tat cDNA amplified from cellular genomic DNA in a new single-pot saturation mutagenesis assay designated the "GigaAssay". The bioinformatic pipeline and parameters used to analyze the data are described, and raw, processed, analyzed, and filtered data are reported. The data are processed to calculate the Tat-driven transcriptional activity for cells carrying each possible single amino acid substitution in Tat. These data can be reused to interpret Tat intermolecular interactions and HIV latency. This is one of the largest and most complete datasets on the impact of amino acid substitutions within a single protein on a molecular function.

12.
Sci Rep ; 11(1): 4482, 2021 02 24.
Article in English | MEDLINE | ID: mdl-33627720

ABSTRACT

The study aimed to use machine learning (ML) approaches and genomic data to develop a prediction model for bone mineral density (BMD) and to identify the best modeling approach for BMD prediction. The genomic and phenotypic data of the Osteoporotic Fractures in Men Study (n = 5130) were analyzed. A genetic risk score (GRS) was calculated for each participant from 1103 associated SNPs after comprehensive genotype imputation. Data were normalized and divided into a training set (80%) and a validation set (20%). Random forest, gradient boosting, neural network, and linear regression were each used to develop BMD prediction models. Ten-fold cross-validation was used for hyperparameter optimization. Mean squared error and mean absolute error were used to assess model performance. When GRS and phenotypic covariates were used as predictors, the ML models and linear regression performed similarly in BMD prediction. However, when GRS was replaced with the 1103 individual SNPs, the ML models performed significantly better than linear regression (with lasso regularization), and the gradient boosting model performed best. Our study suggests that ML models, especially gradient boosting, can improve BMD prediction from genomic data.


Subject(s)
Bone Density/genetics , Bone Density/physiology , Aged , Fractures, Bone/genetics , Fractures, Bone/pathology , Genomics/methods , Genotype , Humans , Linear Models , Machine Learning , Male , Polymorphism, Single Nucleotide/genetics , Risk Assessment , Risk Factors
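The BMD abstract above (entry 12) computes a genetic risk score (GRS) from 1103 SNPs. A GRS of this kind is commonly a weighted sum of risk-allele dosages, with each SNP weighted by a published per-allele effect size. The abstract does not state the exact weighting, so the sketch below assumes a standard weighted GRS. The SNP identifiers, effect sizes, and dosages are hypothetical.

```python
def genetic_risk_score(dosages, weights):
    """Weighted GRS = sum over SNPs of (effect size * risk-allele dosage).

    dosages: dict SNP id -> imputed risk-allele dosage in [0, 2]
    weights: dict SNP id -> per-allele effect size (e.g. a GWAS beta)
    SNPs absent from `dosages` are skipped here (an assumption; real
    pipelines may instead impute or mean-fill missing genotypes).
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


# Hypothetical SNPs and effect sizes, for illustration only.
weights = {"rs0001": 0.10, "rs0002": -0.05, "rs0003": 0.20}
person = {"rs0001": 2.0, "rs0002": 1.0, "rs0003": 0.4}
print(round(genetic_risk_score(person, weights), 3))  # 0.2 - 0.05 + 0.08 = 0.23
```

Collapsing the SNPs into one GRS feature summarizes genetic risk linearly. The abstract reports that the ML models gained their advantage over linear regression only when the 1103 individual SNPs replaced this summary.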
13.
PLoS Genet ; 3(11): e197, 2007 Nov.
Article in English | MEDLINE | ID: mdl-17997610

ABSTRACT

Comparison of whole genomes has revealed large and frequent changes in the size of gene families. These changes occur because of high rates of both gene gain (via duplication) and loss (via deletion or pseudogenization), as well as the evolution of entirely new genes. Here we use the genomes of 12 fully sequenced Drosophila species to study the gain and loss of genes at unprecedented resolution. We find large numbers of both gains and losses, with over 40% of all gene families differing in size among the Drosophila. Approximately 17 genes are estimated to be duplicated and fixed in a genome every million years, a rate on par with that previously found in both yeast and mammals. We find many instances of extreme expansions or contractions in the size of gene families, including the expansion of several sex- and spermatogenesis-related families in D. melanogaster that also evolve under positive selection at the nucleotide level. Newly evolved gene families in our dataset are associated with a class of testes-expressed genes known to have evolved de novo in a number of cases. Gene family comparisons also allow us to identify a number of annotated D. melanogaster genes that are unlikely to encode functional proteins, as well as to identify dozens of previously unannotated D. melanogaster genes with conserved homologs in the other Drosophila. Taken together, our results demonstrate that the apparent stasis in total gene number among species has masked rapid turnover in individual gene gain and loss. It is likely that this genomic revolving door has played a large role in shaping the morphological, physiological, and metabolic differences among species.


Subject(s)
Drosophila/genetics , Evolution, Molecular , Genome, Insect/genetics , Multigene Family/genetics , Animals , Drosophila/classification , Genes, Insect/genetics , Likelihood Functions , Phylogeny , Species Specificity
14.
BMC Bioinformatics ; 10: 356, 2009 Oct 27.
Article in English | MEDLINE | ID: mdl-19860910

ABSTRACT

BACKGROUND: Evolutionary trees are central to a wide range of biological studies. In many of these studies, tree nodes and branches need to be associated (or annotated) with various attributes. For example, in studies concerned with organismal relationships, tree nodes are associated with taxonomic names, whereas tree branches have lengths and often support values. Gene trees used in comparative genomics or phylogenomics are usually annotated with taxonomic information, genome-related data such as gene names and functional annotations, and events such as gene duplications, speciations, or exon shufflings, combined with information related to the evolutionary tree itself. The data standards currently used for evolutionary trees have limited capacity to incorporate such annotations of different data types. RESULTS: We developed an XML language, named phyloXML, for describing evolutionary trees as well as various associated data items. PhyloXML provides elements for commonly used items, such as branch lengths, support values, taxonomic names, and gene names and identifiers. By using "property" elements, phyloXML can be adapted to novel and unforeseen use cases. We also developed various software tools for reading, writing, conversion, and visualization of phyloXML-formatted data. CONCLUSION: PhyloXML is an XML language defined by a complete schema in XSD that allows storing and exchanging the structures of evolutionary trees as well as associated data. More information about phyloXML itself, the XSD schema, and tools implementing and supporting phyloXML is available at http://www.phyloxml.org.


Subject(s)
Biological Evolution , Computational Biology/methods , Genomics/methods , Phylogeny , Software , Databases, Genetic
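The phyloXML abstract above (entry 14) describes trees as nested clade elements that carry items such as name and branch_length. The sketch below reads a phyloXML document with Python's standard xml.etree.ElementTree module. The two-leaf tree is hypothetical. The sketch assumes the http://www.phyloxml.org namespace and handles only branch lengths given as child elements; the schema also permits a branch_length attribute on clade, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

PX = "http://www.phyloxml.org"
NS = {"px": PX}

# A tiny phyloXML document describing a hypothetical two-leaf tree.
DOC = """<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <clade>
        <name>A</name>
        <branch_length>0.1</branch_length>
      </clade>
      <clade>
        <name>B</name>
        <branch_length>0.2</branch_length>
      </clade>
    </clade>
  </phylogeny>
</phyloxml>"""


def leaves(xml_text):
    """Return (name, branch_length) for every terminal clade, in document order."""
    root = ET.fromstring(xml_text)
    result = []
    for clade in root.iter(f"{{{PX}}}clade"):
        if clade.find("px:clade", NS) is None:  # no child clades -> leaf
            name = clade.findtext("px:name", default="", namespaces=NS)
            length = clade.findtext("px:branch_length", namespaces=NS)
            result.append((name, float(length) if length is not None else None))
    return result


print(leaves(DOC))  # [('A', 0.1), ('B', 0.2)]
```

For full support of the schema (confidence values, taxonomy, property elements), an existing reader such as Biopython's Bio.Phylo is preferable to hand parsing.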
15.
Mob DNA ; 10: 29, 2019.
Article in English | MEDLINE | ID: mdl-31320939

ABSTRACT

Though transposable elements make up around half of the human genome, the repetitive nature of their sequences makes it difficult to align conventional sequencing reads to them accurately. However, with new advances in sequencing technology, such as increased read length and paired-end libraries, these repetitive regions are becoming easier to align to. This study investigates the mappability of transposable elements with 50 bp, 76 bp, and 100 bp paired-end read libraries. For those read lengths, allowing for 3 mismatches during alignment, over 68%, 85%, and 88%, respectively, of all transposable elements in the RepeatMasker database are uniquely mappable, suggesting that accurate locus-specific mapping of older transposable elements is well within reach.
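Mappability, as used in the abstract above (entry 15), asks whether a read of a given length drawn from a locus aligns to a single place in the genome. The sketch below is a deliberately simplified stand-in. It counts exact k-mer occurrences on one strand with no mismatches, whereas the study allowed 3 mismatches and used paired-end reads. The function name and toy sequence are hypothetical.

```python
from collections import Counter


def unique_fraction(genome, region_start, region_end, k):
    """Fraction of k-mer start positions in [region_start, region_end) whose
    k-mer occurs exactly once in `genome` (exact matches, forward strand
    only; a simplification of real read mappability)."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    starts = range(region_start, min(region_end, len(genome) - k + 1))
    if not starts:
        return 0.0
    unique = sum(1 for i in starts if counts[genome[i:i + k]] == 1)
    return unique / len(starts)


# Toy sequence with a short tandem repeat ("ACGT" twice).
genome = "ACGTACGTTTGCA"
print(unique_fraction(genome, 0, len(genome), 4))  # 0.8: "ACGT" occurs twice
print(unique_fraction(genome, 0, len(genome), 5))  # 1.0: longer reads span the repeat
```

The toy example shows the same trend the abstract reports: once reads are long enough to extend beyond a repeat, previously ambiguous positions become uniquely mappable.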

16.
Mob DNA ; 10: 39, 2019.
Article in English | MEDLINE | ID: mdl-31497073

ABSTRACT

BACKGROUND: Despite the long-held assumption that transposons are normally expressed only in the germline, recent evidence shows that transcripts of transposable element (TE) sequences are frequently found in somatic cells. However, the extent of variation in TE transcript levels across tissues and individuals is unknown, and co-expression between TEs and host gene mRNAs has not been examined. RESULTS: Here we report the variation in TE-derived transcript levels across tissues and between individuals observed in the non-tumorous tissues collected for The Cancer Genome Atlas. We found core TE co-expression modules consisting mainly of transposons, showing correlated expression across broad classes of TEs. Despite this co-expression within tissues, some individual TE loci exhibit tissue-specific expression patterns when compared across tissues. The core TE modules were negatively correlated with other gene modules consisting of immune response genes in interferon signaling. KRAB zinc finger proteins (KZFPs) were over-represented among the gene members of the TE modules, showing positive correlation across multiple tissues. However, we did not find overlap between TE-KZFP pairs that are co-expressed and TE-KZFP pairs that are bound in published ChIP-seq studies. CONCLUSIONS: We find unexpected variation in TE-derived transcripts within and across non-tumorous tissues. We describe a broad view of the RNA state of non-tumorous tissues exhibiting higher levels of TE transcripts. Tissues with higher levels of TE transcripts have a broad range of TEs co-expressed, with high expression of a large number of KZFPs and lower RNA levels of immune genes.
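The abstract above (entry 16) reports positive and negative co-expression. Co-expression between a TE and a host gene, or between TE modules and immune gene modules, is typically quantified as a correlation of expression levels across samples. The abstract does not specify the exact correlation or module-detection method. The Pearson-correlation sketch below is therefore an assumption, and the expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical expression levels across five samples.
te = [1.0, 2.0, 3.0, 4.0, 5.0]
kzfp = [2.0, 4.0, 6.0, 8.0, 10.0]    # rises with the TE: positive co-expression
immune = [5.0, 4.0, 3.0, 2.0, 1.0]   # falls as the TE rises: negative correlation
print(round(pearson(te, kzfp), 6), round(pearson(te, immune), 6))
```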

18.
Science ; 347(6217): 1258522, 2015 Jan 02.
Article in English | MEDLINE | ID: mdl-25554792

ABSTRACT

Variation in vectorial capacity for human malaria among Anopheles mosquito species is determined by many factors, including behavior, immunity, and life history. To investigate the genomic basis of vectorial capacity and explore new avenues for vector control, we sequenced the genomes of 16 anopheline mosquito species from diverse locations spanning ~100 million years of evolution. Comparative analyses show faster rates of gene gain and loss, elevated gene shuffling on the X chromosome, and more intron losses, relative to Drosophila. Some determinants of vectorial capacity, such as chemosensory genes, do not show elevated turnover but instead diversify through protein-sequence changes. This dynamism of anopheline genes and genomes may contribute to their flexible capacity to take advantage of new ecological niches, including adapting to humans as primary hosts.


Subject(s)
Anopheles/genetics , Evolution, Molecular , Genome, Insect , Insect Vectors/genetics , Malaria/transmission , Animals , Anopheles/classification , Base Sequence , Chromosomes, Insect/genetics , Drosophila/genetics , Humans , Insect Vectors/classification , Molecular Sequence Data , Phylogeny , Sequence Alignment
19.
Fly (Austin) ; 6(2): 121-5, 2012.
Article in English | MEDLINE | ID: mdl-22634624

ABSTRACT

Genes occasionally change their location in the genome through inter-chromosomal duplication and loss. These changes happen as mistakes during recombination or through retrotransposition. In Han and Hahn 2011,(1) we surveyed the genomes of ten Drosophila species to identify and characterize the gene transposition events in the history of these species. In that paper, we showed that the rate of gene transposition in Drosophila is higher than previously appreciated. To understand the process of gene transposition, we examined the sequences, locations, and functions of the transposed genes. Based on the elevated rate of sequence evolution in transposed genes and their frequent movement near centromeres and telomeres, we could not reject the hypothesis that these are mutations fixed through relaxed selection. However, by examining the functions of transposed genes more carefully, we found that genes with male-specific functions and genes with female-specific functions move in opposite directions involving the X chromosome. We also found an over-representation of chromosome-related functions among the transposed genes. These observations suggest that particular selection pressures may contribute to gene transpositions in Drosophila.


Subject(s)
Chromosomes, Insect , Drosophila/genetics , Gene Rearrangement , Genes, Insect , Animals , Female , Male
20.
Genetics ; 190(2): 813-25, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22095076

ABSTRACT

Gene transposition places a new gene copy in a novel genomic environment. Moreover, genes moving between the autosomes and the X chromosome experience changes in several evolutionary parameters. Previous studies of gene transposition have not used the phylogenetic framework that becomes possible when whole genomes are available from multiple species. Here we used parsimonious reconstruction on the genomic distribution of gene families to analyze interchromosomal gene transposition in Drosophila. We identified 782 genes that have moved between chromosomes within the phylogeny of 10 Drosophila species, including 87 gene families with multiple independent movements on different branches of the phylogeny. Using this large catalog of transposed genes, we detected accelerated sequence evolution in duplicated genes that transposed, compared with the parental copy at the original locus. We also obtained a more refined picture of the biased movement of genes from the X chromosome to the autosomes. The X-to-autosome bias was significantly stronger for RNA-based movements than for DNA-based movements, and among DNA-based movements there was also an excess of genes moving onto the X chromosome. Genes involved in female-specific functions moved onto the X chromosome, whereas genes with male-specific functions moved off the X. There was a significant overrepresentation of proteins involved in chromosomal function among transposed genes, suggesting that genetic conflict between sexes and among chromosomes may be a driving force behind gene transposition in Drosophila.


Subject(s)
Chromosomes, Insect , DNA Transposable Elements , Drosophila/genetics , Genes, Insect , Animals , Chromosome Segregation , Female , Gene Duplication , Genome, Insect , Male , Recombination, Genetic
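The abstract above (entry 20) describes a parsimonious reconstruction of a gene's chromosomal location across a phylogeny. Fitch's small-parsimony algorithm illustrates this idea: it infers the minimum number of location changes on a fixed rooted binary tree. This textbook method is shown for illustration only, and the study's exact reconstruction procedure may differ. The tree topology and chromosome labels below are hypothetical.

```python
def fitch(tree, states):
    """Fitch small parsimony on a rooted binary tree.

    tree: nested 2-tuples of leaf names, e.g. (("a", "b"), "c")
    states: dict leaf name -> observed state (e.g. chromosome arm)
    Returns (candidate state set at the root, minimum number of changes).
    """
    if isinstance(tree, str):  # leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    lset, lcost = fitch(left, states)
    rset, rcost = fitch(right, states)
    common = lset & rset
    if common:  # children can agree: no extra change needed
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1  # disagreement costs one change


# Hypothetical species tree and the chromosome arm carrying one gene.
tree = ((("mel", "sim"), "yak"), "pse")
location = {"mel": "2L", "sim": "2L", "yak": "X", "pse": "X"}
print(fitch(tree, location))  # ({'X'}, 1): one X -> 2L movement suffices
```

Under this reconstruction the ancestral location is the X chromosome. The single change falls on the branch leading to the mel/sim ancestor, which would be counted as one X-to-autosome movement.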