ABSTRACT
Aneuploidy, the incorrect number of whole chromosomes, is a common feature of tumors that contributes to their initiation and evolution. Preventing aneuploidy requires properly functioning kinetochores, which are large protein complexes assembled on centromeric DNA that link mitotic chromosomes to dynamic spindle microtubules and facilitate chromosome segregation. The kinetochore leverages at least two mechanisms to prevent aneuploidy: error correction and the spindle assembly checkpoint (SAC). BubR1, a factor involved in both processes, was identified as a cancer dependency and therapeutic target in multiple tumor types; however, it remains unclear what specific oncogenic pressures drive this enhanced dependency on BubR1 and whether it arises from BubR1's regulation of the SAC or error-correction pathways. Here, we use a genetically controlled transformation model and glioblastoma tumor isolates to show that constitutive signaling by RAS or MAPK is necessary for cancer-specific BubR1 vulnerability. The MAPK pathway enzymatically hyperstimulates a network of kinetochore kinases that compromises chromosome segregation, rendering cells more dependent on two BubR1 activities: counteracting excessive kinetochore-microtubule turnover for error correction and maintaining the SAC. This work expands our understanding of how chromosome segregation adapts to different cellular states and reveals an oncogenic trigger of a cancer-specific defect.
Subject(s)
Neoplasms , Protein Serine-Threonine Kinases , Aneuploidy , Carcinogenesis/metabolism , Cell Cycle Proteins/metabolism , Chromosome Segregation , Humans , Kinetochores/metabolism , Microtubules/metabolism , Mitosis/genetics , Neoplasms/metabolism , Protein Serine-Threonine Kinases/genetics , Spindle Apparatus/metabolism
ABSTRACT
Bunyaviruses (Negarnaviricota: Bunyavirales) are a large and diverse group of viruses that include important human, veterinary, and plant pathogens. The rapid characterization of known and new emerging pathogens depends on the availability of comprehensive reference sequence databases that can be used to match unknowns, infer evolutionary relationships and pathogenic potential, and make response decisions in an evidence-based manner. In this study, we determined the coding-complete genome sequences of 99 bunyaviruses in the Centers for Disease Control and Prevention's Arbovirus Reference Collection, focusing on orthonairoviruses (family Nairoviridae), orthobunyaviruses (Peribunyaviridae), and phleboviruses (Phenuiviridae) that either completely or partially lacked genome sequences. These viruses had been collected over 66 years from 27 countries from vertebrates and arthropods representing 37 genera. Many of the viruses had been characterized serologically and through experimental infection of animals but were isolated in the pre-sequencing era. We took advantage of our unusually large sample size to systematically evaluate genomic characteristics of these viruses, including reassortment and co-infection. We corroborated our findings using several independent molecular and virologic approaches, including Sanger sequencing of 197 genome segments and plaque isolation of viruses from putative co-infected virus stocks. This study contributes to the described genetic diversity of bunyaviruses and will enhance the capacity to characterize emerging human pathogenic bunyaviruses.
Subject(s)
Genome, Viral/genetics , Nairovirus/genetics , Orthobunyavirus/genetics , RNA Viruses/genetics , Animals , Arboviruses/genetics , Arthropods/genetics , Base Sequence , Humans , Phylogeny
ABSTRACT
Antigen-specific multifunctional T cells that secrete interferon-γ, interleukin-2 and tumour necrosis factor-α simultaneously after activation are important for the control of many infections. It is unclear whether these CD8(+) T cells are at an early or late stage of differentiation and whether telomere erosion restricts their replicative capacity. We developed a multi-parameter flow cytometric method for investigating the relationship between differentiation (CD45RA and CD27 surface phenotype), function (cytokine production) and replicative capacity (telomere length) in individual cytomegalovirus (CMV) antigen-specific CD8(+) T cells. This involves surface and intracellular staining coupled to fluorescence in situ hybridization to detect telomeres (flow-FISH). The end-stage/senescent CD8(+) CD45RA(+) CD27(-) T-cell subset increases significantly during ageing and this is exaggerated in CMV immune-responsive subjects. However, these end-stage cells do not have the shortest telomeres, implicating additional non-telomere-related mechanisms in inducing their senescence. The telomere lengths in total and CMV (NLV)-specific CD8(+) T cells in all four subsets defined by CD45RA and CD27 expression were significantly shorter in old compared with young individuals in both a Caucasian and an Asian cohort. Following stimulation by anti-CD3 or NLV peptide, similar proportions of triple-cytokine-producing cells are found in CD8(+) T cells at all stages of differentiation in both age groups. Furthermore, these multi-functional cells had intermediate telomere lengths compared with cells producing only one or two cytokines after activation. Therefore, global and CMV (NLV)-specific CD8(+) T cells that secrete interferon-γ, interleukin-2 and tumour necrosis factor-α are at an intermediate stage of differentiation and are not restricted by excessive telomere erosion.
Subject(s)
Aging/immunology , CD8-Positive T-Lymphocytes/immunology , Cellular Senescence , Cytomegalovirus Infections/immunology , Cytomegalovirus/immunology , Lymphocyte Activation , Telomere Shortening , Telomere/immunology , Adult , Age Factors , Aged , Aged, 80 and over , Aging/ethnology , Aging/genetics , Asian People/genetics , Biomarkers/metabolism , CD8-Positive T-Lymphocytes/metabolism , CD8-Positive T-Lymphocytes/virology , Cell Differentiation , Cell Proliferation , Cells, Cultured , Cytokines/immunology , Cytokines/metabolism , Cytomegalovirus/pathogenicity , Cytomegalovirus Infections/genetics , Cytomegalovirus Infections/metabolism , Cytomegalovirus Infections/virology , Flow Cytometry , Humans , Immunophenotyping/methods , Leukocyte Common Antigens/immunology , Leukocyte Common Antigens/metabolism , London , Phenotype , Singapore , Telomere/genetics , Tumor Necrosis Factor Receptor Superfamily, Member 7/immunology , Tumor Necrosis Factor Receptor Superfamily, Member 7/metabolism , White People/genetics , Young Adult
ABSTRACT
Analysis of mercury (Hg) in natural water samples has routinely been impractical in many environments, for example, artisanal and small-scale gold mines (ASGM), where difficult conditions make monitoring of harmful elements and chemicals used in the processes highly challenging. Current sampling methods require hazardous or expensive materials, which makes sample collection and transport more difficult. To solve this problem, a solid-phase extraction (SPE)-based method was developed for the sampling and preservation of dissolved Hg in natural water samples, particularly those found around ASGM sites. Recoveries of 85% ± 10% total Hg were obtained during 4 weeks of storage in refrigerated (4 °C, dark) and unrefrigerated (16 °C, dark) conditions, and 94% ± 1% Hg recovery was obtained from a representative river water spiked to 1 µg L⁻¹ Hg²⁺. SPE loading flow rates were tested at 2, 5, and 10 mL min⁻¹ with no breakthrough of Hg, and sorbent stability tests showed no breakthrough of Hg up to 2 weeks after functionalisation. The method was deployed across five artisanal gold mines in the Kakamega gold belt, Kenya, to assess Hg concentrations in mine shaft water, ore washing ponds, and river and stream water, including drinking water sources. In all waters, Hg concentrations were below the WHO guideline value of 6 µg L⁻¹, but drinking water sources contained trace concentrations of up to 0.35 µg L⁻¹ total Hg, which may have negative health effects with long-term exposure. The SPE method developed and deployed here is a robust sampling method that can be applied in future Hg monitoring, toxicology, and environmental work to provide improved data that are representative of total dissolved Hg in water samples.
ABSTRACT
OBJECTIVES: Cavitation in the water around the oscillating tip of an ultrasonic scaler may offer a route to enhanced biofilm removal. The aim of this study is to map the occurrence of cavitation around scaler tips under loaded conditions. MATERIALS AND METHODS: Two designs of piezoelectric ultrasonic scaling probes were evaluated with a scanning laser vibrometer and a luminol dosimetry system under loaded (100 g and 200 g) and unloaded conditions. Loads were applied to the probe tips via teeth mounted in a load-measuring apparatus. RESULTS: There was a positive correlation between probe displacement amplitude and cavitation production for the ultrasonic probes. Cavitation at the tip of each probe was greater under loaded than unloaded conditions, and for the longer P probe it was greater towards the tip. CONCLUSIONS: Whilst increasing the vibration displacement amplitude of ultrasonic scalers increases the occurrence of cavitation, factors such as probe length influence the amount of cavitation activity generated. The application of load affects cavitation production at the most clinically relevant area, the tip. CLINICAL RELEVANCE: Loading and scaler design can be used to maximise the occurrence of cavitation at the tip and so enhance the cleaning efficiency of the scaler.
Subject(s)
Dental Scaling/instrumentation , Dental Stress Analysis , Ultrasonic Therapy/instrumentation , Equipment Design , Humans , Linear Models , Molar , Oscillometry , Statistics, Nonparametric , Vibration
ABSTRACT
Mercury is considered one of the most toxic elements to humans. Due to pollution from industry and artisanal gold mining, mercury species are present globally in waters used for agriculture, aquaculture, and drinking. This review summarises reported methods for preserving mercury species in water samples and highlights the hazards and issues associated with each. These include the handling of acids in an uncontrolled environment, breakage of sample containers, and the collection and transport of sample volumes in excess of 1 L, all of which pose difficulties for both in situ collection and transportation. Literature on aqueous mercury preservation from 2000 to 2021 was reviewed, as well as commonly cited and relevant references. Amongst other approaches, solid-phase extraction techniques were explored for the preservation and preconcentration of total and speciated mercury in water samples, and their potential as a safe, in situ preservation and storage method for mercury species was summarised. The review highlighted that the stability of mercury is increased when it is adsorbed on a solid phase, so the metal and its species can be preserved without the need for hazardous reagents or materials in the field. The mercury species can then be eluted upon return to a laboratory, where sensitive analytical detection and speciation methods can be better applied. Developments in solid-phase extraction as a preservation method for unstable metals such as mercury will improve the quality of representative environmental data and further improve toxicology and environmental monitoring studies.
Subject(s)
Mercury , Water Pollutants, Chemical , Humans , Mercury/analysis , Environmental Monitoring/methods , Water Pollutants, Chemical/analysis , Water , Gold
ABSTRACT
ELT-2 is the major transcription factor (TF) required for Caenorhabditis elegans intestinal development. ELT-2 expression initiates in embryos to promote development and then persists after hatching through the larval and adult stages. Though the sites of ELT-2 binding are characterized and the transcriptional changes that result from ELT-2 depletion are known, an intestine-specific transcriptome profile spanning developmental time has been missing. We generated this dataset by performing Fluorescence Activated Cell Sorting on intestine cells at distinct developmental stages. We analyzed this dataset in conjunction with previously conducted ELT-2 studies to evaluate the role of ELT-2 in directing the intestinal gene regulatory network through development. We found that only 33% of intestine-enriched genes in the embryo were direct targets of ELT-2, but that this proportion increased to 75% by the L3 stage. This suggests that additional TFs promote intestinal transcription, especially in the embryo. Furthermore, only half of ELT-2's direct target genes were dependent on ELT-2 for their proper expression levels, and of those, equal proportions were over-expressed and under-expressed upon elt-2 depletion. That is, ELT-2 can either activate or repress direct target genes. Additionally, we observed that ELT-2 repressed its own promoter, suggesting new models for its autoregulation. Together, our results illustrate that ELT-2 impacts roughly 20-50% of intestine-specific genes, that ELT-2 both positively and negatively controls its direct targets, and that the current model of the intestinal regulatory network is incomplete, as the factors responsible for directing the expression of many intestinal genes remain unknown.
Subject(s)
Caenorhabditis elegans Proteins , Caenorhabditis elegans , Animals , Caenorhabditis elegans/metabolism , Caenorhabditis elegans Proteins/metabolism , Gene Regulatory Networks , GATA Transcription Factors/genetics , Intestines , Gene Expression Profiling , Transcriptome
ABSTRACT
The transcription factor GATA1 regulates an extensive program of gene activation and repression during erythroid development. However, the associated mechanisms, including the contributions of distal versus proximal cis-regulatory modules, co-occupancy with other transcription factors, and the effects of histone modifications, are poorly understood. We studied these problems genome-wide in a Gata1 knockout erythroblast cell line that undergoes GATA1-dependent terminal maturation, identifying 2616 GATA1-responsive genes and 15,360 GATA1-occupied DNA segments after restoration of GATA1. Virtually all occupied DNA segments have high levels of H3K4 monomethylation and low levels of H3K27me3 around the canonical GATA binding motif, regardless of whether the nearby gene is induced or repressed. Induced genes tend to be bound by GATA1 close to the transcription start site (most frequently in the first intron), have multiple GATA1-occupied segments that are also bound by TAL1, and show evolutionary constraint on the GATA1-binding site motif. In contrast, repressed genes are further away from GATA1-occupied segments, and a subset shows reduced TAL1 occupancy and increased H3K27me3 at the transcription start site. Our data expand the repertoire of GATA1 action in erythropoiesis by defining a new cohort of target genes and determining the spatial distribution of cis-regulatory modules throughout the genome. In addition, we begin to establish functional criteria and mechanisms that distinguish GATA1 activation from repression at specific target genes. More broadly, these studies illustrate how a "master regulator" transcription factor coordinates tissue differentiation through a panoply of DNA and protein interactions.
Subject(s)
Erythropoiesis/drug effects , GATA1 Transcription Factor/metabolism , Gene Expression Regulation, Developmental , Genome , Histones/metabolism , RNA, Messenger/metabolism , Binding Sites , Cell Differentiation , Cell Line , Chromatin/metabolism , Chromatin Immunoprecipitation , Erythroblasts/cytology , Erythroid Cells/cytology , GATA1 Transcription Factor/pharmacology , Oligonucleotide Array Sequence Analysis , RNA, Messenger/genetics
ABSTRACT
DNA sequence motifs and epigenetic modifications contribute to specific binding by a transcription factor, but the extent to which each feature determines occupancy in vivo is poorly understood. We addressed this question in erythroid cells by identifying DNA segments occupied by GATA1 and measuring the level of trimethylation of histone H3 lysine 27 (H3K27me3) and monomethylation of H3 lysine 4 (H3K4me1) along a 66 Mb region of mouse chromosome 7. While 91% of the GATA1-occupied segments contain the consensus binding-site motif WGATAR, only approximately 0.7% of DNA segments with such a motif are occupied. Using a discriminative motif enumeration method, we identified additional motifs predictive of occupancy given the presence of WGATAR. The specific motif variant AGATAA and occurrence of multiple WGATAR motifs are both strong discriminators. Combining motifs to pair a WGATAR motif with a binding site motif for GATA1, EKLF or SP1 improves discriminative power. Epigenetic modifications are also strong determinants, with the factor-bound segments highly enriched for H3K4me1 and depleted of H3K27me3. Combining primary sequence and epigenetic determinants captures 52% of the GATA1-occupied DNA segments and substantially increases the specificity, to one out of seven segments with the required motif combination and epigenetic signals being bound.
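The WGATAR consensus is written in IUPAC ambiguity codes (W = A or T, R = A or G), so candidate sites can be found with a regular expression on both strands. The sketch below is only an illustration of that scan, not the discriminative motif enumeration method used in the study; the function names are hypothetical.

```python
import re

# Zero-width lookahead so that overlapping sites are all reported.
# W = A/T and R = A/G, so WGATAR expands to [AT]GATA[AG].
WGATAR = re.compile(r"(?=([AT]GATA[AG]))")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (other characters kept)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_wgatar(seq: str) -> list[tuple[int, str, str]]:
    """Return (start, strand, site) for each WGATAR match on either strand.

    Starts are 0-based forward-strand coordinates of the 6-bp site;
    `site` is the motif as read on its own strand.
    """
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(), "+", m.group(1)) for m in WGATAR.finditer(seq)]
    for m in WGATAR.finditer(reverse_complement(seq)):
        # A match at reverse-strand offset p covers forward bases n-p-6 .. n-p-1.
        hits.append((n - m.start() - 6, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sequence with one site on each strand.
    for start, strand, site in find_wgatar("CCAGATAAGGTTATCTGG"):
        print(start, strand, site)
```

Because the returned `site` is read on its own strand, the AGATAA variant highlighted in the abstract can be counted with a simple `site == "AGATAA"` filter regardless of orientation.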
Subject(s)
Epigenesis, Genetic , GATA1 Transcription Factor/metabolism , Regulatory Elements, Transcriptional , Animals , Binding Sites , DNA/chemistry , DNA/metabolism , Erythroid Cells/metabolism , Genome , Histones/metabolism , Mice , Sequence Analysis, DNA , Transcription Factors/metabolism
ABSTRACT
Low complexity domains (LCDs) in proteins are regions predominantly composed of a small subset of the possible amino acids. LCDs are involved in a variety of normal and pathological processes across all domains of life. Existing methods define LCDs using information-theoretical complexity thresholds, sequence alignment with repetitive regions, or statistical overrepresentation of amino acids relative to whole-proteome frequencies. While these methods have proven valuable, they all quantify amino acid composition only indirectly, even though composition is the fundamental and biologically relevant feature related to protein sequence complexity. Here, we present a new computational tool, LCD-Composer, that directly identifies LCDs based on amino acid composition and linear amino acid dispersion. Using LCD-Composer's default parameters, we identified simple LCDs across all organisms available through UniProt and provide the resulting data in an accessible form as a resource. Furthermore, we describe large-scale differences between organisms from different domains of life and explore organisms with extreme LCD content for different LCD classes. Finally, we illustrate the versatility and specificity achievable with LCD-Composer by identifying diverse classes of LCDs using both simple and multifaceted composition criteria. We demonstrate that the ability to dissect LCDs based on these multifaceted criteria enhances the functional mapping and classification of LCDs.
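As a rough illustration of composition-based LCD detection, the sketch below slides a fixed window along a protein sequence, keeps windows in which a chosen residue set reaches a minimum fraction, and merges overlapping windows into domains. It is a simplified stand-in, not LCD-Composer's implementation: it ignores the linear dispersion criterion, and the window size and threshold are illustrative assumptions rather than the tool's parameters.

```python
def find_lcds(seq: str, residues: str, window: int = 20,
              min_fraction: float = 0.5) -> list[tuple[int, int]]:
    """Return merged (start, end) spans, end exclusive, of composition-defined LCDs.

    A window qualifies when at least `min_fraction` of its residues belong to
    `residues`; overlapping qualifying windows are merged into one domain.
    Window size and threshold are illustrative assumptions, not taken from LCD-Composer.
    """
    target = set(residues.upper())
    seq = seq.upper()
    domains: list[tuple[int, int]] = []
    for start in range(len(seq) - window + 1):
        count = sum(aa in target for aa in seq[start:start + window])
        if count >= min_fraction * window:
            end = start + window
            if domains and start <= domains[-1][1]:
                domains[-1] = (domains[-1][0], end)  # extend the current domain
            else:
                domains.append((start, end))  # start a new domain
    return domains


if __name__ == "__main__":
    # Hypothetical protein: a 20-residue poly-Q tract flanked by alanines.
    protein = "A" * 10 + "Q" * 20 + "A" * 10
    print(find_lcds(protein, "Q", window=20, min_fraction=0.75))
```

For this toy input the merged span is (5, 35), which overhangs the Q tract itself (positions 10 to 30); such edge effects are inherent to fixed-window scanning. Passing a multi-residue set such as `"QN"` gives a composite criterion in the same spirit as Q/N-rich domain searches.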
ABSTRACT
A double antibody sandwich ELISA developed by ID-DLO, Lelystad, to detect Corynebacterium pseudotuberculosis infection was used on 329 sheep from four pedigree Suffolk flocks in which clinical cases of caseous lymphadenitis (CLA) had occurred. At subsequent necropsy, typical CLA lesions were seen in 133 sheep, and the diagnosis was confirmed by culture. Lesions were most commonly seen in lungs (n = 46), parotid lymph nodes (n = 44), prescapular lymph nodes (n = 38) and mediastinal lymph nodes (n = 31). The sensitivity of the ELISA for detecting culture-positive sheep was 0.88, while its specificity was 0.55. The antibody ELISA detected 87.5 per cent of sheep that had CLA lesions restricted to internal organs. It was concluded that the ELISA has a valuable role in detecting sheep with both clinical and subclinical CLA.
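Sensitivity and specificity here are the standard 2x2 ratios against the reference standard (culture-confirmed lesions at necropsy). The helper below is a generic sketch; the counts in its usage line are invented for illustration and are not the study's data.

```python
def diagnostic_accuracy(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity and specificity of a test against a reference standard.

    tp/fn: reference-positive animals the test called positive/negative.
    tn/fp: reference-negative animals the test called negative/positive.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference-positive and one reference-negative animal")
    sensitivity = tp / (tp + fn)  # share of infected animals the test detects
    specificity = tn / (tn + fp)  # share of uninfected animals the test clears
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not the study's data).
    sens, spec = diagnostic_accuracy(tp=45, fn=5, tn=30, fp=20)
    print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A low specificity such as the reported 0.55 means that a substantial share of culture-negative sheep still test positive, so positive results are best interpreted at flock level or confirmed by other means.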
ABSTRACT
BACKGROUND: Establishing genomic resources for closely related species will provide comparative insights that are crucial for understanding diversity and variability at multiple levels of biological organization. We developed ESTs for Mexican axolotl (Ambystoma mexicanum) and Eastern tiger salamander (A. tigrinum tigrinum), species with deep and diverse research histories. RESULTS: Approximately 40,000 quality cDNA sequences were isolated for these species from various tissues, including regenerating limb and tail. These sequences and an existing set of 16,030 cDNA sequences for A. mexicanum were processed to yield 35,413 and 20,599 high quality ESTs for A. mexicanum and A. t. tigrinum, respectively. Because the A. t. tigrinum ESTs were obtained primarily from a normalized library, an approximately equal number of contigs were obtained for each species, with 21,091 unique contigs identified overall. The 10,592 contigs that showed significant similarity to sequences from the human RefSeq database reflected a diverse array of molecular functions and biological processes, with many corresponding to genes expressed during spinal cord injury in rat and fin regeneration in zebrafish. To demonstrate the utility of these EST resources, we searched databases to identify probes for regeneration research, characterized intra- and interspecific nucleotide polymorphism, saturated a human - Ambystoma synteny group with marker loci, and extended PCR primer sets designed for A. mexicanum / A. t. tigrinum orthologues to a related tiger salamander species. CONCLUSIONS: Our study highlights the value of developing resources in traditional model systems where the likelihood of information transfer to multiple, closely related taxa is high, thus simultaneously enabling both laboratory and natural history research.
Subject(s)
Ambystoma/genetics , Expressed Sequence Tags , Ambystoma mexicanum/genetics , Animals , Chromosome Mapping/methods , Contig Mapping , Gene Library , Humans , Polymorphism, Genetic , Rats , Regeneration/genetics , Sequence Homology, Nucleic Acid , Zebrafish
ABSTRACT
An ultrasonic dental descaling instrument has been characterised using sonochemical techniques. Mapping the emission from a luminol solution revealed the distribution of cavitation produced in water around the tips. Hydroxyl radical production rates arising from water sonolysis were measured using terephthalate dosimetry and found to be in the µmol min⁻¹ range, comparable with those from a sonochemical horn. Removal of an ink coating from a glass slide showed that cleaning occurred primarily where the tip contacted the surface, but it was also observed in regions where cavitation occurred even when the tip did not contact the surface. Differences in behaviour were noted between tip designs, and computer simulation of the acoustic pressure distributions using COMSOL explained these differences.
Subject(s)
Dental Instruments , Ultrasonics , Computer Simulation , Equipment Design , Hydroxyl Radical/chemistry , Luminol/chemistry , Motion , Phthalic Acids/chemistry , Pressure , Surface Properties , Water/chemistry
ABSTRACT
Ultrasonic scalers are used in dentistry to remove calculus and other contaminants from teeth. One mechanism which may assist in the cleaning is cavitation generated in cooling water around the scaler. The vibratory motion of three designs of scaler tip in a water bath has been characterised by laser vibrometry, and compared with the spatial distribution of cavitation around the scaler tips observed using sonochemiluminescence from a luminol solution. The type of cavitation was confirmed by acoustic emission analysed by a 'Cavimeter' supplied by NPL. A node/antinode vibration pattern was observed, with the maximum displacement of each type of tip occurring at the free end. High levels of cavitation activity occurred in areas surrounding the vibration antinodes, although minimal levels were observed at the free end of the tip. There was also good correlation between vibration amplitude and sonochemiluminescence at other points along the scaler tip. 'Cavimeter' analysis correlated well with luminol observations, suggesting the presence of primarily transient cavitation.
Subject(s)
Dental Scaling/instrumentation , Dental Scaling/methods , Ultrasonics , Vibration
ABSTRACT
Tissue development and function are exquisitely dependent on proper regulation of gene expression, but it remains controversial whether the genomic signals controlling this process are subject to strong selective constraint. While some studies show that highly constrained noncoding regions act to enhance transcription, other studies show that DNA segments with biochemical signatures of regulatory regions, such as occupancy by a transcription factor, are seemingly unconstrained across mammalian evolution. To test the possible correlation of selective constraint with enhancer activity, we used chromatin immunoprecipitation as an approach unbiased by either evolutionary constraint or prior knowledge of regulatory activity to identify DNA segments within a 66-Mb region of mouse chromosome 7 that are occupied by the erythroid transcription factor GATA1. DNA segments bound by GATA1 were identified by hybridization to high-density tiling arrays, validated by quantitative PCR, and tested for gene regulatory activity in erythroid cells. Whereas almost all of the occupied segments contain canonical WGATAR binding site motifs for GATA1, in only 45% of the cases is the motif deeply preserved (found at the orthologous position in placental mammals or more distant species). However, GATA1-bound segments with high enhancer activity tend to be the ones with an evolutionarily preserved WGATAR motif, and this relationship was confirmed by a loss-of-function assay. Thus, GATA1 binding sites that regulate gene expression during erythroid maturation are under strong selective constraint, while nonconstrained binding may have only a limited or indirect role in regulation.
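The notion of a "deeply preserved" motif can be sketched as a column-wise check on a multiple alignment: take the WGATAR site in the reference (mouse) row and ask which other species carry an exact WGATAR match at the same alignment columns. The alignment layout (a dict of equal-length gapped rows) and the function name are assumptions for illustration; the abstract does not describe the study's own pipeline at this level.

```python
import re

WGATAR = re.compile(r"[AT]GATA[AG]")  # W = A/T, R = A/G


def species_preserving_motif(alignment: dict[str, str], ref: str,
                             col: int, width: int = 6) -> list[str]:
    """List non-reference species with an exact WGATAR match at the same columns.

    alignment: species -> gapped aligned row (all rows the same length).
    col: 0-based alignment column where the reference site starts.
    A gap or mismatch anywhere in the window counts as 'not preserved'.
    """
    if not WGATAR.fullmatch(alignment[ref][col:col + width].upper()):
        raise ValueError(f"no WGATAR site in {ref} at column {col}")
    return [
        species
        for species, row in alignment.items()
        if species != ref and WGATAR.fullmatch(row[col:col + width].upper())
    ]


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences).
    toy = {
        "mouse": "CCAGATAAGG",
        "rat": "CCAGATAGGG",
        "human": "CCTGATAAGG",
        "dog": "CCAG-TAAGG",
        "chicken": "CCACATAAGG",
    }
    print(species_preserving_motif(toy, "mouse", col=2))
```

Deciding whether a site is "deep" (placental mammals or more distant species) would additionally require mapping the returned species onto a phylogeny, which this sketch leaves out.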
Subject(s)
DNA/genetics , Evolution, Molecular , GATA1 Transcription Factor/chemistry , GATA1 Transcription Factor/genetics , Transcription, Genetic , Amino Acid Motifs/genetics , Amino Acid Sequence/genetics , Animals , Binding Sites/genetics , Cell Line, Tumor , Chromatin Immunoprecipitation , Chromosomes, Mammalian , Enhancer Elements, Genetic , Gene Expression Regulation , Mice , Phylogeny , Reproducibility of Results , Sequence Homology, Amino Acid
ABSTRACT
Identification of functional genomic regions using interspecies comparison will be most effective when the full span of relationships between genomic function and evolutionary constraint is utilized. We find that sets of putative transcriptional regulatory sequences, defined by ENCODE experimental data, have a wide span of evolutionary histories, ranging from stringent constraint shown by deep phylogenetic comparisons to recent selection on lineage-specific elements. This diversity of evolutionary histories can be captured, at least in part, by the suite of available comparative genomics tools, especially after correction for regional differences in the neutral substitution rate. Putative transcriptional regulatory regions show alignability in different clades, and the genes associated with them are enriched for distinct functions. Some of the putative regulatory regions show evidence for recent selection, including a primate-specific, distal promoter that may play a novel role in regulation.
Subject(s)
Databases, Genetic , Evolution, Molecular , Genomics , Primates/genetics , Regulatory Sequences, Nucleic Acid , Transcription, Genetic , Animals , Gene Expression Regulation/physiology , Humans
ABSTRACT
This article describes a set of alignments of 28 vertebrate genome sequences that is provided by the UCSC Genome Browser. The alignments can be viewed on the Human Genome Browser (March 2006 assembly) at http://genome.ucsc.edu, downloaded in bulk by anonymous FTP from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz28way, or analyzed with the Galaxy server at http://g2.bx.psu.edu. This article illustrates the power of this resource for exploring vertebrate and mammalian evolution, using three examples. First, we present several vignettes involving insertions and deletions within protein-coding regions, including a look at some human-specific indels. Then we study the extent to which start codons and stop codons in the human sequence are conserved in other species, showing that start codons are in general more poorly conserved than stop codons. Finally, an investigation of the phylogenetic depth of conservation for several classes of functional elements in the human genome reveals striking differences in the rates and modes of decay in alignability. Each functional class has a distinctive period of stringent constraint, followed by decays that allow (for the case of regulatory regions) or reject (for coding regions and ultraconserved elements) insertions and deletions.
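The start/stop comparison can be illustrated with a small helper that scores the codon aligned to a human start or stop codon in each other species. This is a hypothetical sketch: it treats a start codon as conserved only if it is an exact ATG, and a stop codon as conserved if it is any of TAA, TAG or TGA, which may not match the criteria used in the article.

```python
# Codons accepted as "conserved" for each class in this sketch (an assumption:
# a stop codon that switches between TAA, TAG and TGA still counts as conserved).
ACCEPTED = {"start": {"ATG"}, "stop": {"TAA", "TAG", "TGA"}}


def codon_conservation(aligned_codons: dict[str, str], kind: str) -> float:
    """Fraction of species whose codon aligned to a human start/stop keeps that role.

    aligned_codons: species -> the three aligned bases (gaps allowed) at the
    columns of the human codon; the human row itself should be excluded.
    """
    if kind not in ACCEPTED:
        raise ValueError("kind must be 'start' or 'stop'")
    if not aligned_codons:
        raise ValueError("no aligned species given")
    kept = sum(codon.upper() in ACCEPTED[kind] for codon in aligned_codons.values())
    return kept / len(aligned_codons)


if __name__ == "__main__":
    # Hypothetical aligned codons, for illustration only.
    print(codon_conservation({"chimp": "ATG", "mouse": "ATG", "dog": "GTG", "chicken": "A-G"}, "start"))
    print(codon_conservation({"chimp": "TAA", "mouse": "TGA", "dog": "TAG", "chicken": "CAA"}, "stop"))
```

Accepting any of three stop codons but only one start codon is one simple reason such a score could rate stops as better conserved than starts; whether that matches the article's analysis is not stated in the abstract.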
Subject(s)
Conserved Sequence , Databases, Genetic , Sequence Alignment/methods , Animals , Base Sequence , Cats , Cattle , Codon, Initiator/genetics , Codon, Terminator/genetics , Dogs , Genome, Human , Guinea Pigs , Humans , Mice , Molecular Sequence Data , Mutagenesis, Insertional , Rabbits , Rats , Sequence Deletion
ABSTRACT
Genomic sequence signals (such as base composition, presence of particular motifs, or evolutionary constraint) have been used effectively to identify functional elements. However, approaches based only on specific signals known to correlate with function can be quite limiting. When training data are available, application of computational learning algorithms to multispecies alignments has the potential to capture broader and more informative sequence and evolutionary patterns that better characterize a class of elements. However, effective exploitation of patterns in multispecies alignments is impeded by the vast number of possible alignment columns and by a limited understanding of which particular strings of columns may characterize a given class. We have developed a computational method, called ESPERR (evolutionary and sequence pattern extraction through reduced representations), which uses training examples to learn encodings of multispecies alignments into reduced forms tailored for the prediction of chosen classes of functional elements. ESPERR produces a greatly improved Regulatory Potential score, which can discriminate regulatory regions from neutral sites with excellent accuracy (approximately 94%). This score captures strong signals (GC content and conservation), as well as subtler signals (with small contributions from many different alignment patterns) that characterize the regulatory elements in our training set. ESPERR is also effective for predicting other classes of functional elements, as we show for DNaseI hypersensitive sites and highly conserved regions with developmental enhancer activity. Our software, training data, and genome-wide predictions are available from our Web site (http://www.bx.psu.edu/projects/esperr).
Subject(s)
Deoxyribonuclease I/immunology , Enhancer Elements, Genetic , Evolution, Molecular , Genomics , Regulatory Sequences, Nucleic Acid , Sequence Alignment/methods , Algorithms , Base Composition , Base Pairing , Computational Biology , Conserved Sequence , Molecular Sequence Data , ROC Curve
ABSTRACT
Multiple alignments of genome sequences are helpful guides to functional analysis, but predicting cis-regulatory modules (CRMs) accurately from such alignments remains an elusive goal. We predict CRMs for mammalian genes expressed in red blood cells by combining two properties gleaned from aligned, noncoding genome sequences: a positive regulatory potential (RP) score, which detects similarity to patterns in alignments distinctive for regulatory regions, and conservation of a binding site motif for the essential erythroid transcription factor GATA-1. Within eight target loci, we tested 75 noncoding segments by reporter gene assays in transiently transfected human K562 cells and/or after site-directed integration into murine erythroleukemia cells. Segments with a high RP score and a conserved exact match to the binding site consensus are validated at a good rate (50%-100%, with rates increasing at higher RP), whereas segments with lower RP scores or nonconsensus binding motifs tend to be inactive. Active DNA segments were shown to be occupied by GATA-1 protein by chromatin immunoprecipitation, whereas sites predicted to be inactive were not occupied. We verify four previously known erythroid CRMs and identify 28 novel ones. Thus, high RP in combination with another feature of a CRM, such as a conserved transcription factor binding site, is a good predictor of functional CRMs. Genome-wide predictions based on RP and a large set of well-defined transcription factor binding sites are available through servers at http://www.bx.psu.edu/.
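The two-part prediction rule can be written as a simple filter. In this sketch the RP score and the per-species aligned sites are assumed inputs (their computation is not shown), "positive" RP is taken as RP > 0, and conservation means an exact WGATAR match in every aligned species; these are illustrative assumptions rather than the authors' code.

```python
import re

WGATAR = re.compile(r"[AT]GATA[AG]")  # exact GATA-1 consensus; W = A/T, R = A/G


def is_candidate_crm(rp_score: float, aligned_sites: dict[str, str],
                     rp_threshold: float = 0.0) -> bool:
    """Call a noncoding segment a candidate CRM under a two-feature rule.

    rp_score: regulatory potential of the segment (assumed precomputed).
    aligned_sites: species -> 6-bp aligned sequence at the candidate GATA-1 site.
    Requires RP above the threshold AND an exact consensus match in every species.
    """
    conserved = bool(aligned_sites) and all(
        WGATAR.fullmatch(site.upper()) is not None for site in aligned_sites.values()
    )
    return rp_score > rp_threshold and conserved


if __name__ == "__main__":
    # Hypothetical segments, for illustration only.
    print(is_candidate_crm(0.3, {"human": "AGATAA", "mouse": "TGATAG"}))
    print(is_candidate_crm(0.3, {"human": "AGATAA", "mouse": "AGCTAA"}))
```

Raising `rp_threshold` mirrors the abstract's observation that validation rates increase at higher RP, at the cost of calling fewer segments.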
Subject(s)
Leukemia, Erythroblastic, Acute/genetics , Regulatory Sequences, Nucleic Acid/genetics , Amino Acid Motifs , Amino Acid Sequence , Animals , Binding Sites , Chromatin Immunoprecipitation , Conserved Sequence , GATA1 Transcription Factor/chemistry , GATA1 Transcription Factor/metabolism , Gene Expression Profiling , Genes, Reporter , Genome , Humans , K562 Cells , Mammals , Mice , Reproducibility of Results , Transfection
ABSTRACT
Techniques of comparative genomics are being used to identify candidate functional DNA sequences, and objective evaluations are needed to assess their effectiveness. Different analytical methods score distinctive features of whole-genome alignments among human, mouse, and rat to predict functional regions. We evaluated three of these methods for their ability to identify the positions of known regulatory regions in the well-studied HBB gene complex. Two methods, multispecies conserved sequences and phastCons, quantify levels of conservation to estimate the likelihood that aligned DNA sequences are under purifying selection. A third method, regulatory potential (RP), measures the similarity of patterns in the alignments to those in known regulatory regions. The methods can correctly classify 50%-60% of noncoding positions in the HBB gene complex as regulatory or nonregulatory, with RP performing better than the other methods. When evaluated by the ability to discriminate genomic intervals, RP reaches a sensitivity of 0.78 and a true discovery rate of approximately 0.6. Performance is better on other reference sets; both phastCons and RP scores can capture almost all regulatory elements in those sets along with approximately 7% of the human genome.
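Interval-level sensitivity and true discovery rate can be sketched as follows, assuming (hypothetically) that all intervals lie on one sequence, that a known regulatory interval counts as detected if any predicted interval overlaps it, and that a predicted interval counts as a true discovery if it overlaps any known interval. The article's exact scoring procedure may differ.

```python
Interval = tuple[int, int]  # half-open genomic interval [start, end)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def interval_metrics(predicted: list[Interval], known: list[Interval]) -> tuple[float, float]:
    """Return (sensitivity, true discovery rate) for predicted vs known intervals."""
    if not predicted or not known:
        raise ValueError("need at least one predicted and one known interval")
    # Sensitivity: share of known regulatory intervals hit by any prediction.
    detected = sum(any(overlaps(k, p) for p in predicted) for k in known)
    # True discovery rate: share of predictions that hit any known interval.
    true_hits = sum(any(overlaps(p, k) for k in known) for p in predicted)
    return detected / len(known), true_hits / len(predicted)


if __name__ == "__main__":
    # Hypothetical intervals, for illustration only.
    known = [(100, 200), (500, 600), (900, 1000)]
    predicted = [(150, 250), (550, 580), (700, 800), (1200, 1300)]
    print(interval_metrics(predicted, known))
```

The quadratic overlap test is fine for a single locus such as a gene complex; genome-wide evaluation would normally use sorted intervals or an interval tree instead.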