Results 1 - 20 of 29
1.
Genome Res ; 31(4): 607-621, 2021 04.
Article in English | MEDLINE | ID: mdl-33514624

ABSTRACT

The establishment of centromeric chromatin and its propagation by the centromere-specific histone CENPA is mediated by epigenetic mechanisms in most eukaryotes. DNA replication origins, origin binding proteins, and replication timing of centromere DNA are important determinants of centromere function. The epigenetically regulated regional centromeres in the budding yeast Candida albicans have unique DNA sequences that replicate earliest in every chromosome and are clustered throughout the cell cycle. In this study, the genome-wide occupancy of the replication initiation protein Orc4 reveals its abundance at all centromeres in C. albicans. Orc4 is associated with four different DNA sequence motifs, one of which coincides with tRNA genes (tDNA) that replicate early and cluster together in space. Hi-C combined with genome-wide replication timing analyses identify that early replicating Orc4-bound regions interact with themselves more strongly than with late replicating Orc4-bound regions. We simulate a polymer model of chromosomes of C. albicans and propose that the early replicating and highly enriched Orc4-bound sites preferentially localize around the clustered kinetochores. We also observe that Orc4 is constitutively localized to centromeres, and both Orc4 and the helicase Mcm2 are essential for cell viability and CENPA stability in C. albicans. Finally, we show that new molecules of CENPA are recruited to centromeres during late anaphase/telophase, which coincides with the stage at which the CENPA-specific chaperone Scm3 localizes to the kinetochore. We propose that the spatiotemporal localization of Orc4 within the nucleus, in collaboration with Mcm2 and Scm3, maintains centromeric chromatin stability and CENPA recruitment in C. albicans.


Subject(s)
Candida albicans, Centromere, Chromatin, Origin Recognition Complex/metabolism, Candida albicans/genetics, Centromere/genetics, Chromatin/chemistry, Chromatin/genetics, Chromatin/metabolism, Histones/metabolism, Kinetochores, Replication Origin/genetics
2.
Med Mycol ; 62(3), 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38414264

ABSTRACT

Candida auris poses threats to the global medical community due to its multidrug resistance, ability to cause nosocomial outbreaks and resistance to common sterilization agents. Different variants that emerged at different geographical zones were classified as clades. Clade-typing becomes necessary to track its spread, possible emergence of new clades, and to predict the properties that exhibit a clade bias. We previously reported a colony-Polymerase Chain Reaction-based, clade-identification method employing whole genome alignments and identification of clade-specific sequences of four major geographical clades. Here, we expand the panel by identifying clade 5 which was later isolated in Iran, using specific primers designed through in silico analyses.


Candida auris, a multidrug-resistant fungal pathogen, evolves as distinct geographical clades. We describe the identification of clade 5 specific DNA sequence, which was used to design primers that distinguished clade 5 from other clades, adding to the panel of the clade-identification system.


Subject(s)
Candida, Candidiasis, Animals, Candida/genetics, Candidiasis/epidemiology, Candidiasis/veterinary, Candida auris, Polymerase Chain Reaction/veterinary, Fungal Genome, Antifungal Agents/pharmacology, Microbial Sensitivity Tests/veterinary
3.
PLoS Comput Biol ; 15(3): e1006921, 2019 03.
Article in English | MEDLINE | ID: mdl-30897079

ABSTRACT

ChIP-seq (Chromatin Immunoprecipitation followed by sequencing) is a high-throughput technique to identify genomic regions that are bound in vivo by a particular protein, e.g., a transcription factor (TF). Biological factors, such as chromatin state, indirect and cooperative binding, as well as experimental factors, such as antibody quality, cross-linking, and PCR biases, are known to affect the outcome of ChIP-seq experiments. However, the relative impact of these factors on inferences made from ChIP-seq data is not entirely clear. Here, via a detailed ChIP-seq simulation pipeline, ChIPulate, we assess the impact of various biological and experimental sources of variation on several outcomes of a ChIP-seq experiment, viz., the recoverability of the TF binding motif, accuracy of TF-DNA binding detection, the sensitivity of inferred TF-DNA binding strength, and number of replicates needed to confidently infer binding strength. We find that the TF motif can be recovered despite poor and non-uniform extraction and PCR amplification efficiencies. The recovery of the motif is, however, affected to a larger extent by the fraction of sites that are either cooperatively or indirectly bound. Importantly, our simulations reveal that the number of ChIP-seq replicates needed to accurately measure in vivo occupancy at high-affinity sites is larger than the recommended community standards. Our results establish statistical limits on the accuracy of inferences of protein-DNA binding from ChIP-seq and suggest that increasing the mean extraction efficiency, rather than amplification efficiency, would better improve sensitivity. The source code and instructions for running ChIPulate can be found at https://github.com/vishakad/chipulate.


Subject(s)
Chromatin Immunoprecipitation/methods, Computational Biology/methods, DNA Sequence Analysis/methods, Software, Transcription Factors, Binding Sites/genetics, Computer Simulation, DNA/chemistry, DNA/genetics, DNA/metabolism, DNA-Binding Proteins/chemistry, DNA-Binding Proteins/genetics, DNA-Binding Proteins/metabolism, Escherichia coli/genetics, Escherichia coli Proteins/chemistry, Escherichia coli Proteins/genetics, Escherichia coli Proteins/metabolism, High-Throughput Nucleotide Sequencing, Protein Binding/genetics, Transcription Factors/chemistry, Transcription Factors/genetics, Transcription Factors/metabolism
4.
Nucleic Acids Res ; 46(5): e29, 2018 03 16.
Article in English | MEDLINE | ID: mdl-29267972

ABSTRACT

We present THiCweed, a new approach to analyzing transcription factor binding data from high-throughput chromatin immunoprecipitation-sequencing (ChIP-seq) experiments. THiCweed clusters bound regions using a divisive hierarchical clustering approach based on sequence similarity within sliding windows, while exploring both strands. THiCweed is specially geared toward data containing mixtures of motifs, which present a challenge to traditional motif-finders. Our implementation is significantly faster than standard motif-finding programs, able to process 30 000 peaks in 1-2 h, on a single CPU core of a desktop computer. On synthetic data containing mixtures of motifs it is as accurate as or more accurate than all other tested programs. THiCweed performs best with large 'window' sizes (≥50 bp), much longer than typical binding sites (7-15 bp). On real data it successfully recovers literature motifs, but also uncovers complex sequence characteristics in flanking DNA, variant motifs and secondary motifs even when they occur in <5% of the input, all of which appear biologically relevant. We also find recurring sequence patterns across diverse ChIP-seq datasets, possibly related to chromatin architecture and looping. THiCweed thus goes beyond traditional motif finding to give new insights into genomic transcription factor-binding complexity.


Subject(s)
Algorithms, Computational Biology/methods, DNA/genetics, High-Throughput Nucleotide Sequencing/methods, Nucleotide Motifs/genetics, Binding Sites/genetics, Chromatin/genetics, Chromatin/metabolism, Chromatin Immunoprecipitation/methods, Cluster Analysis, DNA/chemistry, DNA/metabolism, Genomics/methods, Humans, Protein Binding, Reproducibility of Results, Transcription Factors/metabolism
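
The idea in this record's abstract of comparing sequence windows "while exploring both strands" can be illustrated with a minimal sketch. It is not THiCweed's algorithm or scoring function: the plain identity count, the function names, and the toy sequences are all assumptions made for illustration.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence as read from the opposite DNA strand."""
    return seq.translate(COMPLEMENT)[::-1]


def window_identity(a, b):
    """Count positions at which two equal-length windows agree."""
    return sum(x == y for x, y in zip(a, b))


def best_window_match(window, target):
    """Slide `window` along `target` on both strands.

    Returns (score, offset, strand) for the best-scoring placement;
    the offset on the "-" strand is measured on the reverse complement.
    """
    width = len(window)
    best = (-1, 0, "+")
    for strand, seq in (("+", target), ("-", reverse_complement(target))):
        for offset in range(len(seq) - width + 1):
            score = window_identity(window, seq[offset:offset + width])
            if score > best[0]:
                best = (score, offset, strand)
    return best


# Hypothetical toy data: the motif only appears on the reverse strand.
match = best_window_match("GATTACA", "CCCTGTAATCGG")
```

A scan restricted to the forward strand would miss this placement, which is why a both-strand comparison matters when the orientation of a binding site within a peak is unknown.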
5.
Nucleic Acids Res ; 45(5): 2629-2643, 2017 03 17.
Article in English | MEDLINE | ID: mdl-28100699

ABSTRACT

Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here is at a quality yet achieved only for a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies.


Subject(s)
Fungal Proteins/genetics, Fungal Genome, Malassezia/genetics, Molecular Sequence Annotation/methods, Proteogenomics/methods, Fungal Genes, Mitochondrial Genome, Peptides/genetics, Protein Domains, RNA Sequence Analysis
6.
PLoS Genet ; 12(2): e1005839, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26845548

ABSTRACT

The centromere, on which kinetochore proteins assemble, ensures precise chromosome segregation. Centromeres are largely specified by the histone H3 variant CENP-A (also known as Cse4 in yeasts). Structurally, centromere DNA sequences are highly diverse in nature. However, the evolutionary consequence of these structural diversities on de novo CENP-A chromatin formation remains elusive. Here, we report the identification of centromeres, as the binding sites of four evolutionarily conserved kinetochore proteins, in the human pathogenic budding yeast Candida tropicalis. Each of the seven centromeres comprises a 2 to 5 kb non-repetitive mid core flanked by 2 to 5 kb inverted repeats. The repeat-associated centromeres of C. tropicalis all share a high degree of sequence conservation with each other and are strikingly diverged from the unique and mostly non-repetitive centromeres of related Candida species (Candida albicans, Candida dubliniensis, and Candida lusitaniae). Using a plasmid-based assay, we further demonstrate that pericentric inverted repeats and the underlying DNA sequence provide a structural determinant in CENP-A recruitment in C. tropicalis, as opposed to epigenetically regulated CENP-A loading at centromeres in C. albicans. Thus, the centromere structure and its influence on de novo CENP-A recruitment has been significantly rewired in closely related Candida species. Strikingly, the centromere structural properties along with the role of pericentric repeats in de novo CENP-A loading in C. tropicalis are more reminiscent of those of the distantly related fission yeast Schizosaccharomyces pombe. Taken together, we demonstrate, for the first time, fission yeast-like repeat-associated centromeres in an ascomycetous budding yeast.


Subject(s)
Candida tropicalis/genetics, Centromere/genetics, Nucleic Acid Repetitive Sequences/genetics, Autoantigens/metabolism, Base Pairing/genetics, Centromere Protein A, Chromatin Immunoprecipitation, Non-Histone Chromosomal Proteins/metabolism, Chromosome Mapping, Chromosome Segregation/genetics, Fungal Chromosomes/metabolism, Conserved Sequence, Molecular Evolution, Gene Rearrangement/genetics, Fungal Genome, Inverted Repeat Sequences/genetics, Kinetochores/metabolism, Mitosis, Schizosaccharomyces/genetics, Species Specificity
7.
R Soc Open Sci ; 11(1): 231088, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38269075

ABSTRACT

Transcription factor binding sites (TFBS), like other DNA sequence, evolve via mutation and selection relating to their function. Models of nucleotide evolution describe DNA evolution via single-nucleotide mutation. A stationary vector of such a model is the long-term distribution of nucleotides, unchanging under the model. Neutrally evolving sites may have uniform stationary vectors, but one expects that sites within a TFBS instead have stationary vectors reflective of the fitness of various nucleotides at those positions. We introduce 'position-specific stationary vectors' (PSSVs), the collection of stationary vectors at each site in a TFBS locus, analogous to the position weight matrix (PWM) commonly used to describe TFBS. We infer PSSVs for human TFs using two evolutionary models (Felsenstein 1981 and Hasegawa-Kishino-Yano 1985). We find that PSSVs reflect the nucleotide distribution from PWMs, but with reduced specificity. We infer ancestral nucleotide distributions at individual positions and calculate 'conditional PSSVs' conditioned on specific choices of majority ancestral nucleotide. We find that certain ancestral nucleotides exert a strong evolutionary pressure on neighbouring sequence while others have a negligible effect. Finally, we present a fast likelihood calculation for the F81 model on moderate-sized trees that makes this approach feasible for large-scale studies along these lines.
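
The notion of a stationary vector in this record's abstract can be made concrete for the Felsenstein 1981 (F81) model, whose transition probabilities have the closed form P_ij(t) = e^(-μt) δ_ij + (1 - e^(-μt)) π_j. The sketch below is only an illustration under stated assumptions: the vector `pi`, the rate `mu`, and the times are hypothetical values, not taken from the paper, and the code is not the authors' likelihood method.

```python
import math


def f81_transition_matrix(pi, mu, t):
    """Return the 4x4 F81 transition probabilities P[i][j] after time t.

    pi is the stationary vector over (A, C, G, T); mu scales the rate.
    """
    decay = math.exp(-mu * t)
    return [
        [decay * (1.0 if i == j else 0.0) + (1.0 - decay) * pi[j]
         for j in range(4)]
        for i in range(4)
    ]


def evolve(dist, P):
    """Propagate a nucleotide distribution through the matrix P."""
    return [sum(dist[i] * P[i][j] for i in range(4)) for j in range(4)]


# Hypothetical stationary vector for one TFBS position (strong A preference).
pi = [0.7, 0.1, 0.1, 0.1]
P = f81_transition_matrix(pi, mu=1.0, t=0.5)

# The stationary vector is unchanged by evolution under the model.
stationary_after = evolve(pi, P)

# Any other starting distribution drifts toward pi as time grows.
start = [0.25, 0.25, 0.25, 0.25]
limit = evolve(start, f81_transition_matrix(pi, mu=1.0, t=50.0))
```

A uniform `pi` corresponds to the neutral case mentioned in the abstract, while a skewed `pi` like the one above stands in for a position under selection.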

8.
PLoS One ; 19(4): e0302271, 2024.
Article in English | MEDLINE | ID: mdl-38630664

ABSTRACT

We provide new algorithms for two tasks relating to heterogeneous tabular datasets: clustering, and synthetic data generation. Tabular datasets typically consist of heterogeneous data types (numerical, ordinal, categorical) in columns, but may also have hidden cluster structure in their rows: for example, they may be drawn from heterogeneous (geographical, socioeconomic, methodological) sources, such that the outcome variable they describe (such as the presence of a disease) may depend not only on the other variables but on the cluster context. Moreover, sharing of biomedical data is often hindered by patient confidentiality laws, and there is current interest in algorithms to generate synthetic tabular data from real data, for example via deep learning. We demonstrate a novel EM-based clustering algorithm, MMM ("Madras Mixture Model"), that outperforms standard algorithms in determining clusters in synthetic heterogeneous data, and recovers structure in real data. Based on this, we demonstrate a synthetic tabular data generation algorithm, MMMsynth, that pre-clusters the input data, and generates cluster-wise synthetic data assuming cluster-specific data distributions for the input columns. We benchmark this algorithm by testing the performance of standard ML algorithms when they are trained on synthetic data and tested on real published datasets. Our synthetic data generation algorithm outperforms other literature tabular-data generators, and approaches the performance of training purely with real data.


Subject(s)
Algorithms, Humans, India, Cluster Analysis
9.
Heliyon ; 9(8): e18211, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37520992

ABSTRACT

Transcription factors (TFs) and their binding sites have evolved to interact cooperatively or competitively with each other. Here we examine in detail, across multiple cell lines, such cooperation or competition among TFs both in sequential and spatial proximity (using chromatin conformation capture assays), considering in vivo binding data as well as TF binding motifs in DNA. We ascertain significantly co-occurring ("attractive") or avoiding ("repulsive") TF pairs using robust randomized models that retain the essential characteristics of the experimental data. Across human cell lines TFs organize into two groups, with intra-group attraction and inter-group repulsion. This is true for both sequential and spatial proximity, and for both in vivo binding and sequence motifs. Attractive TF pairs exhibit significantly more physical interactions, suggesting an underlying mechanism. The two TF groups differ significantly in their genomic and network properties, as well as in their function: while one group regulates housekeeping function, the other potentially regulates lineage-specific functions that are disrupted in cancer. Weaker binding sites tend to occur in spatially interacting regions of the genome. Our results suggest that a complex pattern of spatial cooperativity of TFs and chromatin has evolved with the genome to support housekeeping and lineage-specific functions.

10.
iScience ; 26(10): 107846, 2023 Oct 20.
Article in English | MEDLINE | ID: mdl-37767000

ABSTRACT

Early onset of type 2 diabetes and cardiovascular disease are common complications for women diagnosed with gestational diabetes. Prediabetes refers to a condition in which blood glucose levels are higher than normal, but not yet high enough to be diagnosed as type 2 diabetes. Currently, there is no accurate way of knowing which women with gestational diabetes are likely to develop postpartum prediabetes. This study aims to predict the risk of postpartum prediabetes in women diagnosed with gestational diabetes. Our sparse logistic regression approach selects only two variables - antenatal fasting glucose at OGTT and HbA1c soon after the diagnosis of GDM - as relevant, but gives an area under the receiver operating characteristic curve of 0.72, outperforming all other methods. We envision this to be a practical solution, which coupled with a targeted follow-up of high-risk women, could yield better cardiometabolic outcomes in women with a history of GDM.

11.
Microbiol Spectr ; 10(2): e0063422, 2022 04 27.
Article in English | MEDLINE | ID: mdl-35343775

ABSTRACT

Candida auris, the multidrug-resistant human fungal pathogen, emerged as four major distinct geographical clades (clade 1-clade 4) in the past decade. Though isolates of the same species, C. auris clinical strains exhibit clade-specific properties associated with virulence and drug resistance. In this study, we report the identification of unique DNA sequence junctions by mapping clade-specific regions through comparative analysis of whole-genome sequences of strains belonging to different clades. These unique DNA sequence stretches are used to identify C. auris isolates at the clade level in subsequent in silico and experimental analyses. We develop a colony PCR-based clade-identification system (ClaID), which is rapid and specific. In summary, we demonstrate a proof-of-concept for using unique DNA sequence junctions conserved in a clade-specific manner for the rapid identification of each of the four major clades of C. auris.

IMPORTANCE: C. auris was first isolated in Japan in 2009 as an antifungal drug-susceptible pathogen causing localized infections. Within a decade, it simultaneously evolved in different parts of the world as distinct clades exhibiting resistance to antifungal drugs at varying levels. Recent studies hinted at the mixing of isolates belonging to different geographical clades in a single location, suggesting that the area of isolation alone may not indicate the clade status of an isolate. In this study, we compared the genomes of representative strains of the four major clades to identify clade-specific sequences, which were then used to design clade-specific primers. We propose the utilization of whole genome sequence data to extract clade-specific sequences for clade-typing. The colony PCR-based method employed can rapidly distinguish between the four major clades of C. auris, with scope for expanding the panel by adding more primer pairs.


Subject(s)
Antifungal Agents, Candida, Antifungal Agents/pharmacology, Antifungal Agents/therapeutic use, Candida/genetics, Candida auris, Humans, Japan, Microbial Sensitivity Tests, Virulence
12.
PLoS One ; 17(3): e0264648, 2022.
Article in English | MEDLINE | ID: mdl-35255105

ABSTRACT

OBJECTIVE: The aim of the present study was to identify the factors associated with non-attendance of immediate postpartum glucose test using a machine learning algorithm following gestational diabetes mellitus (GDM) pregnancy. METHOD: A retrospective cohort study of all GDM women (n = 607) for postpartum glucose test due between January 2016 and December 2019 at the George Eliot Hospital NHS Trust, UK. RESULTS: Sixty-five percent of women attended postpartum glucose test. Type 2 diabetes was diagnosed in 2.8% and 21.6% had persistent dysglycaemia at 6-13 weeks post-delivery. Those who did not attend postpartum glucose test seem to be younger, multiparous, obese, and continued to smoke during pregnancy. They also had higher fasting glucose at antenatal oral glucose tolerance test. Our machine learning algorithm predicted postpartum glucose non-attendance with an area under the receiver operating characteristic curve of 0.72. The model could achieve a sensitivity of 70% with 66% specificity at a risk score threshold of 0.46. A total of 233 (38.4%) women attended subsequent glucose test at least once within the first two years of delivery and 24% had dysglycaemia. Compared to women who attended postpartum glucose test, those who did not attend had higher conversion rate to type 2 diabetes (2.5% vs 11.4%; p = 0.005). CONCLUSION: Postpartum screening following GDM is still poor. Women who did not attend postpartum screening appear to have higher metabolic risk and higher conversion to type 2 diabetes by two years post-delivery. Machine learning model can predict women who are unlikely to attend postpartum glucose test using simple antenatal factors. Enhanced, personalised education of these women may improve postpartum glucose screening.


Subject(s)
Type 2 Diabetes Mellitus, Gestational Diabetes, Blood Glucose/metabolism, Type 2 Diabetes Mellitus/diagnosis, Type 2 Diabetes Mellitus/epidemiology, Gestational Diabetes/diagnosis, Gestational Diabetes/epidemiology, Gestational Diabetes/metabolism, Female, Glucose, Humans, Machine Learning, Male, Postpartum Period, Pregnancy, Retrospective Studies
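
This record's abstract reports an area under the ROC curve together with sensitivity and specificity at a risk score threshold of 0.46. As a hedged illustration of how such figures are derived from risk scores, the sketch below uses invented toy scores and labels (1 = did not attend, 0 = attended). None of the data, function names, or results come from the study.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Call score >= threshold positive; return (sensitivity, specificity)."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores and outcomes, for illustration only.
scores = [0.9, 0.6, 0.5, 0.3, 0.7, 0.4, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(scores, labels, threshold=0.46)
auc = roc_auc(scores, labels)
```

Moving the threshold trades sensitivity against specificity, whereas the AUC summarizes ranking quality across all thresholds, which is why the abstract reports both.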
13.
Proc Natl Acad Sci U S A ; 105(50): 19797-802, 2008 Dec 16.
Article in English | MEDLINE | ID: mdl-19060206

ABSTRACT

The Cse4p-containing centromere regions of Candida albicans have unique and different DNA sequences on each of the eight chromosomes. In a closely related yeast, C. dubliniensis, we have identified the centromeric histone, CdCse4p, and shown that it is localized at the kinetochore. We have identified putative centromeric regions, orthologous to the C. albicans centromeres, in each of the eight C. dubliniensis chromosomes by bioinformatic analysis. Chromatin immunoprecipitation followed by PCR using a specific set of primers confirmed that these regions bind CdCse4p in vivo. As in C. albicans, the CdCse4p-associated core centromeric regions are 3-5 kb in length and show no sequence similarity to one another. Comparative sequence analysis suggests that the Cse4p-rich centromere DNA sequences in these two species have diverged faster than other orthologous intergenic regions and even faster than our best estimated "neutral" mutation rate. However, the location of the centromere and the relative position of Cse4p-rich centromeric chromatin in the orthologous regions with respect to adjacent ORFs are conserved in both species, suggesting that centromere identity is not solely determined by DNA sequence. Unlike known point and regional centromeres of other organisms, centromeres in C. albicans and C. dubliniensis have no common centromere-specific sequence motifs or repeats except some of the chromosome-specific pericentric repeats that are found to be similar in these two species. We propose that centromeres of these two Candida species are of an intermediate type between point and regional centromeres.


Subject(s)
Candida albicans/genetics, Candida/genetics, Centromere/genetics, Non-Histone Chromosomal Proteins/metabolism, Molecular Evolution, Fungal Proteins/metabolism, Kinetochores/metabolism, Base Sequence, Candida/metabolism, Candida/pathogenicity, Candida albicans/metabolism, Candida albicans/pathogenicity, Centromere/metabolism, Chromatin/metabolism, Fungal Chromosomes/genetics, Fungal Chromosomes/metabolism, Conserved Sequence, Fungal DNA/genetics, Fungal DNA/metabolism, Fungal Genes, Histones/metabolism, Synteny
14.
mBio ; 12(3), 2021 05 11.
Article in English | MEDLINE | ID: mdl-33975937

ABSTRACT

The thermotolerant multidrug-resistant ascomycete Candida auris rapidly emerged since 2009 causing systemic infections worldwide and simultaneously evolved in different geographical zones. The molecular events that orchestrated this sudden emergence of the killer fungus remain mostly elusive. Here, we identify centromeres in C. auris and related species, using a combined approach of chromatin immunoprecipitation and comparative genomic analyses. We find that C. auris and multiple other species in the Clavispora/Candida clade shared a conserved small regional GC-poor centromere landscape lacking pericentromeres or repeats. Further, a centromere inactivation event led to karyotypic alterations in this species complex. Interspecies genome analysis identified several structural chromosomal changes around centromeres. In addition, centromeres are found to be rapidly evolving loci among the different geographical clades of the same species of C. auris. Finally, we reveal an evolutionary trajectory of the unique karyotype associated with clade 2 that consists of the drug-susceptible isolates of C. auris.

IMPORTANCE: Candida auris, the killer fungus, emerged as different geographical clades, exhibiting multidrug resistance and high karyotype plasticity. Chromosomal rearrangements are known to play key roles in the emergence of new species, virulence, and drug resistance in pathogenic fungi. Centromeres, the genomic loci where microtubules attach to separate the sister chromatids during cell division, are known to be hot spots of breaks and downstream rearrangements. We identified the centromeres in C. auris and related species to study their involvement in the evolution and karyotype diversity reported in C. auris. We report conserved centromere features in 10 related species and trace the events that occurred at the centromeres during evolution. We reveal a centromere inactivation-mediated chromosome number change in these closely related species. We also observe that one of the geographical clades, the East Asian clade, evolved along a unique trajectory, compared to the other clades and related species.


Subject(s)
Candida/genetics, Centromere/genetics, Centromere/metabolism, Chromosomes/genetics, Molecular Evolution, Fungal Genome, Antifungal Agents/pharmacology, Candida/classification, Candida/drug effects, Candidiasis/microbiology, Centromere/classification, Chromosomes/classification, Genomics, Virulence
15.
BMC Bioinformatics ; 11: 464, 2010 Sep 16.
Article in English | MEDLINE | ID: mdl-20846408

ABSTRACT

BACKGROUND: While most multiple sequence alignment programs expect that all or most of their input is known to be homologous, and penalise insertions and deletions, this is not a reasonable assumption for non-coding DNA, which is much less strongly conserved than protein-coding genes. Arguing that the goal of sequence alignment should be the detection of homology and not similarity, we incorporate an evolutionary model into a previously published multiple sequence alignment program for non-coding DNA, Sigma, as a sensitive likelihood-based way to assess the significance of alignments. Version 1 of Sigma was successful in eliminating spurious alignments but exhibited relatively poor sensitivity on synthetic data. Sigma 1 used a p-value (the probability under the "null hypothesis" of non-homology) to assess the significance of alignments, and, optionally, a background model that captured short-range genomic correlations. Sigma version 2, described here, retains these features, but calculates the p-value using a sophisticated evolutionary model that we describe here, and also allows for a transition matrix for different substitution rates from and to different nucleotides. Our evolutionary model takes separate account of mutation and fixation, and can be extended to allow for locally differing functional constraints on sequence. RESULTS: We demonstrate that, on real and synthetic data, Sigma-2 significantly outperforms other programs in specificity to genuine homology (that is, it minimises alignment of spuriously similar regions that do not have a common ancestry) while it is now as sensitive as the best current programs. CONCLUSIONS: Comparing these results with an extrapolation of the best results from other available programs, we suggest that conservation rates in intergenic DNA are often significantly over-estimated. It is increasingly important to align non-coding DNA correctly, in regulatory genomics and in the context of whole-genome alignment, and Sigma-2 is an important step in that direction.


Subject(s)
Intergenic DNA/chemistry, Molecular Evolution, Genomics/methods, Sequence Alignment/methods, DNA Sequence Analysis, Software, Likelihood Functions
16.
PLoS One ; 15(11): e0242375, 2020.
Article in English | MEDLINE | ID: mdl-33211740

ABSTRACT

Vasoplegia observed post cardiopulmonary bypass (CPB) is associated with substantial morbidity, multiple organ failure and mortality. Circulating counts of hematopoietic stem cells (HSCs) and endothelial progenitor cells (EPCs) are potential markers of neo-vascularization and vascular repair. However, the significance of changes in the circulating levels of these progenitors in perioperative CPB, and their association with post-CPB vasoplegia, are currently unexplored. We enumerated HSC and EPC counts, via flow cytometry, at different time-points during CPB in 19 individuals who underwent elective cardiac surgery. These 19 individuals were categorized into two groups based on severity of post-operative vasoplegia, a clinically insignificant vasoplegic Group 1 (G1) and a clinically significant vasoplegic Group 2 (G2). Differential changes in progenitor cell counts during different stages of surgery were compared across these two groups. Machine-learning classifiers (logistic regression and gradient boosting) were employed to determine if differential changes in progenitor counts could aid the classification of individuals into these groups. Enumerating progenitor cells revealed an early and significant increase in the circulating counts of CD34+ and CD34+CD133+ HSCs in G1 individuals, while these counts were attenuated in G2 individuals. Additionally, EPCs (CD34+VEGFR2+) were lower in G2 individuals compared to G1. Gradient boosting outperformed logistic regression in assessing the vasoplegia grouping based on the fold change in circulating CD34+ levels. Our findings indicate that a lack of early response of CD34+ cells and CD34+CD133+ HSCs might serve as an early marker for development of clinically significant vasoplegia after CPB.


Subject(s)
Blood Cell Count, Cardiopulmonary Bypass/adverse effects, Endothelial Progenitor Cells, Hematopoietic Stem Cells, Vasoplegia/blood, Adrenergic beta-Antagonists/therapeutic use, Adult, Aged, Angiotensin II Type 1 Receptor Blockers/therapeutic use, Angiotensin-Converting Enzyme Inhibitors/therapeutic use, Anthropometry, Comorbidity, Elective Surgical Procedures, Female, Humans, Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use, Intraoperative Period, Kinetics, Machine Learning, Male, Middle Aged, Pilot Projects, Postoperative Period, Severity of Illness Index, Vasoplegia/physiopathology
17.
Elife ; 9: 2020 01 20.
Article in English | MEDLINE | ID: mdl-31958060

ABSTRACT

Genomic rearrangements associated with speciation often result in variation in chromosome number among closely related species. Malassezia species show variable karyotypes ranging between six and nine chromosomes. Here, we experimentally identified all eight centromeres in M. sympodialis as 3-5-kb long kinetochore-bound regions that span an AT-rich core and are depleted of the canonical histone H3. Centromeres of similar sequence features were identified as CENP-A-rich regions in Malassezia furfur, which has seven chromosomes, and histone H3 depleted regions in Malassezia slooffiae and Malassezia globosa with nine chromosomes each. Analysis of synteny conservation across centromeres with newly generated chromosome-level genome assemblies suggests two distinct mechanisms of chromosome number reduction from an inferred nine-chromosome ancestral state: (a) chromosome breakage followed by loss of centromere DNA and (b) centromere inactivation accompanied by changes in DNA sequence following chromosome-chromosome fusion. We propose that AT-rich centromeres drive karyotype diversity in the Malassezia species complex through breakage and inactivation.


Millions of yeast, bacteria and other microbes live in or on the human body. A type of yeast known as Malassezia is one of the most abundant microbes living on our skin. Generally, Malassezia do not cause symptoms in humans but are associated with dandruff, dermatitis and other skin conditions in susceptible individuals. They have also been found in the human gut, where they exacerbate Crohn's disease and pancreatic cancer. There are 18 closely related species of Malassezia and all have an unusually small amount of genetic material compared with other types of yeast. In yeast, as in humans, the genetic material is divided among several chromosomes. The number of chromosomes in different Malassezia species varies between six and nine. A region of each chromosome known as the centromere is responsible for ensuring that equal numbers of chromosomes are passed on to the offspring. This means that any defects in centromeres can lead to the daughter yeast cells inheriting unequal numbers of chromosomes. Changes in chromosome number can drive the evolution of new species, but it remains unclear if and how centromere loss may have contributed to the evolution of Malassezia species. Sankaranarayanan et al. have now used biochemical, molecular genetic, and comparative genomic approaches to study the chromosomes of Malassezia species. The experiments revealed that nine Malassezia species had centromeres that shared common features such as being rich in adenine and thymine nucleotides, two of the building blocks of DNA. Sankaranarayanan et al. propose that these adenines and thymines make the centromeres more fragile, leading to occasional breaks. This may have contributed to the loss of centromeres in some Malassezia cells and helped new species to evolve with fewer chromosomes.
A better understanding of how Malassezia organize their genetic material should enable in-depth studies of how these yeasts interact with their human hosts and how they contribute to skin disease, cancer, Crohn's disease and other health conditions. More broadly, these findings may help scientists to better understand how changes in chromosomes cause new species to evolve.


Subject(s)
Centromere , Evolution, Molecular , Karyotyping , Malassezia/physiology , Chromosomes, Fungal , Malassezia/classification , Malassezia/genetics , Species Specificity
18.
PLoS Comput Biol ; 4(8): e1000156, 2008 Aug 29.
Article in English | MEDLINE | ID: mdl-18769735

ABSTRACT

PhyloGibbs, our recent Gibbs-sampling motif-finder, takes phylogeny into account in detecting binding sites for transcription factors in DNA and assigns posterior probabilities to its predictions obtained by sampling the entire configuration space. Here, in an extension called PhyloGibbs-MP, we widen the scope of the program, addressing two major problems in computational regulatory genomics. First, PhyloGibbs-MP can localise predictions to small, undetermined regions of a large input sequence, thus effectively predicting cis-regulatory modules (CRMs) ab initio while simultaneously predicting binding sites in those modules, tasks that are usually done by two separate programs. PhyloGibbs-MP's performance at such ab initio CRM prediction is comparable with or superior to dedicated module-prediction software that uses prior knowledge of previously characterised transcription factors. Second, PhyloGibbs-MP can predict motifs that differentiate between two (or more) different groups of regulatory regions, that is, motifs that occur preferentially in one group over the others. While other "discriminative motif-finders" have been published in the literature, PhyloGibbs-MP's implementation has some unique features and flexibility. Benchmarks on synthetic and actual genomic data show that this algorithm is successful at enhancing predictions of differentiating sites and suppressing predictions of common sites, and that it compares with or outperforms other discriminative motif-finders on actual genomic data. Additional enhancements include significant performance and speed improvements, the ability to use "informative priors" on known transcription factors, and the ability to output annotations in a format that can be visualised with the Generic Genome Browser. In stand-alone motif-finding, PhyloGibbs-MP remains competitive, outperforming PhyloGibbs-1.0 and other programs on benchmark data.
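
Motif-finders of this kind represent a binding motif as a position weight matrix. The sketch below is not PhyloGibbs-MP's Gibbs sampler and does not use phylogeny. It only illustrates, under simple assumptions, how a log-odds weight matrix built from a few aligned sites can score every window of a sequence. The aligned sites, the uniform background, the pseudocount and the function names are all hypothetical choices made for illustration.

```python
import math

ALPHABET = "ACGT"

def build_log_odds_pwm(sites, background=None, pseudocount=1.0):
    """Build a log2-odds position weight matrix from equal-length aligned sites."""
    if background is None:
        # Assumption: uniform background base frequencies.
        background = {base: 0.25 for base in ALPHABET}
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1.0
        total = sum(counts.values())
        pwm.append({base: math.log2((counts[base] / total) / background[base])
                    for base in ALPHABET})
    return pwm

def score_windows(sequence, pwm):
    """Return (start, log-odds score) for every window of motif width."""
    width = len(pwm)
    return [(start, sum(pwm[i][sequence[start + i]] for i in range(width)))
            for start in range(len(sequence) - width + 1)]

def best_window(sequence, pwm):
    """Return the (start, score) pair with the highest log-odds score."""
    return max(score_windows(sequence, pwm), key=lambda pair: pair[1])

# Made-up aligned sites and a made-up sequence, for illustration only.
toy_sites = ["TGACTC", "TGACTA", "TGAGTC", "TGACTC"]
toy_sequence = "CCGTTGACTCAAGG"

toy_pwm = build_log_odds_pwm(toy_sites)
toy_best_start, toy_best_score = best_window(toy_sequence, toy_pwm)
```

A sampler would iterate between updating such a matrix and reassigning sites; here the matrix is fixed and only the scoring step is shown.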


Subject(s)
Consensus Sequence/genetics , DNA/analysis , DNA/ultrastructure , Nucleic Acid Conformation , Regulatory Elements, Transcriptional , Software , Animals , Binding Sites/genetics , DNA/chemistry , DNA/genetics , DNA-Binding Proteins/chemistry , Drosophila melanogaster/genetics , Gene Expression Regulation , Genomics/methods , Phylogeny , Saccharomyces cerevisiae/genetics , Sequence Homology, Nucleic Acid , Structure-Activity Relationship , Transcription Factors/chemistry
19.
PLoS One ; 13(7): e0199771, 2018.
Article in English | MEDLINE | ID: mdl-30016330

ABSTRACT

Transcription factors (TFs) often work cooperatively, where the binding of one TF to DNA enhances the binding affinity of a second TF to a nearby location. Such cooperative binding is important for activating gene expression from promoters and enhancers in both prokaryotic and eukaryotic cells. Existing methods to detect cooperative binding of a TF pair rely on analyzing the sequence that is bound. We propose a method that uses, instead, only ChIP-seq peak intensities and an expectation maximization (CPI-EM) algorithm. We validate our method using ChIP-seq data from cells where one of a pair of TFs under consideration has been genetically knocked out. Our algorithm relies on our observation that cooperative TF-TF binding is correlated with weak binding of one of the TFs, which we demonstrate in a variety of cell types, including E. coli, S. cerevisiae and M. musculus cells. We show that this method performs significantly better than a predictor based only on the ChIP-seq peak distance of the TFs under consideration. This suggests that peak intensities contain information that can help detect the cooperative binding of a TF pair. CPI-EM also outperforms an existing sequence-based algorithm in detecting cooperative binding. The CPI-EM algorithm is available at https://github.com/vishakad/cpi-em.
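
To make the expectation-maximization idea concrete, here is a minimal sketch of EM on peak intensities. It is not the CPI-EM implementation from the repository above: it assumes, purely for illustration, that log peak intensities of one TF come from a two-component Gaussian mixture, with the lower-mean component standing in for weakly bound (candidate cooperative) sites. The function names and the toy intensities are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def fit_two_component_em(values, n_iter=200, min_sd=1e-3):
    """Fit a two-component 1D Gaussian mixture by EM.

    Returns the fitted parameters and, for each value, the posterior
    probability of belonging to the lower-mean component.
    """
    n = len(values)
    grand_mean = sum(values) / n
    spread = math.sqrt(sum((v - grand_mean) ** 2 for v in values) / n)
    # Initialise the two means at the extremes of the data.
    mean_low, mean_high = min(values), max(values)
    sd_low = sd_high = max(spread, min_sd)
    weight_low = 0.5
    resp_low = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability of the low component for each value.
        resp_low = []
        for v in values:
            a = weight_low * normal_pdf(v, mean_low, sd_low)
            b = (1.0 - weight_low) * normal_pdf(v, mean_high, sd_high)
            resp_low.append(a / (a + b) if a + b > 0.0 else 0.5)
        # M-step: re-estimate weights, means and standard deviations.
        n_low = sum(resp_low)
        n_high = n - n_low
        if n_low < 1e-9 or n_high < 1e-9:
            break  # one component has emptied; stop updating
        mean_low = sum(r * v for r, v in zip(resp_low, values)) / n_low
        mean_high = sum((1.0 - r) * v for r, v in zip(resp_low, values)) / n_high
        var_low = sum(r * (v - mean_low) ** 2
                      for r, v in zip(resp_low, values)) / n_low
        var_high = sum((1.0 - r) * (v - mean_high) ** 2
                       for r, v in zip(resp_low, values)) / n_high
        sd_low = max(math.sqrt(var_low), min_sd)
        sd_high = max(math.sqrt(var_high), min_sd)
        weight_low = n_low / n
    return {"weight_low": weight_low, "mean_low": mean_low, "sd_low": sd_low,
            "mean_high": mean_high, "sd_high": sd_high, "posterior_low": resp_low}

# Made-up log peak intensities: six weak peaks followed by eight strong peaks.
toy_log_intensities = [0.8, 0.9, 1.0, 1.1, 1.2, 1.0,
                       3.8, 3.9, 4.0, 4.1, 4.2, 4.0, 4.1, 3.9]
toy_fit = fit_two_component_em(toy_log_intensities)
```

The posterior in `toy_fit["posterior_low"]` plays the role of a per-site score; how the published method models the intensities of both TFs in a pair is described in the paper and the repository, not here.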


Subject(s)
Chromatin Immunoprecipitation/methods , Protein Interaction Mapping/methods , Software , Transcription Factors/metabolism , Animals , Escherichia coli , Mice , Protein Binding , Saccharomyces cerevisiae
20.
Genetics ; 172(4): 2113-22, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16415362

ABSTRACT

Genomewide techniques to assay gene expression and transcription factor binding are in widespread use, but are far from providing predictive rules for the function of regulatory DNA. To investigate more intensively the grammar rules for active regulatory sequence, we made libraries from random ligations of a very restricted set of sequences. Working with the yeast Saccharomyces cerevisiae, we developed a novel screen based on the sensitivity of ascospores lacking dityrosine to treatment with lytic enzymes. We tested two separate libraries, each built by random ligation of several neutral spacer elements with a single type of activator site: either a site for the well-characterized sporulation factor Ndt80 or a new sporulation-specific regulatory site that we identified. This selective system achieved up to 1:10^4 enrichment of the artificial sequences that were active during sporulation, allowing a high-throughput analysis of large libraries of synthetic promoters. This is not practical with methods involving direct screening for expression, such as those based on fluorescent reporters. There were very few false positives, since active promoters always passed the screen when retested. The survival rate of our libraries containing roughly equal numbers of spacers and activators was a few percent that of libraries made from activators alone. The sequences of approximately 100 examples of active and inactive promoters could not be distinguished by simple binary rules; instead, the best model for the data was a linear regression fit of a quantitative measure of gene activity to multiple features of the regulatory sequence.
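
A linear regression of activity on sequence features can be sketched as follows. The features used here (number of activator sites and number of spacers per promoter), the synthetic activity values and the function names are assumptions made for illustration; the abstract does not list the features the authors actually used. The fit solves the ordinary least-squares normal equations with a small Gauss-Jordan elimination.

```python
def solve_linear_system(matrix, rhs):
    """Solve a square linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_linear_regression(features, targets):
    """Ordinary least squares; returns [intercept, coef_1, coef_2, ...]."""
    rows = [[1.0] + [float(x) for x in f] for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coefficients, feature_row):
    """Predicted activity for one promoter's feature values."""
    return coefficients[0] + sum(c * x for c, x in zip(coefficients[1:], feature_row))

# Synthetic promoters described by (activator sites, spacers). The activities
# are generated from 1 + 2 * activators - 0.5 * spacers, not measured data.
toy_features = [(1, 0), (2, 1), (3, 1), (1, 2), (4, 0), (2, 3)]
toy_activity = [1.0 + 2.0 * a - 0.5 * s for a, s in toy_features]

toy_coefficients = fit_linear_regression(toy_features, toy_activity)
```

Because the toy activities are exactly linear in the features, the fit recovers the generating coefficients; real promoter data would leave residuals that indicate how much of the activity such features explain.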


Subject(s)
Gene Library , Promoter Regions, Genetic , Saccharomyces cerevisiae/genetics , Ether/pharmacology , Fluorescent Dyes/pharmacology , Fungal Proteins/chemistry , Genes, Reporter , Genetic Techniques , Genetic Vectors , Green Fluorescent Proteins/chemistry , Models, Genetic , Regression Analysis , Tyrosine/analogs & derivatives , Tyrosine/metabolism