Results 1 - 20 of 47
1.
Methods Mol Biol ; 2745: 233-253, 2024.
Article in English | MEDLINE | ID: mdl-38060190

ABSTRACT

In essence, the COVID-19 pandemic can be regarded as a systems biology problem, with the entire world as the system, and the human population as the element transitioning from one state to another with certain transition rates. While capturing all the relevant features of such a complex system is hardly possible, compartmental epidemiological models can be used as an appropriate simplification to model the system's dynamics and infer its important characteristics, such as basic and effective reproductive numbers of the virus. These measures can later be used as response variables in feature selection methods to uncover the main factors contributing to disease transmissibility. We here demonstrate that a combination of dynamic modeling and machine learning approaches can represent a powerful tool in understanding the spread, not only of COVID-19, but of any infectious disease of epidemiological proportions.
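As a minimal illustration of the kind of compartmental model the abstract refers to, the sketch below integrates a textbook SIR model and reports the basic and effective reproduction numbers. The SIR form, the forward Euler step, and all parameter values (`beta`, `gamma`, the population size) are assumptions chosen for illustration; the abstract does not specify the model used in the chapter.

```python
def simulate_sir(beta, gamma, population, initial_infected, days, dt=0.1):
    """Integrate a textbook SIR model with a forward Euler step.

    beta  : transmission rate per day (illustrative value, not from the source)
    gamma : removal rate per day (illustrative value, not from the source)
    Returns a list of (day, S, I, R) tuples sampled once per day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    steps_per_day = round(1 / dt)
    trajectory = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        trajectory.append((day, s, i, r))
    return trajectory


def basic_reproduction_number(beta, gamma):
    """R0 of the SIR model: transmission rate divided by removal rate."""
    return beta / gamma


def effective_reproduction_number(beta, gamma, susceptible, population):
    """Re: R0 scaled by the fraction of the population still susceptible."""
    return beta / gamma * susceptible / population


POPULATION = 1_000_000
BETA, GAMMA = 0.4, 0.1  # hypothetical rates

trajectory = simulate_sir(BETA, GAMMA, POPULATION, initial_infected=10, days=120)
r0 = basic_reproduction_number(BETA, GAMMA)
_, s_final, i_final, r_final = trajectory[-1]
re_final = effective_reproduction_number(BETA, GAMMA, s_final, POPULATION)
print(f"R0 = {r0:.2f}, Re after 120 days = {re_final:.2f}")
```

In this toy setting the effective reproduction number falls below 1 once enough of the population has left the susceptible compartment, which is the state transition the abstract describes.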


Subjects
COVID-19 , Viruses , Humans , COVID-19/epidemiology , SARS-CoV-2 , Pandemics , Systems Biology
2.
Front Big Data ; 6: 1038283, 2023.
Article in English | MEDLINE | ID: mdl-37034433

ABSTRACT

Understanding the sociodemographic factors behind COVID-19 severity involves significant methodological difficulties, such as differences in testing policies and epidemic phase, as well as a large number of predictors that can potentially contribute to severity. To account for these difficulties, we assemble 115 predictors for more than 3,000 US counties and employ a well-defined COVID-19 severity measure derived from epidemiological dynamics modeling. We then use a number of advanced feature selection techniques from machine learning to determine which of these predictors significantly impact the disease severity. We obtain a surprisingly simple result, where only two variables are clearly and robustly selected: population density and proportion of African Americans. Possible causes behind this result are discussed. We argue that the approach may be useful whenever significant determinants of disease progression over diverse geographic regions should be selected from a large number of potentially important factors.

3.
J Antimicrob Chemother ; 78(4): 1066-1075, 2023 04 03.
Article in English | MEDLINE | ID: mdl-36857516

ABSTRACT

BACKGROUND: Bacterial toxin-antitoxin (TA) modules respond to various stressful conditions. The Gcn5-related N-acetyltransferase-type toxin (GNAT) protein encoded by the GNAT-RHH TA locus is involved in the antibiotic tolerance of Klebsiella pneumoniae. OBJECTIVES: To investigate the transcriptional mechanism of the GNAT-RHH operon kacAT under antibiotic stress. METHODS: The transcriptional level of the kacAT operon of K. pneumoniae was measured by quantitative real-time (qRT) PCR assay. The degradation of antitoxin KacA was examined by western blot and fluorescent protein. The ratio of [KacA]:[KacT] was calculated by the fluorescence intensity of KacA-eGFP and mCherry-KacT. Mathematical modelling predicted protein and transcript synthesis dynamics. RESULTS: A meropenem-induced increase in transcript levels of kacA and kacT resulted from the relief from transcriptional autoregulation of the kacAT operon. Meropenem induces the degradation of KacA through Lon protease, resulting in a reduction in the ratio of [KacA]:[KacT]. The decreased ratio causes the dissociation of the KacAT complex from its promoter region, which eliminates the repression of kacAT transcription. In addition, our dynamic model of kacAT expression regulation quantitatively reproduced the experimentally observed reduction of the [KacA]:[KacT] ratio and a large increase in kacAT transcript levels under the condition of strong promoter autorepression by the KacAT complex. CONCLUSIONS: Meropenem promotes the degradation of antitoxin by enhancing the expression of Lon protease. Degradation of antitoxin reduces the ratio of intracellular [antitoxin]:[toxin], leading to detachment of the TA complex from its promoter, and releasing repression of TA operon transcription. These results may provide an important insight into the transcriptional mechanism of GNAT-RHH TA modules under antibiotic stress.


Subjects
Antitoxins , Protease La , Antitoxins/genetics , Meropenem , Acetyltransferases , Protease La/metabolism , Operon , Anti-Bacterial Agents/pharmacology , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Bacterial Gene Expression Regulation
4.
Sci China Life Sci ; 66(3): 626-634, 2023 03.
Article in English | MEDLINE | ID: mdl-36346548

ABSTRACT

The Type VI Secretion System (T6SS) plays significant roles in microbial activities by injecting effectors into adjacent cells or environments. T6SS has increasingly gained attention due to its important influence on pathogenesis, microbial competition, etc. T6SS-associated research is expanding rapidly on numerous fronts, which calls for an efficient resource. SecReT6 version 3 provides comprehensive information on T6SS and the interactions between T6SS and T6SS-related proteins such as T6SS regulators and T6SS effectors. To assist T6SS research on topics such as microbial competition and regulatory mechanisms, SecReT6 v3 provides online tools for detection and analysis of T6SS and T6SS-related proteins and estimation of T6SS-dependent killing risk. We have identified a novel T6SS regulator and T6SS-dependent killing capacity in Acinetobacter baumannii clinical isolates with the aid of SecReT6 v3. A total of 17,212 T6SSs and numerous T6SS-related proteins in 26,573 complete bacterial genomes were also detected, analyzed and incorporated into the database. The database is freely available at https://bioinfo-mml.sjtu.edu.cn/SecReT6/ .


Subjects
Acinetobacter baumannii , Type VI Secretion Systems , Type VI Secretion Systems/genetics , Type VI Secretion Systems/metabolism , Acinetobacter baumannii/genetics , Acinetobacter baumannii/metabolism , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Bacterial Genome
5.
Environ Res ; 216(Pt 1): 114446, 2023 01 01.
Article in English | MEDLINE | ID: mdl-36208783

ABSTRACT

The emergence of a new virus variant is generally recognized by its usually sudden and rapid spread (outburst) in a certain world region. Due to the near-exponential rate of initial expansion, the new strain may not be detected at its true geographical origin but in the area with the most favorable conditions leading to the fastest exponential growth. Therefore, it is crucial to understand better the factors that promote such outbursts, which we address in the example of analyzing global Omicron transmissibility during its global emergence/outburst in November 2021-February 2022. As predictors, we assemble a number of potentially relevant factors: vaccinations (both full and boosters), different measures of population mobility (provided by Google), estimated stringency of measures, the prevalence of chronic diseases, population age, the timing of the outburst, and several other socio-demographic variables. As a proxy for natural immunity (prevalence of prior infections in population), we use cumulative numbers of COVID-19 deaths. As a response variable (transmissibility measure), we use the estimated effective reproduction number (Re) averaged in the vicinity of the outburst maxima. To select significant predictors of Re, we use machine learning regressions that employ feature selection, including methods based on ensembles of decision trees (Random Forest and Gradient Boosting). We identify the young population, earlier infection onset, higher mobility, low natural immunity, and low booster prevalence as likely direct risk factors. Interestingly, we find that all these risk factors were significantly higher for Africa, though curiously somewhat lower in Southern African countries (where the outburst emerged) compared to other African countries. 
Therefore, while the risk factors related to the virus transmissibility clearly promote the outburst of a new virus variant, specific regions/countries where the outburst actually happens may be related to less evident factors, possibly random in nature.


Subjects
COVID-19 , Humans , COVID-19/epidemiology , Risk Factors , Basic Reproduction Number , Prevalence , Geography
6.
Sci Rep ; 12(1): 17711, 2022 10 21.
Article in English | MEDLINE | ID: mdl-36271249

ABSTRACT

Global Health Security Index (GHSI) categories are formulated to assess the capacity of world countries to deal with infectious disease risks. Thus, higher values of these indices were expected to translate to lower COVID-19 severity. However, it turned out to be the opposite, surprisingly suggesting that higher estimated country preparedness to epidemics may lead to higher disease mortality. To address this puzzle, we: (i) use a model-derived measure of COVID-19 severity; (ii) employ a range of statistical learning approaches, including non-parametric machine learning methods; (iii) consider the overall excess mortality, in addition to official COVID-19 fatality counts. Our results suggest that the puzzle is, to a large extent, an artifact of oversimplified data analysis and a consequence of misclassified COVID-19 deaths, combined with the higher median age of the population and earlier epidemics onset in countries with high GHSI scores.


Subjects
COVID-19 , Epidemics , Humans , COVID-19/epidemiology , Global Health , Developed Countries
7.
Microbiol Spectr ; 10(4): e0032022, 2022 08 31.
Article in English | MEDLINE | ID: mdl-35703555

ABSTRACT

Toxin-antitoxin (TA) modules containing a Gcn5-related N-acetyltransferase (GNAT) toxin domain regulate bacterial physiology under adverse environmental stresses. Multiple GNAT-ribbon-helix-helix domain (RHH) TA loci have been identified in single bacterial genomes. However, their diversity and interactions are still obscure. Our previous analysis showed that the GNAT toxin of Klebsiella pneumoniae, KacT, introduces antibiotic tolerance and the toxicity of GNAT is neutralized by KacA, an RHH antitoxin. We here present a phylogenetic analysis of GNAT toxins of more than 1,000 GNAT-RHH pairs detected in completely sequenced K. pneumoniae genomes, revealing that the GNAT toxins are diverse and grouped into four distinct clades. Overexpression of GNAT toxins representative of each of the four clades halts the cell growth of K. pneumoniae, while the coexpression of the cognate RHH antitoxin neutralizes GNAT toxicity. We also identify point mutations that inactivate the GNAT toxins. Moreover, we observe a cross-interaction between GNAT-RHH pairs encoded by different replicons, where a chromosomal toxin (KacT2) can be neutralized by its cognate RHH antitoxin (KacA2 on a chromosome) and another antitoxin (KacA3 on a plasmid). Finally, statistical analysis of the distribution of GNAT-RHH loci in K. pneumoniae strains shows pronounced deviation from random distribution within the same clades. Moreover, we also obtain statistically significant correlations between different clades, which we discuss in terms of the experimental results. IMPORTANCE Elucidating the roles of multifaceted GNAT-RHH TA loci is essential for understanding how these TAs interact among themselves. Recently, the reaction mechanisms and structures of several GNAT-RHH pairs have been reported. While bacterial strains can carry multiple GNAT-RHH loci with diverse origins, studies on the possible cross-interactions of these TA pairs are still limited. Here, we find that 1,000 predicted GNAT toxins of K. pneumoniae can be grouped into four distinct clades. The distributions of TA loci within these clades in K. pneumoniae strains are highly nonrandom, with the presence of a single locus of each clade per strain being highly overrepresented. Moreover, the toxicity of a GNAT toxin encoded by a chromosome was alleviated by a noncognate RHH antitoxin on a plasmid. These results might yield a profound understanding of the widespread GNAT-RHH TA pairs and the cross-interactions between noncognate TA pairs located on different replicons.


Subjects
Antitoxins , Bacterial Toxins , Acetyltransferases/genetics , Antitoxins/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Toxins/chemistry , Bacterial Toxins/genetics , Klebsiella pneumoniae/genetics , Phylogeny
8.
One Health ; 13: 100355, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34869819

ABSTRACT

Understanding variations in the severity of infectious diseases is essential for planning proper mitigation strategies. Determinants of COVID-19 clinical severity are commonly assessed by transverse or longitudinal studies of the fatality counts. However, the fatality counts depend both on disease clinical severity and transmissibility, as more infected also lead to more deaths. Instead, we use epidemiological modeling to propose a disease severity measure that accounts for the underlying disease dynamics. The measure corresponds to the ratio of population-averaged mortality and recovery rates (m/r), is independent of the disease transmission dynamics (i.e., the basic reproduction number), and has a direct mechanistic interpretation. We use this measure to assess demographic, medical, meteorological, and environmental factors associated with the disease severity. For this, we employ an ecological regression study design and analyze different US states during the first disease outbreak. Principal Component Analysis, followed by univariate, and multivariate analyses based on machine learning techniques, is used for selecting important predictors. The usefulness of the introduced severity measure and the validity of the approach are confirmed by the fact that, without using prior knowledge from clinical studies, we recover the main significant predictors known to influence disease severity, in particular age, chronic diseases, and racial factors. Additionally, we identify long-term pollution exposure and population density as not widely recognized (though for the pollution previously hypothesized) significant predictors. The proposed measure is applicable for inferring severity determinants not only of COVID-19 but also of other infectious diseases, and the obtained results may aid a better understanding of the present and future epidemics. 
Our holistic, systematic investigation of disease severity at the human-environment intersection by epidemiological dynamical modeling and machine learning ecological regressions is aligned with the One Health approach. The obtained results emphasize a syndemic nature of COVID-19 risks.
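The claim that the ratio m/r does not depend on transmission dynamics can be checked in a toy setting. The sketch below assumes a simple SIRD model in which infected individuals die at rate `m` and recover at rate `r`; this model form, the Euler integration, and every numerical value are illustrative assumptions, not the model fitted in the paper. Two runs with different transmission rates produce different death tolls but the same ratio of deaths to recoveries.

```python
def simulate_sird(beta, m, r, population=1_000_000, infected0=100,
                  days=200, dt=0.05):
    """Forward-Euler integration of an assumed SIRD model.

    beta : transmission rate per day (hypothetical)
    m    : population-averaged mortality rate of the infected (hypothetical)
    r    : population-averaged recovery rate of the infected (hypothetical)
    Returns cumulative (recovered, dead) at the end of the run.
    """
    s = float(population - infected0)
    i = float(infected0)
    recovered = 0.0
    dead = 0.0
    for _ in range(round(days / dt)):
        new_infections = beta * s * i / population * dt
        new_recovered = r * i * dt
        new_dead = m * i * dt
        s -= new_infections
        i += new_infections - new_recovered - new_dead
        recovered += new_recovered
        dead += new_dead
    return recovered, dead


M_RATE, R_RATE = 0.002, 0.1  # hypothetical mortality and recovery rates

deaths = []
ratios = []
for beta in (0.2, 0.5):  # slow versus fast transmission
    recovered, dead = simulate_sird(beta, M_RATE, R_RATE)
    deaths.append(dead)
    ratios.append(dead / recovered)
    print(f"beta={beta}: deaths={dead:.0f}, deaths/recovered={dead / recovered:.4f}")
```

Faster transmission raises the death count, yet deaths divided by recoveries stays at `M_RATE / R_RATE` in both runs, which is why a measure of this kind separates severity from transmissibility.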

9.
Geohealth ; 5(9): e2021GH000432, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34568708

ABSTRACT

Identifying the main environmental drivers of SARS-CoV-2 transmissibility in the population is crucial for understanding current and potential future outbursts of COVID-19 and other infectious diseases. To address this problem, we concentrate on the basic reproduction number R0, which is not sensitive to testing coverage and represents transmissibility in the absence of social distancing and in a completely susceptible population. While many variables may potentially influence R0, a high correlation between these variables may obscure the result interpretation. Consequently, we combine Principal Component Analysis with feature selection methods from several regression-based approaches to identify the main demographic and meteorological drivers behind R0. We robustly obtain that a country's wealth/development (GDP per capita or Human Development Index) is the most important R0 predictor at the global level, probably being a good proxy for the overall contact frequency in a population. This main effect is modulated by built-up area per capita (crowdedness in indoor space), onset of infection (likely related to increased awareness of infection risks), net migration, unhealthy living lifestyle/conditions including pollution, seasonality, and possibly BCG vaccination prevalence. Also, we argue that several variables that significantly correlate with transmissibility do not directly influence R0 or affect it differently than suggested by naïve analysis.

10.
Adv Protein Chem Struct Biol ; 127: 291-314, 2021.
Article in English | MEDLINE | ID: mdl-34340771

ABSTRACT

A number of models in mathematical epidemiology have been developed to account for control measures such as vaccination or quarantine. However, COVID-19 has brought unprecedented social distancing measures, raising the challenge of how to include these in a manner that can explain the data but avoid overfitting in parameter inference. We here develop a simple time-dependent model, where social distancing effects are introduced analogous to coarse-grained models of gene expression control in systems biology. We apply our approach to understand drastic differences in COVID-19 infection and fatality counts, observed between Hubei (Wuhan) and other Mainland China provinces. We find that these unintuitive data may be explained through an interplay of differences in transmissibility, effective protection, and detection efficiencies between Hubei and other provinces. More generally, our results demonstrate that regional differences may drastically shape infection outbursts. The obtained results demonstrate the applicability of our developed method to extract key infection parameters directly from publicly available data so that it can be globally applied to outbreaks of COVID-19 in a number of countries. Overall, we show that applications of uncommon strategies, such as methods and approaches from molecular systems biology research to mathematical epidemiology, may significantly advance our understanding of COVID-19 and other infectious diseases.


Subjects
COVID-19/mortality , COVID-19/transmission , Computer Simulation , Biological Models , SARS-CoV-2 , China/epidemiology , Female , Humans , Male
11.
Environ Res ; 201: 111526, 2021 10.
Article in English | MEDLINE | ID: mdl-34174258

ABSTRACT

Many studies have proposed a relationship between COVID-19 transmissibility and ambient pollution levels. However, a major limitation in establishing such associations is to adequately account for complex disease dynamics, influenced by e.g. significant differences in control measures and testing policies. Another difficulty is appropriately controlling the effects of other potentially important factors, due to both their mutual correlations and a limited dataset. To overcome these difficulties, we will here use the basic reproduction number (R0) that we estimate for USA states using non-linear dynamics methods. To account for a large number of predictors (many of which are mutually strongly correlated), combined with a limited dataset, we employ machine-learning methods. Specifically, to reduce dimensionality without complicating the variable interpretation, we employ Principal Component Analysis on subsets of mutually related (and correlated) predictors. Methods that allow feature (predictor) selection, and ranking their importance, are then used, including both linear regressions with regularization and feature selection (Lasso and Elastic Net) and non-parametric methods based on ensembles of weak-learners (Random Forest and Gradient Boost). Through these substantially different approaches, we robustly obtain that PM2.5 is a major predictor of R0 in USA states, with corrections from factors such as other pollutants, prosperity measures, population density, chronic disease levels, and possibly racial composition. As a rough magnitude estimate, we obtain that a relative change in R0, with variations in pollution levels observed in the USA, is typically ~30%, which further underscores the importance of pollution in COVID-19 transmissibility.
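To show how a regularized regression such as Lasso performs feature selection, the sketch below implements a bare-bones coordinate-descent Lasso in plain Python and applies it to a tiny synthetic dataset. The data, the penalty strength `alpha`, and the predictor names `strong` and `weak` are hypothetical; this is not the authors' pipeline, which the abstract describes only at a high level.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(columns, y, alpha, n_iter=100):
    """Minimise (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1.

    columns : list of predictor columns, assumed centred
    y       : response values, assumed centred
    alpha   : L1 penalty strength (hypothetical value chosen by the caller)
    """
    n = len(y)
    weights = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Residual with predictor j's own contribution left out.
            partial = [
                y[k] - sum(weights[p] * columns[p][k]
                           for p in range(len(columns)) if p != j)
                for k in range(n)
            ]
            rho = sum(col[k] * partial[k] for k in range(n)) / n
            scale = sum(v * v for v in col) / n
            weights[j] = soft_threshold(rho, alpha) / scale
    return weights


# Synthetic, centred predictors (orthogonal by construction).
strong = [1.0, -1.0, 1.0, -1.0]
weak = [1.0, 1.0, -1.0, -1.0]
# Response depends strongly on `strong` and only slightly on `weak`.
response = [3.0 * a + 0.1 * b for a, b in zip(strong, weak)]

weights = lasso_coordinate_descent([strong, weak], response, alpha=0.5)
print(weights)
```

The L1 penalty sets the coefficient of the weak predictor exactly to zero while keeping a shrunken coefficient for the strong one, which is the behaviour that makes Lasso usable for ranking and selecting predictors.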


Subjects
Air Pollutants , COVID-19 , Air Pollutants/analysis , Basic Reproduction Number , Humans , Particulate Matter/analysis , SARS-CoV-2 , United States
12.
Glob Chall ; 5(5): 2000101, 2021 May.
Article in English | MEDLINE | ID: mdl-33786198

ABSTRACT

Widespread growth signatures in COVID-19 confirmed case counts are reported, with sharp transitions between three distinct dynamical regimes (exponential, superlinear, and sublinear). Through analytical and numerical analysis, a novel framework is developed that exploits information in these signatures. An approach well known to physics is applied, where one looks for common dynamical features, independently from differences in other factors. These features and associated scaling laws are used as a powerful tool to pinpoint regions where analytical derivations are effective, get an insight into qualitative changes of the disease progression, and infer the key infection parameters. The developed framework for joint analytical and numerical analysis of empirically observed COVID-19 growth patterns can lead to a fundamental understanding of infection progression under strong control measures, applicable to outbursts of both COVID-19 and other infectious diseases.
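One simple way to tell the three regimes apart in a window of cumulative counts is to compare a straight-line fit on semi-log axes (exponential growth) with one on log-log axes (power-law growth), and then read off the power-law exponent. The sketch below does this with ordinary least squares. The working definitions used here (superlinear as a power law with exponent above 1, sublinear as exponent below 1) and the synthetic data are assumptions for illustration, not the analytical framework developed in the paper.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (slope, residual SS)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, rss


def classify_growth(times, counts):
    """Label a window of cumulative counts; times and counts must be positive.

    Returns (regime, parameter): the growth rate for the exponential regime,
    or the power-law exponent for the superlinear and sublinear regimes.
    """
    log_counts = [math.log(c) for c in counts]
    log_times = [math.log(t) for t in times]
    rate, rss_exponential = fit_line(times, log_counts)
    exponent, rss_power = fit_line(log_times, log_counts)
    if rss_exponential < rss_power:
        return "exponential", rate
    if exponent > 1:
        return "superlinear", exponent
    return "sublinear", exponent


days = [1, 2, 3, 4, 5, 6]
exponential_case = classify_growth(days, [math.exp(0.3 * t) for t in days])
superlinear_case = classify_growth(days, [float(t ** 2) for t in days])
sublinear_case = classify_growth(days, [t ** 0.5 for t in days])
print(exponential_case, superlinear_case, sublinear_case)
```

On real case counts the windows would need to be chosen with care, since noise and reporting delays can blur the transitions between regimes.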

13.
Front Microbiol ; 10: 2054, 2019.
Article in English | MEDLINE | ID: mdl-31551987

ABSTRACT

Inferring transcriptional direction (orientation) of the CRISPR array is essential for many applications, including systematically investigating non-canonical CRISPR/Cas functions. The standard method, CRISPRDirection (embedded within CRISPRCasFinder), fails to predict the orientation (ND predictions) for ∼37% of the classified CRISPR arrays (>2200 loci); this goes up to >70% for the II-B subtype where non-canonical functions were first experimentally discovered. Alternatively, Potential Orientation (also embedded within CRISPRCasFinder), has a much smaller frequency of ND predictions but might have significantly lower accuracy. We propose a novel simple criterion, where the CRISPR array direction is assigned according to the direction of its associated cas genes (Cas Orientation). We systematically assess the performance of the three methods (Cas Orientation, CRISPRDirection, and Potential Orientation) across all CRISPR/Cas subtypes, by a mutual crosscheck of their predictions, and by comparing them to the experimental dataset. Interestingly, CRISPRDirection agrees much better with Cas Orientation than with Potential Orientation, despite CRISPRDirection and Potential Orientation being mutually related - Potential Orientation corresponding to one of six (heterogeneous) predictors employed by CRISPRDirection - and being unrelated to Cas Orientation. We find that Cas Orientation has much higher accuracy compared to Potential Orientation and comparable accuracy to CRISPRDirection - while accurately assigning an orientation to ∼95% of the CRISPR arrays that are non-determined by CRISPRDirection. Cas Orientation is, at the same time, simple to employ, requiring only (routine for prokaryotes) the prediction of the associated protein coding gene direction.
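The Cas Orientation criterion is simple enough to sketch in a few lines. The function below assigns an array the strand shared by its associated cas genes. The majority vote used when the genes disagree, the `"ND"` label for ties, and the `+`/`-` strand encoding are assumptions made for this illustration; the abstract does not say how mixed-strand cases are handled.

```python
from collections import Counter


def cas_orientation(cas_gene_strands):
    """Assign a CRISPR array direction from its associated cas genes.

    cas_gene_strands : iterable of "+" or "-" strand labels, one per cas gene
    Returns "+", "-", or "ND" (non-determined) when there is no majority.
    The majority-vote rule is an assumption of this sketch.
    """
    counts = Counter(s for s in cas_gene_strands if s in ("+", "-"))
    if counts["+"] > counts["-"]:
        return "+"
    if counts["-"] > counts["+"]:
        return "-"
    return "ND"


print(cas_orientation(["+", "+", "+", "+"]))  # all cas genes on the forward strand
print(cas_orientation(["-", "-", "+"]))       # mixed strands, majority reverse
print(cas_orientation([]))                    # no cas genes found
```

The only input is the strand of each predicted protein-coding gene, which matches the abstract's point that the criterion needs nothing beyond routine prokaryotic gene prediction.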

14.
Eur Biophys J ; 48(5): 413-424, 2019 Jul.
Article in English | MEDLINE | ID: mdl-30972433

ABSTRACT

Recent decades brought a revolution to biology, driven mainly by exponentially increasing amounts of data coming from "'omics" sciences. To handle these data, bioinformatics often has to combine biologically heterogeneous signals, for which methods from statistics and engineering (e.g. machine learning) are often used. While such an approach is sometimes necessary, it effectively treats the underlying biological processes as a black box. Similarly, systems biology deals with inherently complex systems, characterized by a large number of degrees of freedom, and interactions that are highly non-linear. To deal with this complexity, the underlying physical interactions are often (over)simplified, such as in Boolean modelling of network dynamics. In this review, we argue for the utility of applying a biophysical approach in bioinformatics and systems biology, including discussion of two examples from our research which address sequence analysis and understanding intracellular gene expression dynamics.


Subjects
Biophysics/methods , Proteomics/methods , Systems Biology/methods , Gene Expression Regulation , DNA Sequence Analysis
15.
Molecules ; 24(4)2019 Feb 21.
Article in English | MEDLINE | ID: mdl-30795631

ABSTRACT

CRISPR/Cas is an adaptive bacterial immune system, whose CRISPR array can actively change in response to viral infections. However, Type I-E CRISPR/Cas in E. coli (an established model system), appears not to exhibit such active adaptation, which suggests that it might have functions other than immune response. Through computational analysis, we address the involvement of the system in non-canonical functions. To assess targets of CRISPR spacers, we align them against both E. coli genome and an exhaustive (~230) set of E. coli viruses. We systematically investigate the obtained alignments, such as hit distribution with respect to genome annotation, propensity to target mRNA, the target functional enrichment, conservation of CRISPR spacers and putative targets in related bacterial genomes. We find that CRISPR spacers have a statistically highly significant tendency to target i) host compared to phage genomes, ii) one of the two DNA strands, iii) genomic dsDNA rather than mRNA, iv) transcriptionally active regions, and v) sequences (cis-regulatory elements) with slower turn-over rate compared to CRISPR spacers (trans-factors). The results suggest that the Type I-E CRISPR/Cas system has a major role in transcription regulation of endogenous genes, with a potential to rapidly rewire these regulatory interactions, with targets being selected through naïve adaptation.


Subjects
CRISPR-Cas Systems , Clustered Regularly Interspaced Short Palindromic Repeats , Coliphages/genetics , Escherichia coli/genetics , Bacterial Gene Expression Regulation , Bacterial Genome , Viral Genome , Base Sequence , Computational Biology , DNA/genetics , DNA/metabolism , Bacterial DNA/genetics , Bacterial DNA/metabolism , Escherichia coli/metabolism , Escherichia coli/virology , Genetic Loci , Messenger RNA/genetics , Messenger RNA/metabolism , Sequence Alignment , Genetic Transcription
16.
Molecules ; 24(1)2019 Jan 07.
Article in English | MEDLINE | ID: mdl-30621083

ABSTRACT

In vivo dynamics of protein levels in bacterial cells depend on both intracellular regulation and relevant population dynamics. Such population dynamics effects, e.g., interplay between cell and plasmid division rates, are, however, often neglected in modeling gene expression regulation. Including them in a model introduces additional parameters shared by the dynamical equations, which can significantly increase dimensionality of the parameter inference. We here analyse the importance of these effects, on a case of bacterial restriction-modification (R-M) system. We redevelop our earlier minimal model of this system gene expression regulation, based on a thermodynamic and dynamic system modeling framework, to include the population dynamics effects. To resolve the problem of effective coupling of the dynamical equations, we propose a "mean-field-like" procedure, which allows determining only part of the parameters at a time, by separately fitting them to expression dynamics data of individual molecular species. We show that including the interplay between kinetics of cell division and plasmid replication is necessary to explain the experimental measurements. Moreover, neglecting population dynamics effects can lead to falsely identifying non-existent regulatory mechanisms. Our results call for advanced methods to reverse-engineer intracellular regulation from dynamical data, which would also take into account the population dynamics effects.


Subjects
Bacteria/genetics , Cell Division/genetics , Plasmids/genetics , Population Dynamics , Bacteria/chemistry , DNA Replication/genetics , Gene Expression Regulation , Kinetics , Biological Models , Thermodynamics
17.
mBio ; 9(6)2018 12 04.
Article in English | MEDLINE | ID: mdl-30514784

ABSTRACT

CRISPR DNA arrays of unique spacers separated by identical repeats ensure prokaryotic immunity through specific targeting of foreign nucleic acids complementary to spacers. New spacers are acquired into a CRISPR array in a process of CRISPR adaptation. Selection of foreign DNA fragments to be integrated into CRISPR arrays relies on PAM (protospacer adjacent motif) recognition, as only those spacers will be functional against invaders. However, acquisition of different PAM-associated spacers proceeds with markedly different efficiency from the same DNA. Here, we used a combination of bioinformatics and experimental approaches to understand factors affecting the efficiency of acquisition of spacers by the Escherichia coli type I-E CRISPR-Cas system, for which two modes of CRISPR adaptation have been described: naive and primed. We found that during primed adaptation, efficiency of spacer acquisition is strongly negatively affected by the presence of an AAG trinucleotide (a consensus PAM) within the sequence being selected. No such trend is observed during naive adaptation. The results are consistent with a unidirectional spacer selection process during primed adaptation and provide a specific signature for identification of spacers acquired through primed adaptation in natural populations. IMPORTANCE Adaptive immunity of prokaryotes depends on acquisition of foreign DNA fragments into CRISPR arrays as spacers followed by destruction of foreign DNA by CRISPR interference machinery. Different fragments are acquired into CRISPR arrays with widely different efficiencies, but the factors responsible are not known. We analyzed the frequency of spacers acquired during primed adaptation in an E. coli CRISPR array and found that AAG motif was depleted from highly acquired spacers. AAG is also a consensus protospacer adjacent motif (PAM) that must be present upstream from the target of the CRISPR spacer for its efficient destruction by the interference machinery. 
These results are important because they provide new information on the mechanism of primed spacer acquisition. They add to other previous evidence in the field that pointed out to a "directionality" in the capture of new spacers. Our data strongly suggest that the recognition of an AAG PAM by the interference machinery components prior to spacer capture occludes downstream AAG sequences, thus preventing their recognition by the adaptation machinery.
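The signature described above, depletion of internal AAG trinucleotides in efficiently acquired spacers, suggests a simple screen. The sketch below reports the fraction of spacers in a set that contain the motif. The spacer sequences are made up for illustration, and the check covers only the strand as given; strand handling and any statistical comparison between spacer sets are left out of this sketch.

```python
def contains_motif(spacer, motif="AAG"):
    """True if the motif occurs anywhere in the spacer (case-insensitive)."""
    return motif.upper() in spacer.upper()


def motif_fraction(spacers, motif="AAG"):
    """Fraction of spacers that contain the motif at least once."""
    if not spacers:
        raise ValueError("spacer list is empty")
    return sum(contains_motif(s, motif) for s in spacers) / len(spacers)


# Hypothetical spacer sequences, for illustration only.
example_spacers = ["ACGTAAGCTT", "GGGCCCTTTA", "ttaagcca", "CCCCGGGG"]
fraction = motif_fraction(example_spacers)
print(fraction)
```

Comparing this fraction between spacer sets acquired under different conditions would be one way to look for the depletion the abstract reports, though a real analysis would also need a null expectation based on sequence composition.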


Subjects
CRISPR-Cas Systems , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Intergenic DNA/genetics , Escherichia coli/genetics , Bacterial DNA/genetics
18.
Front Genet ; 9: 474, 2018.
Article in English | MEDLINE | ID: mdl-30386377

ABSTRACT

In addition to its well-established defense function, CRISPR/Cas can also exhibit crucial non-canonical activity through regulation of endogenous gene expression, which was found to mainly affect bacterial virulence. These non-canonical functions depend on scaRNA, a small RNA encoded outside of the CRISPR array that is typically flanked by a transcription start site (TSS) and a terminator, and is in part complementary to another small CRISPR/Cas-associated RNA (tracrRNA). Identification of scaRNAs is, however, largely complicated by the scarcity of RNA-Seq data across different bacteria, so they have been identified only in a relatively rare CRISPR/Cas subtype (IIB), and the possibility of finding them in other Type II systems is currently unclear. This study presents the first effort toward systematic detection of small CRISPR/Cas-associated regulatory RNAs, where the obtained predictions can guide future experiments. The core of our approach is ab initio detection of small RNAs from bacterial genomes, based on jointly predicting transcription signals (TSS and terminators) and homology to the CRISPR array repeat. In particular, we employ our improved approach for detecting bacterial TSS, since accurate TSS detection is the main limiting factor for accurate small RNA prediction. We also explore how our predictions match available RNA-Seq data and analyze their conservation across related bacterial species. In Type IIB systems, our predictions are consistent with experimental data, and we systematically identify scaRNAs throughout this subtype. Furthermore, we identify scaRNA:tracrRNA pairs in a number of IIA/IIC systems, where the appearance of scaRNAs co-occurs with the strains being pathogenic. RNA-Seq and conservation analysis show that our method is well suited for predicting CRISPR/Cas-associated small RNAs. We also find evidence for a possible modified mechanism of CRISPR-associated small RNA action, which, interestingly, closely resembles the setup employed in biotechnological applications. Overall, our findings indicate that scaRNA:tracrRNA pairs are present in all subtypes of Type II systems, and point to an underlying connection with bacterial virulence. In addition to formulating these hypotheses, the careful manual curation that we performed is an important first step toward a fully automated predictor of CRISPR/Cas-associated small RNAs, which will allow their large-scale analysis across diverse bacterial genomes.

19.
Nucleic Acids Res ; 46(20): 10810-10826, 2018 11 16.
Article in English | MEDLINE | ID: mdl-30295835

ABSTRACT

C-proteins control transcription of restriction-modification (R-M) system genes to ensure sufficient levels of restriction endonuclease to allow protection from foreign DNA while avoiding its modification by excess methyltransferase. Here, we characterize transcription regulation in the C-protein-dependent R-M system Kpn2I. The Kpn2I restriction endonuclease gene is transcribed from a constitutive, weak promoter, which, atypically, is C-protein independent. Kpn2I C-protein (C.Kpn2I) binds upstream of the strong methyltransferase gene promoter and inhibits it, likely by preventing the interaction of the RNA polymerase sigma subunit with the -35 consensus element. Diminished transcription from the methyltransferase promoter increases transcription from overlapping divergent C-protein gene promoters. All known C-proteins affect transcription initiation from R-M gene promoters. Uniquely, the C.Kpn2I binding site is located within the coding region of its gene. C.Kpn2I acts as a roadblock, stalling elongating RNA polymerase and decreasing production of full-length C.Kpn2I mRNA. Mathematical modeling shows that this unusual mode of regulation leads to the same dynamics of accumulation of R-M gene transcripts as observed in systems where C-proteins act at the transcription initiation stage only. Bioinformatics analyses suggest that transcription regulation through binding of C.Kpn2I-like proteins within the coding regions of their genes may be widespread.


Subjects
Bacterial Proteins/metabolism; Endodeoxyribonucleases/metabolism; Klebsiella pneumoniae/genetics; Transcription, Genetic; Amino Acid Sequence; Bacterial Proteins/genetics; Binding Sites; Codon, Initiator; Computational Biology; Deoxyribonuclease I/metabolism; Endodeoxyribonucleases/genetics; Escherichia coli/metabolism; Likelihood Functions; Phylogeny; Plasmids/metabolism; Promoter Regions, Genetic; Protein Binding; Protein Domains; Thermodynamics
20.
Front Microbiol ; 8: 2314, 2017.
Article in English | MEDLINE | ID: mdl-29213263

ABSTRACT

Reliable identification of targets of bacterial regulators is necessary to understand the regulation of bacterial gene expression. These targets are commonly predicted by searching for high-scoring binding sites in the upstream genomic regions, which typically leads to a large number of false positives. In contrast to this common approach, here we propose a novel concept in which overrepresentation of the scoring distribution that corresponds to the entire searched region is assessed, as opposed to predicting individual binding sites. We explore two implementations of this concept, based on the Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests, both of which provide straightforward P-value estimates for predicted targets. This approach is implemented for pleiotropic bacterial regulators, including target prediction for σ70 (the bacterial housekeeping σ factor), which is a classical bioinformatics problem characterized by low specificity. We show that the KS-based approach is both faster and more accurate, departing from the current paradigm of AD being slower but more accurate. Moreover, the KS approach leads to a significant increase in search accuracy compared to the standard approach, while at the same time straightforwardly assigning well-established P-values to each potential target. Consequently, the new KS-based method proposed here, which assigns P-values to fixed-length upstream regions, provides a fast and accurate approach for predicting bacterial transcription targets.
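
The idea of testing a whole region's score distribution, rather than individual sites, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `ks_overrepresentation`, the use of a one-sided two-sample KS statistic against a background score sample, and the asymptotic P-value approximation exp(-2·D²·nm/(n+m)) are all assumptions made for illustration, and the scores themselves would come from whatever binding-site scoring scheme is in use.

```python
import math

def ks_overrepresentation(region_scores, background_scores):
    """One-sided two-sample KS test (illustrative sketch, not the published method).

    Returns (d_plus, p), where d_plus is the largest amount by which the
    background ECDF exceeds the region ECDF (i.e. the region is shifted
    toward higher scores) and p is an asymptotic approximation.
    """
    a, b = sorted(region_scores), sorted(background_scores)
    n, m = len(a), len(b)
    i = j = 0
    d_plus = 0.0
    # Walk both sorted samples, evaluating the ECDFs at each distinct value.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d_plus = max(d_plus, j / m - i / n)
    # Asymptotic one-sided approximation; an assumption, rough for small samples.
    p = math.exp(-2.0 * d_plus * d_plus * n * m / (n + m))
    return d_plus, p

# Hypothetical scores for one upstream region versus a background sample.
d, p = ks_overrepresentation([10.0, 11.0], [1.0, 2.0, 3.0])
print(d, p)
```

A small P-value here would flag the region as a whole as enriched for high scores; in practice the region length, the background model, and any multiple-testing correction would need to follow the original paper.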
