Results 1-20 of 101
1.
Elife ; 12, 2024 Jan 31.
Article in English | MEDLINE | ID: mdl-38293962

ABSTRACT

Wrapping of DNA into nucleosomes restricts accessibility to DNA and may affect the recognition of binding motifs by transcription factors. A certain class of transcription factors, the pioneer transcription factors, can specifically recognize their DNA binding sites on nucleosomes, initiate local chromatin opening, and facilitate the binding of co-factors in a cell-type-specific manner. For the majority of human pioneer transcription factors, the locations of their binding sites, mechanisms of binding, and regulation remain unknown. We have developed a computational method to predict the cell-type-specific ability of transcription factors to bind nucleosomes by integrating ChIP-seq, MNase-seq, and DNase-seq data with details of nucleosome structure. We have demonstrated that our approach can discriminate pioneer from canonical transcription factors, and we predicted new potential pioneer transcription factors in H1, K562, HepG2, and HeLa-S3 cell lines. Last, we systematically analyzed the interaction modes between various pioneer transcription factors and detected several clusters of distinctive binding sites on nucleosomal DNA.


Subjects
Nucleosomes , Transcription Factors , Humans , Nucleosomes/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Chromatin , DNA/metabolism , Binding Sites
2.
Genome Biol ; 25(1): 12, 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38191464

ABSTRACT

The cost and complexity of generating a complete reference genome means that many organisms lack an annotated reference. An alternative is to use a de novo reference transcriptome. This technology is cost-effective but is susceptible to off-target RNA contamination. In this manuscript, we present GTax, a taxonomy-structured database of genomic sequences that can be used with BLAST to detect and remove foreign contamination in RNA sequencing samples before assembly. In addition, we use a de novo transcriptome assembly of Solanum lycopersicum (tomato) to demonstrate that removing foreign contamination in sequencing samples reduces the number of assembled chimeric transcripts.
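A minimal Python sketch of the kind of BLAST-based contamination screen described above, assuming BLAST was run with tabular output that includes the `staxids` column (for example `-outfmt "6 qseqid sseqid pident length evalue bitscore staxids"`) and that a taxid-to-lineage table is available. The functions `best_hits` and `foreign_reads`, the lineage table, and the read IDs are hypothetical; this is not the GTax implementation.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Assumed BLAST tabular columns (hypothetical layout for this sketch):
# qseqid, sseqid, pident, length, evalue, bitscore, staxids


def best_hits(blast_lines: Iterable[str]) -> Dict[str, Tuple[float, str]]:
    """Return the highest-bitscore hit (bitscore, taxid) for each query read."""
    best: Dict[str, Tuple[float, str]] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, bitscore, staxids = fields[0], float(fields[5]), fields[6]
        taxid = staxids.split(";")[0]  # staxids may hold several ';'-separated ids
        if qseqid not in best or bitscore > best[qseqid][0]:
            best[qseqid] = (bitscore, taxid)
    return best


def foreign_reads(
    blast_lines: Iterable[str],
    lineages: Dict[str, List[str]],
    target_clade: str,
) -> Set[str]:
    """Flag reads whose best hit has a known lineage outside target_clade.

    `lineages` is a hypothetical taxid -> lineage-names table; reads whose
    best-hit taxid is missing from it are left unflagged (a conservative choice).
    """
    flagged: Set[str] = set()
    for qseqid, (_, taxid) in best_hits(blast_lines).items():
        lineage = lineages.get(taxid)
        if lineage is not None and target_clade not in lineage:
            flagged.add(qseqid)
    return flagged


if __name__ == "__main__":
    hits = [
        "read1\tsubjA\t99.0\t150\t1e-70\t270\t111",
        "read1\tsubjB\t90.0\t150\t1e-50\t200\t222",
        "read2\tsubjC\t98.0\t150\t1e-65\t260\t222",
    ]
    lineages = {"111": ["Eukaryota", "Viridiplantae"], "222": ["Bacteria"]}
    print(foreign_reads(hits, lineages, "Viridiplantae"))  # {'read2'}
```

Flagged reads would then be dropped before assembly; choosing the best hit by bitscore is one simple assignment rule among several (a lowest-common-ancestor rule is another).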


Subjects
Solanum lycopersicum , Transcriptome , Databases, Factual , Genomics , RNA , Sequence Analysis, RNA , Solanum lycopersicum/genetics
3.
Genome Res ; 33(10): 1662-1672, 2023 10.
Article in English | MEDLINE | ID: mdl-37884340

ABSTRACT

Housekeeping genes are considered to be regulated by common enhancers across different tissues. Here we report that most of the commonly expressed mouse or human genes across different cell types, including more than half of the previously identified housekeeping genes, are associated with cell type-specific enhancers. Furthermore, the binding of most transcription factors (TFs) is cell type-specific. We reason that these cell type specificities are causally related to the collective TF recruitment at regulatory sites, as TFs tend to bind to regions associated with many other TFs and each cell type has a unique repertoire of expressed TFs. Based on binding profiles of hundreds of TFs from HepG2, K562, and GM12878 cells, we show that 80% of all TF peaks overlapping H3K27ac signals are in the top 20,000-23,000 most TF-enriched H3K27ac peak regions, and approximately 12,000-15,000 of these peaks are enhancers (nonpromoters). Those enhancers are mainly cell type-specific and include those linked to the majority of commonly expressed genes. Moreover, we show that the top 15,000 most TF-enriched regulatory sites in HepG2 cells, associated with about 200 TFs, can be predicted largely from the binding profile of as few as 30 TFs. Through motif analysis, we show that major enhancers harbor diverse and clustered motifs from a combination of available TFs uniquely present in each cell type. We propose a mechanism that explains how the highly focused TF binding at regulatory sites results in cell type specificity of enhancers for housekeeping and commonly expressed genes.
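The coverage statistic quoted above (the share of TF peaks captured by the top-ranked H3K27ac regions) can be sketched in Python as follows, assuming the number of TF peaks overlapping each region has already been counted. The function names and toy counts are hypothetical, not the authors' code.

```python
from typing import Dict, List, Tuple


def rank_regions(tf_peaks_per_region: Dict[str, int]) -> List[Tuple[str, int]]:
    """Sort regions by the number of TF peaks they overlap, most enriched first."""
    return sorted(tf_peaks_per_region.items(), key=lambda kv: kv[1], reverse=True)


def top_regions_covering(tf_peaks_per_region: Dict[str, int], fraction: float = 0.8) -> int:
    """Smallest N such that the top-N regions contain at least `fraction` of all TF peaks."""
    total = sum(tf_peaks_per_region.values())
    if total == 0:
        return 0
    covered = 0
    for n, (_, count) in enumerate(rank_regions(tf_peaks_per_region), start=1):
        covered += count
        if covered >= fraction * total:
            return n
    return len(tf_peaks_per_region)


if __name__ == "__main__":
    # Hypothetical counts of TF peaks per H3K27ac region
    counts = {"enh1": 40, "prom1": 25, "enh2": 20, "enh3": 10, "enh4": 5}
    print(top_regions_covering(counts, 0.8))  # 3
```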


Subjects
Genes, Essential , Transcription Factors , Humans , Mice , Animals , Transcription Factors/genetics , Transcription Factors/metabolism , Gene Expression Regulation , Regulatory Sequences, Nucleic Acid , Protein Binding , Binding Sites
4.
bioRxiv ; 2023 Nov 23.
Article in English | MEDLINE | ID: mdl-37425841

ABSTRACT

Wrapping of DNA into nucleosomes restricts accessibility to the DNA and may affect the recognition of binding motifs by transcription factors. A certain class of transcription factors, the pioneer transcription factors, can specifically recognize their DNA binding sites on nucleosomes, may initiate local chromatin opening, and can facilitate the binding of co-factors in a cell-type-specific manner. For the majority of human pioneer transcription factors, the locations of their binding sites, mechanisms of binding, and regulation remain unknown. We have developed a computational method to predict the cell-type-specific ability of transcription factors to bind nucleosomes by integrating ChIP-seq, MNase-seq, and DNase-seq data with details of nucleosome structure. We have demonstrated that our approach can discriminate pioneer from canonical transcription factors, and we predicted new potential pioneer transcription factors in H1, K562, HepG2, and HeLa cell lines. Lastly, we systematically analyzed the interaction modes between various pioneer transcription factors and detected several clusters of distinctive binding sites on nucleosomal DNA.

5.
Epigenetics Chromatin ; 15(1): 34, 2022 10 01.
Article in English | MEDLINE | ID: mdl-36180920

ABSTRACT

Histones have a long history of research in a wide range of species, leaving a legacy of complex nomenclature in the literature. Community-led discussions at the EMBO Workshop on Histone Variants in 2011 resulted in agreement amongst experts on a revised systematic protein nomenclature for histones, which is based on a combination of phylogenetic classification and historical symbol usage. Human and mouse histone gene symbols previously followed a genome-centric system that was not applicable across all vertebrate species and did not reflect the systematic histone protein nomenclature. This prompted a collaboration between histone experts, the Human Genome Organization (HUGO) Gene Nomenclature Committee (HGNC) and Mouse Genomic Nomenclature Committee (MGNC) to revise human and mouse histone gene nomenclature aiming, where possible, to follow the new protein nomenclature whilst conforming to the guidelines for vertebrate gene naming. The updated nomenclature has also been applied to orthologous histone genes in chimpanzee, rhesus macaque, dog, cat, pig, horse and cattle, and can serve as a framework for naming other vertebrate histone genes in the future.


Subjects
Genomics , Histones , Animals , Cattle , Dogs , Genome , Genomics/methods , Histones/genetics , Horses , Humans , Macaca mulatta , Mammals/genetics , Mice , Phylogeny , Swine
6.
JMIR Public Health Surveill ; 8(10): e34927, 2022 10 04.
Article in English | MEDLINE | ID: mdl-35867901

ABSTRACT

BACKGROUND: Disproportionate risks of COVID-19 in congregate care facilities including long-term care homes, retirement homes, and shelters both affect and are affected by SARS-CoV-2 infections among facility staff. In cities across Canada, there has been a consistent trend of geographic clustering of COVID-19 cases. However, there is limited information on how COVID-19 among facility staff reflects urban neighborhood disparities, particularly when stratified by the social and structural determinants of community-level transmission. OBJECTIVE: This study aimed to compare the concentration of cumulative cases by geography and social and structural determinants across 3 mutually exclusive subgroups in the Greater Toronto Area (population: 7.1 million): community, facility staff, and health care workers (HCWs) in other settings. METHODS: We conducted a retrospective, observational study using surveillance data on laboratory-confirmed COVID-19 cases (January 23 to December 13, 2020; prior to vaccination rollout). We derived neighborhood-level social and structural determinants from census data and generated Lorenz curves, Gini coefficients, and the Hoover index to visualize and quantify inequalities in cases. RESULTS: The hardest-hit neighborhoods (comprising 20% of the population) accounted for 53.87% (44,937/83,419) of community cases, 48.59% (2356/4849) of facility staff cases, and 42.34% (1669/3942) of other HCW cases. Compared with other HCWs, cases among facility staff reflected the distribution of community cases more closely. Cases among facility staff reflected greater social and structural inequalities (larger Gini coefficients) than those of other HCWs across all determinants. 
Facility staff cases were also more likely than community cases to be concentrated in lower-income neighborhoods (Gini 0.24, 95% CI 0.15-0.38 vs 0.14, 95% CI 0.08-0.21) with a higher household density (Gini 0.23, 95% CI 0.17-0.29 vs 0.17, 95% CI 0.12-0.22) and with a greater proportion working in other essential services (Gini 0.29, 95% CI 0.21-0.40 vs 0.22, 95% CI 0.17-0.28). CONCLUSIONS: COVID-19 cases among facility staff largely reflect neighborhood-level heterogeneity and disparities, even more so than cases among other HCWs. The findings signal the importance of interventions prioritized and tailored to the home geographies of facility staff in addition to workplace measures, including prioritization and reach of vaccination at home (neighborhood level) and at work.
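The Lorenz-curve summaries named in METHODS can be sketched in Python, assuming one record per neighbourhood with a case count and a positive population. Here the Gini coefficient is 1 minus twice the trapezoid-rule area under the Lorenz curve, and the Hoover index is half the summed absolute difference between case shares and population shares (the share of cases that would have to be redistributed to match population shares). The function names are hypothetical, and the confidence intervals reported in the abstract are not reproduced.

```python
from typing import List, Sequence, Tuple


def lorenz_points(cases: Sequence[float], population: Sequence[float]) -> List[Tuple[float, float]]:
    """Lorenz curve points (cumulative population share, cumulative case share).

    Neighbourhoods are ordered from the lowest to the highest cases per capita.
    """
    order = sorted(range(len(cases)), key=lambda i: cases[i] / population[i])
    total_cases, total_pop = sum(cases), sum(population)
    points = [(0.0, 0.0)]
    cum_cases = cum_pop = 0.0
    for i in order:
        cum_cases += cases[i]
        cum_pop += population[i]
        points.append((cum_pop / total_pop, cum_cases / total_cases))
    return points


def gini(cases: Sequence[float], population: Sequence[float]) -> float:
    """Gini = 1 - 2 * (area under the Lorenz curve), area by the trapezoid rule."""
    pts = lorenz_points(cases, population)
    area = sum((x1 - x0) * (y1 + y0) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 1 - 2 * area


def hoover(cases: Sequence[float], population: Sequence[float]) -> float:
    """Hoover index: half the summed gap between case shares and population shares."""
    total_cases, total_pop = sum(cases), sum(population)
    return 0.5 * sum(abs(c / total_cases - p / total_pop) for c, p in zip(cases, population))
```

For two equal-sized neighbourhoods with 10 and 90 cases, both `gini` and `hoover` evaluate to about 0.4; equal per-capita rates give 0 for both.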


Subjects
COVID-19 , COVID-19/epidemiology , Health Personnel , Humans , Residence Characteristics , Retrospective Studies , SARS-CoV-2
7.
Epigenetics Chromatin ; 15(1): 23, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35761366

ABSTRACT

BACKGROUND: HMGN proteins are a family of nucleosome-binding chromatin architectural proteins that are expressed in all vertebrate nuclei. Although previous studies have shown that HMGN proteins have important roles in gene regulation and chromatin accessibility, whether and how they affect higher order chromatin status remains unknown. RESULTS: We examined the roles that HMGN1 and HMGN2 proteins play in higher order chromatin structures in three different cell types. We interrogated data generated in situ using several techniques, including Hi-C, Promoter Capture Hi-C, ChIP-seq, and ChIP-MS. Our results show that HMGN proteins occupy the A compartment in the 3D nuclear space. In particular, HMGN proteins occupy genomic regions involved in cell-type-specific long-range promoter-enhancer interactions. Interestingly, depletion of HMGN proteins in the three different cell types does not cause structural changes in higher order chromatin, i.e., in topologically associated domains (TADs) or in A/B compartment scores. Using ChIP-seq combined with mass spectrometry, we discovered protein partners that are directly associated with, or are neighbors of, HMGNs on nucleosomes. CONCLUSIONS: We determined how HMGN chromatin architectural proteins are positioned within the 3D nuclear space, including the identification of their binding partners in mononucleosomes. Our research indicates that HMGN proteins localize to active chromatin compartments but do not have major effects on 3D higher order chromatin structure, and that their binding to chromatin does not depend on specific protein partners.


Subjects
Chromatin , HMGN Proteins , Epigenesis, Genetic , HMGN Proteins/chemistry , HMGN Proteins/genetics , HMGN Proteins/metabolism , Nucleosomes , Protein Binding
8.
Nucleic Acids Res ; 50(4): 1864-1874, 2022 02 28.
Article in English | MEDLINE | ID: mdl-35166834

ABSTRACT

Cytosine methylation at the 5-carbon position is an essential DNA epigenetic mark in many eukaryotic organisms. Although countless structural and functional studies of cytosine methylation have been reported, our understanding of how it influences nucleosome assembly, structure, and dynamics remains obscure. Here, we investigate the effects of cytosine methylation at CpG sites on nucleosome dynamics and stability. Using long molecular dynamics simulations on a timescale of several microseconds, we generate extensive atomistic conformational ensembles of full nucleosomes. Our results reveal that methylation induces pronounced changes in the geometry of both linker and nucleosomal DNA, leading to a more curved, under-twisted DNA, narrowing the adjacent minor grooves, and shifting the population equilibrium of sugar-phosphate backbone geometry. These DNA conformational changes are associated with a considerable enhancement of interactions between methylated DNA and the histone octamer, doubling the number of contacts at some key arginines. H2A and H3 tails play important roles in these interactions, especially in DNA-methylated nucleosomes. This, in turn, prevents the spontaneous unwrapping of 3-4 helical turns of DNA in the methylated nucleosome with truncated histone tails, which is otherwise observed in the unmethylated system on a timescale of several microseconds.


Subjects
DNA Methylation , Nucleosomes , Cues , Cytosine , DNA/chemistry , Histones/metabolism , Nucleosomes/genetics
9.
Ann Epidemiol ; 65: 84-92, 2022 01.
Article in English | MEDLINE | ID: mdl-34320380

ABSTRACT

BACKGROUND: Inequities in the burden of COVID-19 were observed early in Canada and around the world, suggesting that economically marginalized communities faced disproportionate risks. However, there has been limited systematic assessment of how heterogeneity in risks has evolved in large urban centers over time. PURPOSE: To address this gap, we quantified the magnitude of risk heterogeneity in Toronto, Ontario, from January to November 2020 in a retrospective, population-based observational study of surveillance data. METHODS: We generated epidemic curves by social determinants of health (SDOH) and crude Lorenz curves by neighbourhood to visualize inequities in the distribution of COVID-19, and estimated Gini coefficients. We examined the correlation between SDOH using Pearson correlation coefficients. RESULTS: The Gini coefficient of cumulative cases by population size was 0.41 (95% confidence interval [CI]: 0.36-0.47). Gini coefficients by SDOH were: household income (0.20, 95% CI: 0.14-0.28); visible minority (0.21, 95% CI: 0.16-0.28); recent immigration (0.12, 95% CI: 0.09-0.16); suitable housing (0.21, 95% CI: 0.14-0.30); multigenerational households (0.19, 95% CI: 0.15-0.23); and essential workers (0.28, 95% CI: 0.23-0.34). CONCLUSIONS: There was a rapid epidemiologic transition from higher- to lower-income neighborhoods, with Lorenz curves transitioning from below to above the line of equality across SDOH. Moving forward necessitates integrating programs and policies that address socioeconomic inequities and structural racism into COVID-19 prevention and vaccination programs.
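The CONCLUSIONS describe curves built by ranking neighbourhoods on a social determinant and moving from below to above the line of equality. A signed, concentration-index-style sketch of that idea in Python follows. Its sign convention (negative when cases concentrate in the lowest-ranked neighbourhoods, i.e. when the curve lies above the line of equality) is the usual health-economics convention and is assumed here; it may differ from how the study reported its Gini coefficients, and the function name is hypothetical.

```python
from typing import Sequence


def concentration_index(
    cases: Sequence[float],
    population: Sequence[float],
    determinant: Sequence[float],
) -> float:
    """Signed Gini-type index with neighbourhoods ranked by a social determinant.

    Ranking runs from the lowest to the highest value of `determinant`
    (e.g. median household income). Result = 1 - 2 * (area under the curve):
    negative when the curve lies above the line of equality (cases concentrated
    in low-ranked neighbourhoods), positive when it lies below.
    """
    order = sorted(range(len(cases)), key=lambda i: determinant[i])
    total_cases, total_pop = sum(cases), sum(population)
    area = 0.0
    x0 = y0 = 0.0
    for i in order:
        x1 = x0 + population[i] / total_pop
        y1 = y0 + cases[i] / total_cases
        area += (x1 - x0) * (y1 + y0) / 2  # trapezoid rule
        x0, y0 = x1, y1
    return 1 - 2 * area
```

Computing this index for successive time windows would show the transition the abstract describes as a change of sign, from positive (cases concentrated in higher-income neighbourhoods) to negative.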


Subjects
COVID-19 , Geography , Humans , Ontario/epidemiology , Retrospective Studies , SARS-CoV-2 , Socioeconomic Factors , Systemic Racism
10.
Genome Res ; 32(1): 111-123, 2022 01.
Article in English | MEDLINE | ID: mdl-34785526

ABSTRACT

The Mediator complex is central to transcription by RNA polymerase II (Pol II) in eukaryotes. In budding yeast (Saccharomyces cerevisiae), Mediator is recruited by activators and associates with core promoter regions, where it facilitates preinitiation complex (PIC) assembly, only transiently before Pol II escape. Interruption of the transcription cycle by inactivation or depletion of Kin28 inhibits Pol II escape and stabilizes this association. However, Mediator occupancy and dynamics have not been examined on a genome-wide scale in yeast grown in nonstandard conditions. Here we investigate Mediator occupancy following heat shock or CdCl2 exposure, with and without depletion of Kin28. We find that Pol II occupancy shows similar dependence on Mediator under normal and heat shock conditions. However, although Mediator association increases at many genes upon Kin28 depletion under standard growth conditions, little or no increase is observed at most genes upon heat shock, indicating a more stable association of Mediator after heat shock. Unexpectedly, Mediator remains associated upstream of the core promoter at genes repressed by heat shock or CdCl2 exposure whether or not Kin28 is depleted, suggesting that Mediator is recruited by activators but is unable to engage PIC components at these repressed targets. This persistent association is strongest at promoters that bind the HMGB family member Hmo1, and is reduced but not eliminated in hmo1Δ yeast. Finally, we show a reduced dependence on PIC components for Mediator occupancy at promoters after heat shock, further supporting altered dynamics or stronger engagement with activators under these conditions.


Subjects
Saccharomyces cerevisiae Proteins , Saccharomycetales , Gene Expression Regulation, Fungal , Heat-Shock Response/genetics , Mediator Complex/genetics , Mediator Complex/metabolism , RNA Polymerase II/genetics , RNA Polymerase II/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Saccharomycetales/genetics , Transcription, Genetic
11.
J Infect Dis ; 225(8): 1317-1320, 2022 04 19.
Article in English | MEDLINE | ID: mdl-34919700

ABSTRACT

We assessed the COVID-19 pandemic's impact on the treatment of latent and active tuberculosis at 3 centers in Montreal and Toronto, using data from 10 833 patients (8685 with latent tuberculosis infection, 2148 with active tuberculosis). Observation periods before declarations of COVID-19 public health emergencies ranged from 219 to 744 weeks, and after declarations, from 28 to 33 weeks. In the latter period, latent tuberculosis infection treatment initiation rates fell by 30% to 66%. At 2 centers, active tuberculosis treatment rates fell by 16% and 29%. In Canada, cornerstone measures for tuberculosis elimination weakened during the COVID-19 pandemic.
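The percentage reductions above compare treatment rates before and after the emergency declarations. Assuming a crude comparison of weekly rates (the study may have used a different model), the arithmetic can be sketched in Python; the function names and the counts in the usage line are invented for illustration and are not study data.

```python
def weekly_rate(events: int, weeks: float) -> float:
    """Events per week over an observation period."""
    return events / weeks


def percent_reduction(pre_events: int, pre_weeks: float, post_events: int, post_weeks: float) -> float:
    """Crude percentage drop in the weekly rate after vs before the declaration."""
    pre = weekly_rate(pre_events, pre_weeks)
    post = weekly_rate(post_events, post_weeks)
    return 100 * (1 - post / pre)


if __name__ == "__main__":
    # Hypothetical: 500 initiations over 250 weeks before, 42 over 30 weeks after
    print(round(percent_reduction(500, 250, 42, 30), 1))  # 30.0
```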


Subjects
COVID-19 , Latent Tuberculosis , Tuberculosis , Canada/epidemiology , Humans , Pandemics/prevention & control , SARS-CoV-2 , Tuberculosis/drug therapy , Tuberculosis/epidemiology , Tuberculosis/prevention & control
12.
Nat Commun ; 12(1): 5280, 2021 09 06.
Article in English | MEDLINE | ID: mdl-34489435

ABSTRACT

Little is known about the roles of histone tails in modulating nucleosomal DNA accessibility and its recognition by other macromolecules. Here we generate extensive atomic-level conformational ensembles of histone tails in the context of the full nucleosome, totaling 65 microseconds of molecular dynamics simulations. We observe rapid conformational transitions between tail bound and unbound states, and characterize kinetic and thermodynamic properties of histone tail-DNA interactions. Different histone types exhibit distinct binding modes to specific DNA regions. Using a comprehensive set of experimental nucleosome complexes, we find that the majority of them target regions of nucleosomal/linker DNA that are mutually exclusive with those bound by histone tails, around the super-helical locations ± 1, ± 2, and ± 7, and that histone tails H3 and H4 contribute most to this process. These findings are explained within competitive binding and tail displacement models. Finally, we demonstrate crosstalk between different histone tail post-translational modifications and mutations: those that change charge suppress tail-DNA interactions and enhance histone tail dynamics and DNA accessibility.


Subjects
DNA/chemistry , Histones/chemistry , Nucleosomes/ultrastructure , Protein Processing, Post-Translational , Proto-Oncogene Proteins p21(ras)/chemistry , Animals , Binding Sites , DNA/genetics , DNA/metabolism , Genome, Human , Histones/genetics , Histones/metabolism , Humans , Molecular Dynamics Simulation , Nucleic Acid Conformation , Nucleosomes/genetics , Nucleosomes/metabolism , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , Proto-Oncogene Proteins p21(ras)/genetics , Proto-Oncogene Proteins p21(ras)/metabolism , Static Electricity , Transcription, Genetic , Xenopus laevis
13.
Nucleic Acids Res ; 49(8): 4493-4505, 2021 05 07.
Article in English | MEDLINE | ID: mdl-33872375

ABSTRACT

An essential question in gene regulation is how large numbers of enhancers and promoters organize into gene regulatory loops. Using transcription factor binding enrichment as an indicator of enhancer strength, we identified a portion of H3K27ac peaks as potentially strong enhancers and found a universal pattern of promoter and enhancer distribution: in actively transcribed regions ∼200-300 kb long, the numbers of active promoters and enhancers are inversely related. Enhancer clusters are associated with isolated active promoters, regardless of the gene's cell-type specificity. As the number of nearby active promoters increases, the number of enhancers decreases. In regions where multiple active genes are closely located, there are few distant enhancers. With Hi-C analysis, we demonstrate that interactions among regulatory elements (active promoters and enhancers) occur predominantly in clusters and multiway among linearly close elements, and that the distance between adjacent elements shows a preference of ∼30 kb. We propose a simple rule for the spatial organization of active promoters and enhancers: gene transcription and regulation mainly occur at local active transcription hubs contributed dynamically by multiple elements from linearly close enhancers and/or active promoters. The hub model can be represented as a flower-shaped structure and implies an enhancer-like role for active promoters.
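The two quantities discussed above, counts of active promoters and enhancers per region and the linear spacing between adjacent regulatory elements, can be sketched in Python. Fixed, non-overlapping 250 kb windows are a simplifying assumption standing in for the ∼200-300 kb actively transcribed regions, the function names are hypothetical, and the Hi-C interaction analysis itself is not reproduced.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def window_counts(
    promoters: Sequence[int], enhancers: Sequence[int], window: int = 250_000
) -> Dict[int, Tuple[int, int]]:
    """Count (active promoters, enhancers) per fixed window on one chromosome.

    Positions are 0-based coordinates; keys are window indices (position // window).
    """
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])
    for pos in promoters:
        counts[pos // window][0] += 1
    for pos in enhancers:
        counts[pos // window][1] += 1
    return {w: (p, e) for w, (p, e) in sorted(counts.items())}


def adjacent_distances(promoters: Sequence[int], enhancers: Sequence[int]) -> List[int]:
    """Linear distances between consecutive elements, promoters and enhancers pooled."""
    positions = sorted(list(promoters) + list(enhancers))
    return [b - a for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    proms, enhs = [10_000], [40_000, 70_000, 300_000]
    print(window_counts(proms, enhs))  # {0: (1, 2), 1: (0, 1)}
    print(adjacent_distances(proms, enhs))  # [30000, 30000, 230000]
```

Plotting enhancer counts against promoter counts across windows, and a histogram of adjacent distances, would be the natural next steps for examining the inverse relationship and the ∼30 kb preference.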


Subjects
Chromosomes/metabolism , Enhancer Elements, Genetic , Gene Expression Regulation/genetics , Histones/metabolism , Promoter Regions, Genetic , Acetylation , Chromatin Immunoprecipitation Sequencing , Chromosomes/genetics , Databases, Genetic , Genome, Human , Humans , Models, Genetic , Multigene Family , Murine hepatitis virus , RNA-Seq , Transcriptional Activation/genetics
14.
PLoS One ; 16(3): e0247872, 2021.
Article in English | MEDLINE | ID: mdl-33657184

ABSTRACT

BACKGROUND: Tuberculosis (TB) is a major cause of death worldwide. TB research draws heavily on clinical cohorts which can be generated using electronic health records (EHR), but granular information extracted from unstructured EHR data is limited. The St. Michael's Hospital TB database (SMH-TB) was established to address gaps in EHR-derived TB clinical cohorts and provide researchers and clinicians with detailed, granular data related to TB management and treatment. METHODS: We collected and validated multiple layers of EHR data from the TB outpatient clinic at St. Michael's Hospital, Toronto, Ontario, Canada to generate the SMH-TB database. SMH-TB contains structured data directly from the EHR, and variables generated using natural language processing (NLP) by extracting relevant information from free-text within clinic, radiology, and other notes. NLP performance was assessed using recall, precision and F1 score averaged across variable labels. We present characteristics of the cohort population using binomial proportions and 95% confidence intervals (CI), with and without adjusting for NLP misclassification errors. RESULTS: SMH-TB currently contains retrospective patient data spanning 2011 to 2018, for a total of 3298 patients (N = 3237 with at least 1 associated dictation). Performance of TB diagnosis and medication NLP rulesets surpasses 93% in recall, precision and F1 metrics, indicating good generalizability. We estimated 20% (95% CI: 18.4-21.2%) were diagnosed with active TB and 46% (95% CI: 43.8-47.2%) were diagnosed with latent TB. After adjusting for potential misclassification, the proportion of patients diagnosed with active and latent TB was 18% (95% CI: 16.8-19.7%) and 40% (95% CI: 37.8-41.6%) respectively. CONCLUSION: SMH-TB is a unique database that includes a breadth of structured data derived from structured and unstructured EHR data by using NLP rulesets. 
The data are available for a variety of research applications, such as clinical epidemiology, quality improvement and mathematical modeling studies.
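The metrics named in METHODS can be sketched in Python: macro-averaged precision, recall, and F1 across labels; a binomial proportion with a normal-approximation (Wald) 95% CI; and one common misclassification correction, the Rogan-Gladen estimator. The abstract does not state which adjustment SMH-TB used, so `rogan_gladen` is an illustrative stand-in, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def macro_prf(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall and F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions: List[float] = []
    recalls: List[float] = []
    f1s: List[float] = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


def wald_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Binomial proportion with a normal-approximation (Wald) confidence interval."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def rogan_gladen(p_observed: float, sensitivity: float, specificity: float) -> float:
    """Misclassification-corrected prevalence (Rogan-Gladen), clipped to [0, 1]."""
    corrected = (p_observed + specificity - 1) / (sensitivity + specificity - 1)
    return min(1.0, max(0.0, corrected))
```

For example, an observed proportion of 0.20 with an assumed sensitivity of 0.90 and specificity of 0.95 corrects to about 0.18; these inputs are illustrative, not values from the study.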


Subjects
Electronic Health Records , Natural Language Processing , Tuberculosis/epidemiology , Databases, Factual , Female , Hospitals , Humans , Information Storage and Retrieval , Male , Ontario/epidemiology , Retrospective Studies , Tuberculosis/diagnosis
15.
Gigascience ; 10(1), 2021 01 07.
Article in English | MEDLINE | ID: mdl-33410471

ABSTRACT

BACKGROUND: FAIR (Findability, Accessibility, Interoperability, and Reusability) next-generation sequencing (NGS) data analysis relies on complex computational biology workflows and pipelines to guarantee reproducibility, portability, and scalability. Moreover, workflow languages, managers, and container technologies have helped address the problem of executing data analysis pipelines across multiple platforms in scalable ways. FINDINGS: Here, we present a project management framework for NGS data analysis called PM4NGS. This framework comprises automatic creation of a standard organizational structure of directories and files, bioinformatics tool management using Docker or Bioconda, and data analysis pipelines in CWL format. Pre-configured Jupyter notebooks with minimal Python code are included in PM4NGS to produce a project report and publication-ready figures. We present 3 pipelines for demonstration purposes, including the analysis of RNA-Seq, ChIP-Seq, and ChIP-exo datasets. CONCLUSIONS: PM4NGS is an open source framework that creates a standard organizational structure for NGS data analysis projects. PM4NGS is easy for non-bioinformaticians to install, configure, and use on personal computers and laptops. It supports running the NGS data analysis on Windows 10 with the Windows Subsystem for Linux feature enabled. The framework aims to narrow the gap between researchers in experimental laboratories producing NGS data and workflows for data analysis. PM4NGS documentation can be accessed at https://pm4ngs.readthedocs.io/.


Subjects
Data Analysis , Software , Computational Biology , High-Throughput Nucleotide Sequencing , Reproducibility of Results
16.
Gigascience ; 10(2), 2021 01 29.
Article in English | MEDLINE | ID: mdl-33511996

ABSTRACT

BACKGROUND: The NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) initiative provides NIH-funded researchers cost-effective access to commercial cloud providers, such as Amazon Web Services (AWS) and Google Cloud Platform (GCP). These cloud providers represent an alternative for the execution of large computational biology experiments such as transcriptome annotation, a complex analytical process that requires querying multiple biological databases with several advanced computational tools. The core components of annotation pipelines published since 2012 are BLAST sequence alignments against annotated nucleotide or protein sequence databases, run almost exclusively on networked on-premises compute systems. FINDINGS: We compare multiple BLAST sequence alignments using AWS and GCP. We prepared several Jupyter Notebooks with all the code required to submit computing jobs to the batch system on each cloud provider. We consider the effect of the number of query transcripts in input files on cost and processing time. We tested compute instances with 16, 32, and 64 vCPUs on each cloud provider. Four classes of timing results were collected: the total run time, the time to transfer the BLAST databases to the instance's local solid-state drive, the time to execute the CWL script, and the time for the creation, set-up, and release of an instance. This study aims to estimate the cost and compute time needed to execute multiple BLAST runs in a cloud environment. CONCLUSIONS: We demonstrate that public cloud providers are a practical alternative for the execution of advanced computational biology experiments at low cost. Using our cloud recipes, the BLAST alignments required to annotate a transcriptome with ∼500,000 transcripts can be processed in <2 hours with a compute cost of ∼$200-$250.
In our opinion, for BLAST-based workflows, the choice of cloud platform is not dependent on the workflow but, rather, on the specific details and requirements of the cloud provider. These choices include the accessibility for institutional use, the technical knowledge required for effective use of the platform services, and the availability of open source frameworks such as APIs to deploy the workflow.
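Varying the number of query transcripts per input file, as in FINDINGS, implies splitting a transcriptome FASTA into batches, one per batch job. A minimal Python sketch follows; the batch size, line width, and function names are assumptions, and submission to the AWS or GCP batch services is not shown.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Parse FASTA text lines into (header, sequence) records."""
    header: Optional[str] = None
    seq_parts: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_parts)
            header, seq_parts = line[1:], []
        else:
            seq_parts.append(line)
    if header is not None:
        yield header, "".join(seq_parts)


def chunk_records(records: Iterable[Record], per_chunk: int) -> Iterator[List[Record]]:
    """Group records into batches of at most `per_chunk` query sequences."""
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == per_chunk:
            yield batch
            batch = []
    if batch:
        yield batch


def to_fasta(batch: List[Record], width: int = 60) -> str:
    """Serialise one batch back to FASTA text (e.g. one input file per job)."""
    out: List[str] = []
    for header, seq in batch:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

Each string returned by `to_fasta` would be written to its own query file and passed to a separate BLAST job; larger batches mean fewer jobs but longer per-job run times.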


Subjects
Software , Transcriptome , Cloud Computing , Computational Biology , Databases, Factual , Workflow
17.
Curr Opin Struct Biol ; 67: 153-160, 2021 04.
Article in English | MEDLINE | ID: mdl-33279866

ABSTRACT

Histone tails, the N-terminal or C-terminal regions flanking the histone core, play essential roles in chromatin signaling networks. The intrinsic disorder of histone tails and their propensity for post-translational modifications allow them to serve as hubs coordinating epigenetic processes within the nucleosomal context. Deposition of histone variants with distinct histone tail properties further enriches the repertoire of histone tails in epigenetic signaling. Given advances in experimental techniques and in silico modelling, we review the most recent data on the effects of histone tails on nucleosome stability and dynamics, and on their function in regulating chromatin accessibility and folding. Finally, we discuss the molecular mechanisms by which histone tails are involved in nucleosome recognition by binding partners and in the formation of higher-order chromatin structures.


Subjects
Chromatin , Histones , Chromatin/genetics , Epigenesis, Genetic , Histones/metabolism , Nucleosomes , Protein Processing, Post-Translational
18.
Nucleic Acids Res ; 49(D1): D274-D281, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33167031

ABSTRACT

The Clusters of Orthologous Genes (COG) database, also referred to as the Clusters of Orthologous Groups of proteins, was created in 1997 and went through several rounds of updates, most recently, in 2014. The current update, available at https://www.ncbi.nlm.nih.gov/research/COG, substantially expands the scope of the database to include complete genomes of 1187 bacteria and 122 archaea, typically, with a single genome per genus. In addition, the current version of the COGs includes the following new features: (i) the recently deprecated NCBI's gene index (gi) numbers for the encoded proteins are replaced with stable RefSeq or GenBank/ENA/DDBJ coding sequence (CDS) accession numbers; (ii) COG annotations are updated for >200 newly characterized protein families with corresponding references and PDB links, where available; (iii) lists of COGs grouped by pathways and functional systems are added; (iv) 266 new COGs for proteins involved in CRISPR-Cas immunity, sporulation in Firmicutes and photosynthesis in cyanobacteria are included; and (v) the database is made available as a web page, in addition to FTP. The current release includes 4877 COGs. Future plans include further expansion of the COG collection by adding archaeal COGs (arCOGs), splitting the COGs containing multiple paralogs, and continued refinement of COG annotations.


Subjects
Archaea/genetics , Bacteria/genetics , Databases, Genetic , Genome, Archaeal , Genome, Bacterial , Archaea/metabolism , Archaeal Proteins/classification , Archaeal Proteins/genetics , Archaeal Proteins/metabolism , Bacteria/immunology , Bacteria/metabolism , Bacterial Proteins/classification , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , CRISPR-Cas Systems , Gene Ontology , Humans , Molecular Sequence Annotation , Spores, Bacterial/genetics , Spores, Bacterial/growth & development
19.
J Mol Biol ; 433(6): 166684, 2021 03 19.
Article in English | MEDLINE | ID: mdl-33098859

ABSTRACT

To elucidate the properties of human histone interactions on a large scale, we perform a comprehensive mapping of human histone interaction networks using data from structural, chemical cross-linking, and various high-throughput studies. Histone interactomes derived from different data sources show limited overlap and complement each other. This led us to integrate these data into a combined global histone interaction network, which includes 5308 proteins and 10,330 interactions. Analysis of the topological properties of the human histone interactome reveals its scale-free behavior and high modularity. Our study of histone binding interfaces uncovers a remarkably high number of residues involved in interactions between histones and non-histone proteins: 80-90% of residues in histones H3 and H4 have at least one binding partner. Two types of histone binding modes are detected: interfaces conserved in most histone variants and variant-specific interfaces. Finally, different types of chromatin factors recognize histones in nucleosomes via distinct binding modes, and many of these interfaces utilize acidic patches among other sites. Interaction networks are available at https://github.com/Panchenko-Lab/Human-histone-interactome.
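Scale-free behavior is usually judged from the degree distribution of a network. A minimal Python sketch for computing it from an edge list (for example, one exported from the published interaction networks) follows; the function names are hypothetical, self-loops such as homotypic histone contacts are dropped as a simplification, and fitting a power law is left out.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_network(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets; duplicate edges and self-loops are ignored."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # simplification: drop self-interactions
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def degree_distribution(adjacency: Dict[str, Set[str]]) -> Dict[int, int]:
    """Map degree k -> number of proteins with exactly k distinct partners."""
    return dict(sorted(Counter(len(nbrs) for nbrs in adjacency.values()).items()))


if __name__ == "__main__":
    edges = [("H3", "A"), ("H3", "B"), ("H4", "A"), ("H3", "H4"), ("A", "H3"), ("B", "B")]
    print(degree_distribution(build_network(edges)))  # {1: 1, 2: 2, 3: 1}
```

On a log-log plot, a roughly straight decline of the number of proteins with degree k is the usual first hint of scale-free structure.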


Subjects
Chromosomal Proteins, Non-Histone/chemistry , DNA/chemistry , Histones/chemistry , Nucleosomes/ultrastructure , Protein Interaction Maps , Binding Sites , Chromosomal Proteins, Non-Histone/genetics , Chromosomal Proteins, Non-Histone/metabolism , DNA/genetics , DNA/metabolism , Databases, Protein , Histones/genetics , Histones/metabolism , Humans , Internet , Nucleic Acid Conformation , Nucleosomes/chemistry , Nucleosomes/metabolism , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , Software
20.
Data Brief ; 33: 106555, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33299912

ABSTRACT

Here, we present the data of human histone interactomes generated and analysed in the research article by Peng et al., 2020 [1]. The histone interactome data provide a comprehensive mapping of human histone/nucleosome interaction networks by using different data sources from the structural, chemical cross-linking, and high-throughput studies. The histone interactions are presented at different levels of granularity in networks, including protein, domain, and residue-levels. All human histone interactome Cytoscape session files are available at https://github.com/Panchenko-Lab/Human-histone-interactome.
