Results 1 - 12 of 12
1.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37610338

ABSTRACT

MOTIVATION: Gene set enrichment methods are a common tool to improve the interpretability of gene lists as obtained, for example, from differential gene expression analyses. They are based on computing whether dysregulated genes are located in certain biological pathways more often than expected by chance. Gene set enrichment tools rely on pre-existing pathway databases such as KEGG, Reactome, or the Gene Ontology. These databases are increasing in size and in the number of redundancies between pathways, which complicates the statistical enrichment computation. RESULTS: We address this problem and develop a novel gene set enrichment method, called pareg, which is based on a regularized generalized linear model and directly incorporates dependencies between gene sets related to certain biological functions, for example, due to shared genes, in the enrichment computation. We show that pareg is more robust to noise than competing methods. Additionally, we demonstrate the ability of our method to recover known pathways as well as to suggest novel treatment targets in an exploratory analysis using breast cancer samples from TCGA. AVAILABILITY AND IMPLEMENTATION: pareg is freely available as an R package on Bioconductor (https://bioconductor.org/packages/release/bioc/html/pareg.html) as well as on https://github.com/cbg-ethz/pareg. The GitHub repository also contains the Snakemake workflows needed to reproduce all results presented here.
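The idea of testing whether dysregulated genes fall into a pathway "more often than expected by chance" can be illustrated with the classical hypergeometric over-representation test. The sketch below is that textbook baseline only, not pareg's regularized generalized linear model; the function name and argument names are hypothetical and chosen for illustration.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_universe, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: number of genes in the background set
    n_pathway:  number of background genes annotated to the pathway
    n_selected: number of dysregulated genes drawn from the background
    n_overlap:  number of dysregulated genes that are in the pathway
    """
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 background genes, 5 in the pathway,
# 5 dysregulated genes, all 5 of which are in the pathway.
p_value = hypergeom_enrichment_pvalue(10, 5, 5, 5)
```

Because each pathway is tested independently here, overlapping pathways that share many genes produce correlated results, which is the redundancy problem the abstract says pareg is designed to address.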


Subjects
Databases, Factual , Gene Ontology , Linear Models , Workflow
2.
Sci Transl Med ; 15(680): eabn7979, 2023 01 25.
Article in English | MEDLINE | ID: mdl-36346321

ABSTRACT

Genome sequences from evolving infectious pathogens allow quantification of case introductions and local transmission dynamics. We sequenced 11,357 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes from Switzerland in 2020, the sixth largest effort globally. Using a representative subset of these data, we estimated viral introductions to Switzerland and their persistence over the course of 2020. We contrasted these estimates with simple null models representing the absence of certain public health measures. We show that Switzerland's border closures decoupled case introductions from incidence in neighboring countries. Under a simple model, we estimate an 86 to 98% reduction in introductions during Switzerland's strictest border closures. Furthermore, the Swiss 2020 partial lockdown roughly halved the time for sampled introductions to die out. Last, we quantified local transmission dynamics once introductions into Switzerland occurred using a phylodynamic model. We found that transmission slowed 35 to 63% upon outbreak detection in summer 2020 but not in fall. This finding may indicate successful contact tracing over summer before overburdening in fall. The study highlights the added value of genome sequencing data for understanding transmission dynamics.


Subjects
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/genetics , Public Health , Switzerland/epidemiology , Communicable Disease Control , Genome, Viral/genetics , Phylogeny
3.
Nat Microbiol ; 7(8): 1151-1160, 2022 08.
Article in English | MEDLINE | ID: mdl-35851854

ABSTRACT

The continuing emergence of SARS-CoV-2 variants of concern and variants of interest emphasizes the need for early detection and epidemiological surveillance of novel variants. We used genomic sequencing of 122 wastewater samples from three locations in Switzerland to monitor the local spread of B.1.1.7 (Alpha), B.1.351 (Beta) and P.1 (Gamma) variants of SARS-CoV-2 at a population level. We devised a bioinformatics method named COJAC (Co-Occurrence adJusted Analysis and Calling) that uses read pairs carrying multiple variant-specific signature mutations as a robust indicator of low-frequency variants. Application of COJAC revealed that a local outbreak of the Alpha variant in two Swiss cities was observable in wastewater up to 13 d before being first reported in clinical samples. We further confirmed the ability of COJAC to detect emerging variants early for the Delta variant by analysing an additional 1,339 wastewater samples. While sequencing data of single wastewater samples provide limited precision for the quantification of relative prevalence of a variant, we show that replicate and close-meshed longitudinal sequencing allow for robust estimation not only of the local prevalence but also of the transmission fitness advantage of any variant. We conclude that genomic sequencing and our computational analysis can provide population-level estimates of prevalence and fitness of emerging variants from wastewater samples earlier and on the basis of substantially fewer samples than from clinical samples. Our framework is being routinely used in large national projects in Switzerland and the UK.
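The co-occurrence idea described in this abstract, counting read pairs that carry several variant-specific signature mutations at once, can be sketched as follows. This is a simplified illustration and not the COJAC implementation: the representation of a read pair as a tuple of covered positions and observed mutations, the function name, and the example coordinates are all assumptions.

```python
def count_cooccurrences(read_pairs, signature):
    """Count read pairs that support all signature mutations together.

    read_pairs: iterable of (covered_positions, observed_mutations), where
                covered_positions is a set of genome positions and
                observed_mutations is a set of (position, alt_base) tuples
    signature:  set of (position, alt_base) tuples defining the variant

    Returns (supporting, informative): pairs carrying every signature
    mutation, and pairs that cover every signature position.
    """
    signature = set(signature)
    signature_positions = {position for position, _ in signature}
    informative = 0
    supporting = 0
    for covered, mutations in read_pairs:
        # Only pairs spanning all signature positions can be judged.
        if not signature_positions <= set(covered):
            continue
        informative += 1
        if signature <= set(mutations):
            supporting += 1
    return supporting, informative


# Hypothetical signature of two mutations on one amplicon.
signature = {(100, "T"), (200, "G")}
read_pairs = [
    ({100, 200}, {(100, "T"), (200, "G")}),  # carries both mutations
    ({100, 200}, {(100, "T")}),              # carries only one
    ({100}, {(100, "T")}),                   # does not span both positions
]
supporting, informative = count_cooccurrences(read_pairs, signature)
```

Requiring several mutations on the same read pair makes a chance match from sequencing error much less likely than for a single mutation, which is why co-occurrence is described as a robust indicator of low-frequency variants.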


Subjects
COVID-19 , SARS-CoV-2 , COVID-19/diagnosis , COVID-19/epidemiology , Genomics , Humans , SARS-CoV-2/genetics , Wastewater
4.
Sci Rep ; 12(1): 1952, 2022 02 04.
Article in English | MEDLINE | ID: mdl-35121764

ABSTRACT

A hallmark of ribosomal RNA (rRNA) is the presence of 2'-O-methyl groups that are introduced sequence-specifically by box C/D small nucleolar RNAs (snoRNAs) in ribonucleoprotein particles. Most data on this chemical modification and its impact on RNA folding and stability are derived from organisms of the Opisthokonta supergroup. Using bioinformatics and RNA-seq data, we identify 30 novel box C/D snoRNAs in Dictyostelium discoideum, many of which are differentially expressed during the multicellular development of the amoeba. By applying RiboMeth-seq, we find that 49 positions in the 17S and 26S rRNA are 2'-O-methylated. Several of these nucleotides are substoichiometrically modified, with one displaying dynamic modification levels during development. Using homology-based models for the D. discoideum rRNA secondary structures, we localize many modified nucleotides in the vicinity of the ribosomal A, P and E sites. For most modified positions, a guiding box C/D snoRNA could be identified, allowing us to determine idiosyncratic features of the snoRNA/rRNA interactions in the amoeba. Our data from D. discoideum represent the first evidence for ribosome heterogeneity in the Amoebozoa supergroup, suggesting that it is a common feature of all eukaryotes.


Subjects
Dictyostelium/metabolism , RNA Processing, Post-Transcriptional , RNA, Ribosomal/metabolism , Ribosomes/metabolism , Computational Biology , Dictyostelium/genetics , Methylation , Nucleic Acid Conformation , RNA Stability , RNA, Ribosomal/genetics , RNA, Small Nucleolar/genetics , RNA, Small Nucleolar/metabolism , RNA-Seq , Ribosomes/genetics , Structure-Activity Relationship
5.
Hum Genomics ; 16(1): 2, 2022 01 11.
Article in English | MEDLINE | ID: mdl-35016721

ABSTRACT

BACKGROUND: Genome-wide association studies have identified statistical associations between various diseases, including cancers, and a large number of single-nucleotide polymorphisms (SNPs). However, they provide no direct explanation of the mechanisms underlying the association. Based on the recent discovery that changes in three-dimensional genome organization may have functional consequences on gene regulation favoring diseases, we investigated systematically the genome-wide distribution of disease-associated SNPs with respect to a specific feature of 3D genome organization: topologically associating domains (TADs) and their borders. RESULTS: For each of 449 diseases, we tested whether the associated SNPs are present in TAD borders more often than observed by chance, where chance (i.e., the null model in statistical terms) corresponds to the same number of pointwise loci drawn at random either in the entire genome, or in the entire set of disease-associated SNPs listed in the GWAS catalog. Our analysis shows that a fraction of diseases displays such a preferential localization of their risk loci. Moreover, cancers are relatively more frequent among these diseases, and this predominance is generally enhanced when considering only intergenic SNPs. The structure of SNP-based diseasome networks confirms that localization of risk loci in TAD borders differs between cancers and non-cancer diseases. Furthermore, different TAD border enrichments are observed in embryonic stem cells and differentiated cells, consistent with changes in topological domains along embryogenesis and delineating their contribution to disease risk. CONCLUSIONS: Our results suggest that, for certain diseases, part of the genetic risk lies in a local genetic variation affecting the genome partitioning in topologically insulated domains. Investigating this possible contribution to genetic risk is particularly relevant in cancers. This study thus opens a way of interpreting genome-wide association studies, by distinguishing two types of disease-associated SNPs: one with an effect on an individual gene, the other acting in interplay with 3D genome organization.
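The first null model described in the RESULTS (the same number of loci drawn at random in the entire genome) lends itself to a Monte Carlo test. The sketch below is an illustration under simplifying assumptions, not the authors' code: the genome is treated as one linear coordinate axis, TAD borders are given as sorted, non-overlapping half-open intervals, and all names and coordinates are hypothetical.

```python
import random
from bisect import bisect_right


def count_in_intervals(positions, intervals):
    """Count positions that fall inside sorted, non-overlapping
    half-open intervals given as (start, end) tuples."""
    starts = [start for start, _ in intervals]
    hits = 0
    for position in positions:
        index = bisect_right(starts, position) - 1
        if index >= 0 and position < intervals[index][1]:
            hits += 1
    return hits


def border_enrichment_pvalue(snps, borders, genome_length, n_sim=1000, seed=0):
    """Monte Carlo p-value for 'at least as many SNPs in borders as observed'
    when the same number of loci is drawn uniformly over the genome."""
    rng = random.Random(seed)
    snps = list(snps)
    observed = count_in_intervals(snps, borders)
    at_least = 0
    for _ in range(n_sim):
        null_loci = [rng.randrange(genome_length) for _ in snps]
        if count_in_intervals(null_loci, borders) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (at_least + 1) / (n_sim + 1)


# Hypothetical toy genome of 10,000 positions with two narrow borders.
borders = [(100, 110), (500, 510)]
snps = [100, 105, 109, 500, 509]
observed, p_value = border_enrichment_pvalue(snps, borders, 10_000)
```

The second null model mentioned in the abstract, drawing loci from the full set of disease-associated SNPs in the GWAS catalog, would replace the uniform draw with sampling from that catalog; it is not implemented here.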


Subjects
Genome-Wide Association Study , Neoplasms , Gene Expression Regulation , Genome , Humans , Neoplasms/genetics , Polymorphism, Single Nucleotide/genetics
6.
Bioinformatics ; 38(6): 1550-1559, 2022 03 04.
Article in English | MEDLINE | ID: mdl-34927666

ABSTRACT

MOTIVATION: Signaling pathways control cellular behavior. Dysregulated pathways, for example, due to mutations that cause genes and proteins to be expressed abnormally, can lead to diseases, such as cancer. RESULTS: We introduce a novel computational approach, called Differential Causal Effects (dce), which compares normal to cancerous cells using the statistical framework of causality. The method allows detection of individual edges in a signaling pathway that are dysregulated in cancer cells, while accounting for confounding. Hence, technical artifacts have less influence on the results and dce is more likely to detect the true biological signals. We extend the approach to handle unobserved dense confounding, where each latent variable, such as batch effects or cell cycle states, affects many covariates. We show that dce outperforms competing methods on synthetic datasets and on CRISPR knockout screens. We validate its latent confounding adjustment properties on a GTEx (Genotype-Tissue Expression) dataset. Finally, in an exploratory analysis on breast cancer data from TCGA (The Cancer Genome Atlas), we recover known and discover new genes involved in breast cancer progression. AVAILABILITY AND IMPLEMENTATION: The method dce is freely available as an R package on Bioconductor (https://bioconductor.org/packages/release/bioc/html/dce.html) as well as on https://github.com/cbg-ethz/dce. The GitHub repository also contains the Snakemake workflows needed to reproduce all results presented here. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Breast Neoplasms , Software , Humans , Female , Genome , Signal Transduction
7.
Epidemics ; 37: 100480, 2021 12.
Article in English | MEDLINE | ID: mdl-34488035

ABSTRACT

BACKGROUND: In December 2020, the United Kingdom (UK) reported a SARS-CoV-2 Variant of Concern (VoC) which is now named B.1.1.7. Based on initial data from the UK and later data from other countries, this variant was estimated to have a transmission fitness advantage of around 40-80 % (Volz et al., 2021; Leung et al., 2021; Davies et al., 2021). AIM: This study aims to estimate the transmission fitness advantage and the effective reproductive number of B.1.1.7 through time based on data from Switzerland. METHODS: We generated whole genome sequences from 11.8 % of all confirmed SARS-CoV-2 cases in Switzerland between 14 December 2020 and 11 March 2021. Based on these data, we determine the daily frequency of the B.1.1.7 variant and quantify the variant's transmission fitness advantage on a national and a regional scale. RESULTS: We estimate B.1.1.7 had a transmission fitness advantage of 43-52 % compared to the other variants circulating in Switzerland during the study period. Further, we estimate B.1.1.7 had a reproductive number above 1 from 01 January 2021 until the end of the study period, compared to below 1 for the other variants. Specifically, we estimate the reproductive number for B.1.1.7 was 1.24 [1.07-1.41] from 01 January until 17 January 2021 and 1.18 [1.06-1.30] from 18 January until 01 March 2021 based on the whole genome sequencing data. From 10 March to 16 March 2021, once B.1.1.7 was dominant, we estimate the reproductive number was 1.14 [1.00-1.26] based on all confirmed cases. For reference, Switzerland applied more non-pharmaceutical interventions to combat SARS-CoV-2 on 18 January 2021 and lifted some measures again on 01 March 2021. CONCLUSION: The observed increase in B.1.1.7 frequency in Switzerland during the study period is as expected based on observations in the UK. In absolute numbers, B.1.1.7 increased exponentially with an estimated doubling time of around 2-3.5 weeks. To monitor the ongoing spread of B.1.1.7, our plots are available online.
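The relationship between the doubling time quoted in the CONCLUSION and an exponential growth rate, and the way a growth-rate difference drives a variant's frequency, can be sketched as below. The abstract does not spell out the model, so the logistic frequency curve here is a common simplifying assumption and not necessarily the authors' exact formulation; all function names are hypothetical.

```python
import math


def growth_rate_from_doubling_time(doubling_time_days):
    """Exponential growth rate r (per day) such that exp(r * T) = 2."""
    return math.log(2) / doubling_time_days


def doubling_time_from_growth_rate(rate_per_day):
    """Inverse of growth_rate_from_doubling_time."""
    return math.log(2) / rate_per_day


def logistic_frequency(t_days, initial_frequency, rate_difference_per_day):
    """Variant frequency after t_days, assuming the odds of the variant
    versus all other lineages grow exponentially at the given rate
    difference (an assumed two-lineage competition model)."""
    odds_start = initial_frequency / (1 - initial_frequency)
    odds = odds_start * math.exp(rate_difference_per_day * t_days)
    return odds / (1 + odds)


# Doubling times of 2 and 3.5 weeks, the range quoted in the abstract.
fast_rate = growth_rate_from_doubling_time(14.0)
slow_rate = growth_rate_from_doubling_time(24.5)
```

With these definitions, doubling times of 2 and 3.5 weeks correspond to growth rates of roughly 0.050 and 0.028 per day.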


Subjects
COVID-19 , SARS-CoV-2 , Humans , Switzerland/epidemiology , United Kingdom
8.
Curr Opin Virol ; 49: 157-163, 2021 08.
Article in English | MEDLINE | ID: mdl-34153841

ABSTRACT

The genetic diversity of virus populations within their hosts is known to influence disease progression, treatment outcome, drug resistance, cell tropism, and transmission risk, and the study of dynamic changes of genetic heterogeneity can provide insights into the evolution of viruses. Several measures to quantify within-host genetic diversity capturing different aspects of diversity patterns in a sample or population are used, based on incidence, relative frequencies, pairwise distances, or phylogenetic trees. Here, we review and compare several of these measures.
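Three of the families of measures named in this abstract (incidence, relative frequencies, pairwise distances) can each be illustrated with one standard quantity: haplotype richness, Shannon entropy, and nucleotide diversity. The sketch below assumes a sample summarized as a dictionary mapping equal-length haplotype sequences to relative frequencies that sum to 1; that representation, the function names, and the toy data are assumptions for illustration, and the review itself may define or normalize these measures differently.

```python
import math
from itertools import combinations


def haplotype_richness(haplotypes):
    """Incidence-based measure: number of distinct haplotypes present."""
    return sum(1 for frequency in haplotypes.values() if frequency > 0)


def shannon_entropy(haplotypes):
    """Frequency-based measure: Shannon entropy in nats."""
    return -sum(f * math.log(f) for f in haplotypes.values() if f > 0)


def hamming_distance(first, second):
    """Number of positions at which two equal-length sequences differ."""
    if len(first) != len(second):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(first, second))


def nucleotide_diversity(haplotypes):
    """Distance-based measure: expected per-site number of differences
    between two haplotypes drawn with replacement from the sample."""
    items = list(haplotypes.items())
    length = len(items[0][0])
    total = 0.0
    for (seq_a, freq_a), (seq_b, freq_b) in combinations(items, 2):
        # Each unordered pair can be drawn in two orders.
        total += 2 * freq_a * freq_b * hamming_distance(seq_a, seq_b)
    return total / length


# Hypothetical sample: two haplotypes at equal frequency, one mismatch.
sample = {"AAAA": 0.5, "AAAT": 0.5}
richness = haplotype_richness(sample)
entropy = shannon_entropy(sample)
diversity = nucleotide_diversity(sample)
```

The three values respond to different aspects of the same sample: richness ignores frequencies, entropy ignores how different the haplotypes are, and nucleotide diversity weighs both frequency and sequence distance.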


Subjects
Genetic Variation , Virus Diseases/virology , Viruses/genetics , Genome, Viral , Haplotypes , Humans , Mutation , Phylogeny , Quasispecies
9.
F1000Res ; 10: 33, 2021.
Article in English | MEDLINE | ID: mdl-34035898

ABSTRACT

Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. It is widely recognized that data analyses should ideally be conducted in a reproducible way. Reproducibility enables technical validation and regeneration of results on the original or even new data. However, reproducibility alone is by no means sufficient to deliver an analysis that is of lasting impact (i.e., sustainable) for the field, or even just one research group. We postulate that it is equally important to ensure adaptability and transparency. The former describes the ability to modify the analysis to answer extended or slightly different research questions. The latter describes the ability to understand the analysis in order to judge whether it is not only technically, but methodologically valid. Here, we analyze the properties needed for a data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee this, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.


Subjects
Data Analysis , Software , Reproducibility of Results , Workflow
10.
Bioinformatics ; 37(12): 1673-1680, 2021 Jul 19.
Article in English | MEDLINE | ID: mdl-33471068

ABSTRACT

MOTIVATION: High-throughput sequencing technologies are used increasingly not only in viral genomics research but also in clinical surveillance and diagnostics. These technologies facilitate the assessment of the genetic diversity in intra-host virus populations, which affects transmission, virulence and pathogenesis of viral infections. However, there are two major challenges in analysing viral diversity. First, amplification and sequencing errors confound the identification of true biological variants, and second, the large data volumes represent computational limitations. RESULTS: To support viral high-throughput sequencing studies, we developed V-pipe, a bioinformatics pipeline combining various state-of-the-art statistical models and computational tools for automated end-to-end analyses of raw sequencing reads. V-pipe supports quality control, read mapping and alignment, low-frequency mutation calling, and inference of viral haplotypes. For generating high-quality read alignments, we developed a novel method, called ngshmmalign, based on profile hidden Markov models and tailored to small and highly diverse viral genomes. V-pipe also includes benchmarking functionality providing a standardized environment for comparative evaluations of different pipeline configurations. We demonstrate this capability by assessing the impact of three different read aligners (Bowtie 2, BWA MEM, ngshmmalign) and two different variant callers (LoFreq, ShoRAH) on the performance of calling single-nucleotide variants in intra-host virus populations. V-pipe supports various pipeline configurations and is implemented in a modular fashion to facilitate adaptations to the continuously changing technology landscape. AVAILABILITY AND IMPLEMENTATION: V-pipe is freely available at https://github.com/cbg-ethz/V-pipe. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

11.
Nucleic Acids Res ; 48(14): 7899-7913, 2020 08 20.
Article in English | MEDLINE | ID: mdl-32609816

ABSTRACT

In the Elongator-dependent modification pathway, chemical modifications are introduced at the wobble uridines at position 34 in transfer RNAs (tRNAs), which serve to optimize codon translation rates. Here, we show that this three-step modification pathway exists in Dictyostelium discoideum, a model of the evolutionary superfamily Amoebozoa. Not only are previously established modifications observable by mass spectrometry in strains with the most conserved genes of each step deleted, but additional modifications are also detected, indicating a certain plasticity of the pathway in the amoeba. Unlike what has been described for yeast, D. discoideum allows for an unconditional deletion of the single tQCUG gene, as long as the Elongator-dependent modification pathway is intact. In gene deletion strains of the modification pathway, protein amounts are significantly reduced as shown by flow cytometry and Western blotting, using strains expressing different glutamine leader constructs fused to GFP. These effects are most dramatic when the tQCUG gene is deleted or when Elp3, the catalytic component of the Elongator complex, is missing. In addition, Elp3 is the most strongly conserved protein of the modification pathway, as our phylogenetic analysis reveals. The implications of this observation are discussed with respect to the evolutionary age of the components acting in the Elongator-dependent modification pathway.


Subjects
Dictyostelium/genetics , RNA, Transfer/metabolism , Anticodon/chemistry , Anticodon/metabolism , Codon , Dictyostelium/metabolism , Gene Deletion , Glutamine , Histone Acetyltransferases/genetics , Histone Acetyltransferases/metabolism , Mutation , Nucleosides/chemistry , Phylogeny , Protein Biosynthesis , Protozoan Proteins/classification , Protozoan Proteins/genetics , Protozoan Proteins/metabolism , Uridine/metabolism
12.
NPJ Syst Biol Appl ; 6(1): 5, 2020 02 17.
Article in English | MEDLINE | ID: mdl-32066730

ABSTRACT

For a long time it has been hypothesized that bacterial gene regulation involves an intricate interplay of the transcriptional regulatory network (TRN) and the spatial organization of genes in the chromosome. Here we explore this hypothesis both on a structural and on a functional level. On the structural level, we study the TRN as a spatially embedded network. On the functional level, we analyze gene expression patterns from a network perspective ("digital control"), as well as from the perspective of the spatial organization of the chromosome ("analog control"). Our structural analysis reveals the outstanding relevance of the symmetry axis defined by the origin (Ori) and terminus (Ter) of replication for the network embedding and, thus, suggests the co-evolution of two regulatory infrastructures, namely the transcriptional regulatory network and the spatial arrangement of genes on the chromosome, to optimize the cross-talk between two fundamental biological processes: genomic expression and replication. This observation is confirmed by the functional analysis based on the differential gene expression patterns of more than 4000 pairs of microarray and RNA-Seq datasets for E. coli from the Colombos Database using complex network and machine learning methods. This large-scale analysis supports the notion that two logically distinct types of genetic control are cooperating to regulate gene expression in a complementary manner. Moreover, we find that the position of the gene relative to the Ori is a feature of very high predictive value for gene expression, indicating that the Ori-Ter symmetry axis coordinates the action of distinct genetic control mechanisms.
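The feature highlighted at the end of this abstract, the position of a gene relative to the origin of replication, has to respect the circular geometry of a bacterial chromosome. One plausible way to compute it is the shortest distance around the circle, sketched below. The abstract does not give the exact feature definition used in the study, so this formula, the function name, and the coordinates are assumptions for illustration only.

```python
def distance_to_ori(gene_position, ori_position, chromosome_length):
    """Shortest distance along a circular chromosome between a gene and
    the origin of replication, in the same units as the inputs."""
    offset = (gene_position - ori_position) % chromosome_length
    return min(offset, chromosome_length - offset)


# Hypothetical circular chromosome of length 100 with the Ori at 90.
near_gene = distance_to_ori(10, 90, 100)
far_gene = distance_to_ori(40, 90, 100)
```

Under this definition, the largest possible value is half the chromosome length, which on a chromosome where the terminus lies opposite the origin corresponds to a gene located at the Ter end of the Ori-Ter axis.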


Subjects
Gene Expression Regulation, Bacterial/genetics , Regulatory Elements, Transcriptional/genetics , Replication Origin/genetics , Bacteria/genetics , Chromosomes, Bacterial/metabolism , DNA, Bacterial/genetics , Escherichia coli/genetics , Escherichia coli Proteins/genetics , Gene Regulatory Networks/genetics , Replication Origin/physiology