ABSTRACT
Fundamental to effective Legionnaires' disease outbreak control is the ability to rapidly identify the environmental source(s) of the causative agent, Legionella pneumophila. Genomics has revolutionized pathogen surveillance, but L. pneumophila has a complex ecology and population structure that can limit source inference based on standard core genome phylogenetics. Here, we present a powerful machine learning approach that assigns the geographical source of Legionnaires' disease outbreaks more accurately than current core genome comparisons. Models were developed using 534 L. pneumophila genome sequences, including 149 genomes linked to 20 previously reported Legionnaires' disease outbreaks through detailed case investigations. Our classification models were developed in a cross-validation framework using only environmental L. pneumophila genomes. Assignments of clinical isolate geographic origins demonstrated high predictive sensitivity and specificity of the models, with no false positives or false negatives for 13 out of 20 outbreak groups, despite the presence of within-outbreak polyclonal population structure. Analysis of the same 534-genome panel with a conventional phylogenomic tree and a core genome multi-locus sequence type allelic distance-based classification approach revealed that our machine learning method had the highest overall classification performance-agreement with epidemiological information. Our multivariate statistical learning approach maximizes the use of genomic variation data and is thus well-suited for supporting Legionnaires' disease outbreak investigations. IMPORTANCE: Identifying the sources of Legionnaires' disease outbreaks is crucial for effective control. Current genomic methods, while useful, often fall short due to the complex ecology and population structure of Legionella pneumophila, the causative agent. 
Our study introduces a high-performing machine learning approach for more accurate geographical source attribution of Legionnaires' disease outbreaks. Developed using cross-validation on environmental L. pneumophila genomes, our models demonstrate excellent predictive sensitivity and specificity. Importantly, this new approach outperforms traditional methods like phylogenomic trees and core genome multi-locus sequence typing, proving more efficient at leveraging genomic variation data to infer outbreak sources. Our machine learning algorithms, harnessing both core and accessory genomic variation, offer significant promise in public health settings. By enabling rapid and precise source identification in Legionnaires' disease outbreaks, such approaches have the potential to expedite intervention efforts and curtail disease transmission.
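The conventional baseline mentioned in this abstract, classification by core genome multi-locus sequence type allelic distance, can be illustrated with a minimal sketch. This is not the authors' pipeline or their machine learning method: the function names, the profile layout (a dict mapping locus name to allele number) and the choice to ignore uncalled loci are all assumptions made for illustration.

```python
def allelic_distance(profile_a, profile_b):
    """Count loci at which two allele profiles carry different called alleles.

    Profiles are dicts of locus name -> allele number; None marks an
    uncalled locus, which this sketch simply skips (an assumption).
    """
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared_loci
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


def nearest_source(clinical_profile, environmental_profiles):
    """Return the name of the environmental profile with the smallest distance."""
    return min(
        environmental_profiles,
        key=lambda name: allelic_distance(clinical_profile, environmental_profiles[name]),
    )
```

A clinical isolate would then be assigned to whichever hypothetical environmental source returns the smallest distance, which is the kind of single-metric comparison that the abstract reports being outperformed by the multivariate models.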
Subjects
Legionella pneumophila, Legionnaires' Disease, Humans, Legionella pneumophila/genetics, Legionnaires' Disease/epidemiology, Multilocus Sequence Typing/methods, Genomics/methods, Molecular Epidemiology/methods, Disease Outbreaks
ABSTRACT
Endometriosis is a complex disease, influenced by genetic factors. Genetic markers associated with endometriosis exist at chromosome 1p36.12 and lead to altered expression of the long intergenic non-coding RNA 339 (LINC00339); however, the role of LINC00339 in endometriosis pathophysiology remains unknown. The aim of this work was to characterize the expression patterns of LINC00339 mRNA in endometrium and endometriotic lesions in situ and to determine the functional role of LINC00339 in human endometrium. We employed RNA-sequencing (RNA-seq), quantitative RT-PCR and in situ hybridization to investigate the abundance of LINC00339 transcripts in endometrium and endometrial cell lines and to describe the pattern and localization of LINC00339 expression in endometrium and endometriotic lesions. LINC00339 mRNA expression was manipulated (overexpressed and silenced) in endometrial stromal cell lines and RNA-seq data from overexpression models were analysed using online bioinformatics platforms (STRING and Ingenuity Pathway Analysis) to determine functional processes. We demonstrated the expression of LINC00339 in endometriotic lesions for the first time; we found LINC00339 expression was restricted to the lesion foci and absent in surrounding non-lesion tissue. Furthermore, manipulation of LINC00339 expression in endometrial stromal cell lines significantly impacted the expression of genes involved in immune defence pathways. These studies identify a novel mechanism for LINC00339 activity in endometrium and endometriosis, paving the way for future work, which is essential for understanding the pathogenesis of endometriosis.
Subjects
Endometriosis/metabolism, Endometrium/metabolism, Long Noncoding RNA/metabolism, Case-Control Studies, Cell Line, Endometriosis/genetics, Endometriosis/immunology, Endometrium/immunology, Female, Gene Expression Regulation, Humans, In Situ Hybridization, Long Noncoding RNA/genetics, RNA-Seq, Real-Time Polymerase Chain Reaction, Signal Transduction
ABSTRACT
RESEARCH QUESTION: Does obesity affect endometrial gene expression in women with endometriosis, specifically women with stage I disease? DESIGN: Differential gene expression analysis was conducted on endometrium from women with and without endometriosis (n = 169). Women were diagnosed after surgical visualization and staged according to the revised American Society for Reproductive Medicine classification (stages I-IV). Women were grouped by body mass index (BMI) (kg/m²) as underweight, normal, pre-obese or obese. After accounting for menstrual cycle stage, endometrial gene expression was analysed by BMI (continuous and grouped) in women with endometriosis, and in non-endometriosis controls. RESULTS: No significant interaction effect was found between BMI and endometriosis status on endometrial gene expression. We have previously reported that obese women with endometriosis have a reduced incidence of stage I disease; however, stratifying our analysis into stage I endometriosis versus combined II, III and IV endometriosis failed to reveal any differentially expressed endometrial genes between normal, pre-obese and obese patients. CONCLUSIONS: Despite obesity having deleterious effects on endometrial gene expression in other gynaecological pathologies, e.g. endometrial cancer and polycystic ovary syndrome, our results do not support an association between BMI and altered endometrial gene expression in women with or without endometriosis.
Subjects
Endometriosis/metabolism, Endometrium/metabolism, Gene Expression Regulation, Gene Expression, Obesity/metabolism, Adolescent, Adult, Body Mass Index, Endometriosis/complications, Endometriosis/genetics, Female, Humans, Middle Aged, Obesity/complications, Obesity/genetics, Young Adult
ABSTRACT
BACKGROUND: Computational biology comprises a wide range of technologies and approaches. Multiple technologies can be combined to create more powerful workflows if the individuals contributing the data or providing tools for its interpretation can find mutual understanding and consensus. Much conversation and joint investigation are required in order to identify and implement the best approaches. Traditionally, scientific conferences feature talks presenting novel technologies or insights, followed up by informal discussions during coffee breaks. In multi-institution collaborations, in order to reach agreement on implementation details or to transfer deeper insights in a technology and practical skills, a representative of one group typically visits the other. However, this does not scale well when the number of technologies or research groups is large. Conferences have responded to this issue by introducing Birds-of-a-Feather (BoF) sessions, which offer an opportunity for individuals with common interests to intensify their interaction. However, parallel BoF sessions often make it hard for participants to join multiple BoFs and find common ground between the different technologies, and BoFs are generally too short to allow time for participants to program together. RESULTS: This report summarises our experience with computational biology Codefests, Hackathons and Sprints, which are interactive developer meetings. They are structured to reduce the limitations of traditional scientific meetings described above by strengthening the interaction among peers and letting the participants determine the schedule and topics. These meetings are commonly run as loosely scheduled "unconferences" (self-organized identification of participants and topics for meetings) over at least two days, with early introductory talks to welcome and organize contributors, followed by intensive collaborative coding sessions. 
We summarise some prominent achievements of those meetings and describe differences in how these are organised, how their audience is addressed, and their outreach to their respective communities. CONCLUSIONS: Hackathons, Codefests and Sprints share a stimulating atmosphere that encourages participants to jointly brainstorm and tackle problems of shared interest in a self-driven proactive environment, as well as providing an opportunity for new participants to get involved in collaborative projects.
Subjects
Computational Biology, Cooperative Behavior, Software, Communication, Internet
ABSTRACT
We present BioBlend, a unified API in a high-level language (Python) that wraps the functionality of Galaxy and CloudMan APIs. BioBlend makes it easy for bioinformaticians to automate end-to-end large data analysis, from scratch, in a way that is highly accessible to collaborators, by allowing them to both provide the required infrastructure and automate complex analyses over large datasets within the familiar Galaxy environment. AVAILABILITY AND IMPLEMENTATION: http://bioblend.readthedocs.org/. Automated installation of BioBlend is available via PyPI (e.g. pip install bioblend). Alternatively, the source code is available from the GitHub repository (https://github.com/afgane/bioblend) under the MIT open source license. The library has been tested and is working on Linux, Macintosh and Windows-based systems.
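As a minimal sketch of the kind of scripted access this abstract describes, the snippet below uses BioBlend's `GalaxyInstance` client to list the histories and workflows visible to a user. The `connect` and `summarise` helpers are hypothetical names introduced here, and the server URL and API key in the comment are placeholders; current BioBlend releases may differ in detail from the version the abstract reports.

```python
def connect(url, key):
    """Create a BioBlend client for a Galaxy server (needs `pip install bioblend`)."""
    # Imported lazily so that summarise() below carries no hard dependency.
    from bioblend.galaxy import GalaxyInstance

    return GalaxyInstance(url=url, key=key)


def summarise(gi):
    """Hypothetical helper: collect the names of the user's histories and workflows."""
    histories = gi.histories.get_histories()  # list of dicts, one per history
    workflows = gi.workflows.get_workflows()  # list of dicts, one per workflow
    return {
        "histories": [h["name"] for h in histories],
        "workflows": [w["name"] for w in workflows],
    }


# Example call with placeholder values (replace with a real server and key):
#   gi = connect("https://galaxy.example.org", "YOUR_API_KEY")
#   print(summarise(gi))
```

Keeping the connection step separate from the query step means the query logic can be exercised against a stand-in object without a running Galaxy server.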
Subjects
Genomics/methods, Software
ABSTRACT
Delivering large-scale routine pathogen genomics surveillance for public health is of considerable interest, although translational research models that promote national-level implementation are not well defined. We describe the development and deployment of the Australian Pathogen Genomics Program (AusPathoGen), a comprehensive national partnership between academia, public health laboratories, and public health agencies that commenced in January, 2021. Successfully establishing and delivering a national programme requires inclusive and transparent collaboration between stakeholders, defined and clear focus on public health priorities, and support for strengthening national genomics capacity. Major enablers for delivering such a programme include technical solutions for data integration and analysis, such as the genomics surveillance platform AusTrakka, standard bioinformatic analysis methods, and national ethics and data sharing agreements that promote nationally integrated surveillance systems. Training of public health officials to interpret and act on genomic data is crucial, and evaluation and cost-effectiveness programmes will provide a benchmark and evidence for sustainable investment in genomics nationally and globally.
ABSTRACT
Synchrotron microbeam radiation therapy (MRT) is a preclinical irradiation technique which could be used to treat intracranial malignancies. The goal of this work was to discern differences in gene expression and the predicted regulation of molecular pathways in the brainstem after MRT versus synchrotron broad-beam radiation therapy (SBBR). Healthy C57BL/6 mice received whole-head irradiation with median acute toxic doses of MRT (241 Gy peak dose) or SBBR (13 Gy). Brains were harvested 4 and 48 h postirradiation and RNA was extracted from the brainstem. RNA-sequencing was performed to identify differentially expressed genes (false discovery rate < 0.01) relative to nonirradiated controls and significantly regulated molecular pathways and biological functions were identified (Benjamini-Hochberg corrected P < 0.05). Differentially expressed genes and regulated pathways largely reflected a pro-inflammatory response 4 h after both MRT and SBBR which was sustained at 48 h postirradiation for MRT. Pathways relating to radiation-induced viral mimicry, including HMGB1, NF-κB and interferon signaling cascades, were predicted to be uniquely activated by MRT. Local microglia, as well as circulating leukocytes, including T cells, were predicted to be activated by MRT. Our findings affirm that the transcriptomic signature of MRT is distinct from broad-beam radiotherapy, with a sustained inflammatory and immune response up to 48 h postirradiation.
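The Benjamini-Hochberg correction cited for the pathway analysis can be sketched in a few lines. This is a generic illustration of the standard step-up procedure, not the authors' software; the function name and the plain-list input are assumptions.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A pathway would then be called significant at the threshold quoted in the abstract when its adjusted value falls below 0.05.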
Subjects
Brain Neoplasms, Animals, Brain Stem, Cell Proliferation, Mice, Radiography, X-Rays
ABSTRACT
BACKGROUND: Bioinformatics software tools are often created ad hoc, frequently by people without extensive training in software development. In particular, for beginners, the barrier to entry in bioinformatics software development is high, especially if they want to adopt good programming practices. Even experienced developers do not always follow best practices. This results in the proliferation of poorer-quality bioinformatics software, leading to limited scalability and inefficient use of resources; lack of reproducibility, usability, adaptability, and interoperability; and erroneous or inaccurate results. FINDINGS: We have developed Bionitio, a tool that automates the process of starting new bioinformatics software projects following recommended best practices. With a single command, the user can create a new well-structured project in 1 of 12 programming languages. The resulting software is functional, carrying out a prototypical bioinformatics task, and thus serves as both a working example and a template for building new tools. Key features include command-line argument parsing, error handling, progress logging, defined exit status values, a test suite, a version number, standardized building and packaging, user documentation, code documentation, a standard open source software license, software revision control, and containerization. CONCLUSIONS: Bionitio serves as a learning aid for beginner-to-intermediate bioinformatics programmers and provides an excellent starting point for new projects. This helps developers adopt good programming practices from the beginning of a project and encourages high-quality tools to be developed more rapidly. This also benefits users because tools are more easily installed and consistent in their usage. Bionitio is released as open source software under the MIT License and is available at https://github.com/bionitio-team/bionitio.
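Several of the key features listed in this abstract (command-line argument parsing, error handling, progress logging, defined exit status values and a version number) can be shown together in a small sketch. The code below is not Bionitio's generated output; the sequence-counting task, the option names and the exit codes are hypothetical choices made for illustration.

```python
import argparse
import logging
import sys

EXIT_OK = 0
EXIT_FILE_ERROR = 1  # hypothetical exit status for unreadable input


def count_sequences(lines):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Count sequences in FASTA files.")
    parser.add_argument("fasta", nargs="+", help="input FASTA file(s)")
    parser.add_argument("--version", action="version", version="%(prog)s 0.1.0")
    parser.add_argument("--log", metavar="FILE", help="write a progress log to FILE")
    return parser.parse_args(argv)


def main(argv=None):
    """Run the tool and return an exit status instead of raising on bad input."""
    args = parse_args(sys.argv[1:] if argv is None else argv)
    if args.log:
        logging.basicConfig(filename=args.log, level=logging.INFO)
    for path in args.fasta:
        try:
            with open(path) as handle:
                total = count_sequences(handle)
        except OSError as err:
            print(f"error: cannot read {path}: {err}", file=sys.stderr)
            return EXIT_FILE_ERROR
        logging.info("processed %s", path)
        print(f"{path}\t{total}")
    return EXIT_OK


# A script entry point would call: sys.exit(main())
```

Returning the status from `main` rather than calling `sys.exit` inside it keeps the function easy to call from a test suite, which is another of the practices the abstract lists.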
Subjects
Computational Biology, Software
ABSTRACT
BACKGROUND: Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise. RESULTS: We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic. CONCLUSIONS: This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. 
We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation.
Subjects
Cloud Computing, Computational Biology/methods, Genomics/methods, User-Computer Interface, Animals, Genetic Databases, Humans, Software
ABSTRACT
Tumour heterogeneity in primary prostate cancer is a well-established phenomenon. However, how the subclonal diversity of tumours changes during metastasis and progression to lethality is poorly understood. Here we reveal the precise direction of metastatic spread across four lethal prostate cancer patients using whole-genome and ultra-deep targeted sequencing of longitudinally collected primary and metastatic tumours. We find one case of metastatic spread to the surgical bed causing local recurrence, and another case of cross-metastatic site seeding combining with dynamic remoulding of subclonal mixtures in response to therapy. By ultra-deep sequencing end-stage blood, we detect both metastatic and primary tumour clones, even years after removal of the prostate. Analysis of mutations associated with metastasis reveals an enrichment of TP53 mutations, and additional sequencing of metastases from 19 patients demonstrates that acquisition of TP53 mutations is linked with the expansion of subclones with metastatic potential which we can detect in the blood.
Subjects
Adenocarcinoma/genetics, Bone Neoplasms/genetics, Brain Neoplasms/genetics, Prostatic Neoplasms/genetics, Tumor Suppressor Protein p53/genetics, Adenocarcinoma/secondary, Aged, Bone Neoplasms/secondary, Brain Neoplasms/secondary, DNA Copy Number Variations, Disease Progression, Humans, Longitudinal Studies, Male, Middle Aged, Mutation, Neoplasm Metastasis, Single Nucleotide Polymorphism, Prostatic Neoplasms/pathology, Messenger RNA, DNA Sequence Analysis
ABSTRACT
Human colorectal cancer cell lines are used widely to investigate tumor biology, experimental therapy, and biomarkers. However, to what extent these established cell lines represent and maintain the genetic diversity of primary cancers is uncertain. In this study, we profiled 70 colorectal cancer cell lines for mutations and DNA copy number by whole-exome sequencing and SNP microarray analyses, respectively. Gene expression was defined using RNA-Seq. Cell line data were compared with those published for primary colorectal cancers in The Cancer Genome Atlas. Notably, we found that exome mutation and DNA copy-number spectra in colorectal cancer cell lines closely resembled those seen in primary colorectal tumors. Similarities included the presence of two hypermutation phenotypes, as defined by signatures for defective DNA mismatch repair and DNA polymerase ε proofreading deficiency, along with concordant mutation profiles in the broadly altered WNT, MAPK, PI3K, TGFβ, and p53 pathways. Furthermore, we documented mutations enriched in genes involved in chromatin remodeling (ARID1A, CHD6, and SRCAP) and histone methylation or acetylation (ASH1L, EP300, EP400, MLL2, MLL3, PRDM2, and TRRAP). Chromosomal instability was prevalent in nonhypermutated cases, with similar patterns of chromosomal gains and losses. Although paired cell lines derived from the same tumor exhibited considerable mutation and DNA copy-number differences, in silico simulations suggest that these differences mainly reflected a preexisting heterogeneity in the tumor cells. In conclusion, our results establish that human colorectal cancer lines are representative of the main subtypes of primary tumors at the genomic level, further validating their utility as tools to investigate colorectal cancer biology and drug responses.