ABSTRACT
BACKGROUND: Fungi play a key role in several important ecological functions, ranging from organic matter decomposition to symbiotic associations with plants. Moreover, fungi naturally inhabit the human body and can be beneficial when administered as probiotics. In mycology, the internal transcribed spacer (ITS) region was adopted as the universal marker for classifying fungi. Hence, an accurate and robust method for ITS classification is desirable not only for better diversity estimation, but also because it can give deeper insight into the dynamics of environmental communities and ultimately help determine whether the abundance of certain species correlates with health and disease. Although many methods have been proposed for taxonomic classification, to the best of our knowledge, none of them fully explores the taxonomic tree hierarchy when building its models. This, in turn, leads to lower generalization power and a higher risk of classification errors. RESULTS: Here we introduce HiTaC, a robust hierarchical machine learning model for accurate ITS classification, which requires a small amount of data for training and can handle imbalanced datasets. HiTaC was thoroughly evaluated with the established TAXXI benchmark and could correctly classify fungal ITS sequences of varying lengths and across a range of identity differences between the training and test data. HiTaC outperforms state-of-the-art methods when trained on noisy data, consistently achieving higher F1-score and sensitivity across different taxonomic ranks and improving sensitivity by 6.9 percentage points over top methods on the noisiest dataset available in TAXXI. CONCLUSIONS: HiTaC is publicly available on the Python Package Index, Bioconda, and Docker Hub. It is released under the new BSD license, allowing free use in academia and industry. Source code and documentation, which includes installation and usage instructions, are available at https://gitlab.com/dacs-hpi/hitac .
Subjects
Fungi, Machine Learning, Fungi/genetics, Fungi/classification, Ribosomal Spacer DNA/genetics, Software
ABSTRACT
BACKGROUND: Tandem repeats (TRs) are sequences in genomic DNA that are repeated in tandem and are present in all organisms. Among the subcategories of TRs are satellite repeats, which are divided into macrosatellites, minisatellites, and microsatellites; the last two are of particular interest because their instability makes them useful for identifying polymorphisms between organisms. Currently, most mining tools focus on simple sequence repeat (SSR) mining, and only a few can identify SSRs in coding regions. RESULTS: We developed a microsatellite mining software called SATIN (Micro and Mini SATellite IdentificatioN tool), based on a new sliding window algorithm and written in C and Python. It represents a new approach to SSR mining that addresses the limitations of existing tools, particularly in coding-region SSR mining. SATIN is available at https://github.com/labgm/SATIN.git . It was shown to be the second fastest tool for perfect and compound SSR mining. It can identify SSRs in coding regions as well as SSRs with motif sizes greater than 6. Besides SSR mining, SATIN can also analyze SSR polymorphism in coding regions across predetermined groups and identify SSRs that are differentially abundant among them on a per-gene basis. For validation, we analyzed SSRs from two groups of Escherichia coli (K12 and O157) and compared the results with 5 known SSRs from coding regions. SATIN identified all 5 SSRs among 237 genes containing at least one SSR. CONCLUSIONS: SATIN is a novel microsatellite search software that uses an innovative sliding window technique, based on a numerical list for repeat-region search, to identify perfect and composite SSRs while generating comprehensible and analyzable outputs. It accepts files in FASTA or GenBank format as input for microsatellite mining and can identify SSRs present in coding regions when given GenBank files. We expect SATIN to help identify potential SSRs to be used as genetic markers.
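To make the idea of perfect SSR mining concrete, the following is a minimal Python sketch of a left-to-right scan for perfect tandem repeats. It is not SATIN's algorithm (the abstract does not describe its numerical-list sliding window in detail); the function name `find_perfect_ssrs` and its parameters `min_motif`, `max_motif`, and `min_repeats` are hypothetical and chosen only for illustration.

```python
def find_perfect_ssrs(seq, min_motif=1, max_motif=6, min_repeats=3):
    """Return (start, motif, repeat_count) tuples for perfect tandem repeats.

    Illustrative only: a naive scan, not the SATIN implementation.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for k in range(min_motif, max_motif + 1):
            motif = seq[i:i + k]
            if len(motif) < k:
                break  # not enough sequence left for a motif of this size
            # Skip motifs that are a shorter unit repeated (e.g. ATAT = AT x 2).
            if any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)):
                continue
            count = 1
            while seq[i + count * k:i + (count + 1) * k] == motif:
                count += 1
            # Keep the candidate that covers the longest stretch of sequence.
            if count >= min_repeats and (best is None or count * k > best[2] * len(best[1])):
                best = (i, motif, count)
        if best is None:
            i += 1
        else:
            hits.append(best)
            i += best[2] * len(best[1])  # jump past the repeat just reported
    return hits


print(find_perfect_ssrs("GGATATATATCCAGCAGCAGCAGT"))
```

Raising `max_motif` above 6 in this sketch would also report longer motifs, which corresponds to the minisatellite-sized repeats mentioned in the abstract.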
Subjects
Data Mining, Microsatellite Repeats, Genetic Polymorphism, Software, Microsatellite Repeats/genetics, Data Mining/methods, Algorithms, Open Reading Frames/genetics, Satellite DNA/genetics
ABSTRACT
Arapaima gigas, an emblematic species of the Amazon region and a longstanding primary fishing resource, currently holds a "Data Deficient" status on the International Union for Conservation of Nature Red List and is listed as an endangered species in Brazil. The Tocantins River is the most extensively modified large tributary of the Amazon Basin, and these modifications can affect the dynamics of ichthyofaunal populations. Over a period of 1 year, community representatives and fishermen from 25 fishing communities in four municipalities of the lower Tocantins River region were interviewed, and the information obtained was evaluated against the literature to survey the population abundance status of A. gigas in the region and its impact on local communities. Among the fishermen interviewed, only one reported still encountering and fishing A. gigas, on Jaracuera Island. The disappearance of A. gigas from the region is viewed as having economically disastrous consequences for the residents. Additionally, other endemic fish species are no longer observed in this locality. If fishery management officials do not work together with local communities, A. gigas could disappear from the northern region of Brazil, where information on the dynamics of A. gigas fishing is lacking.
Subjects
Conservation of Natural Resources, Fisheries, Rivers, Brazil, Animals, Fisheries/statistics & numerical data, Fishes/classification, Population Dynamics, Population Density, Endangered Species, Humans
ABSTRACT
Biofilms are complex microecosystems with valuable ecological roles that can shelter a variety of microorganisms. Spirochetes from the genus Leptospira have been observed to form biofilms in vitro, in rural environments, and in the kidneys of reservoir rats. The genus Leptospira is composed of pathogenic and non-pathogenic species, and the description of new species is ongoing due to the advent of whole genome sequencing. Leptospires have increasingly been isolated from water and soil samples. To investigate the presence of Leptospira in environmental biofilms, we collected three distinct samples of biofilms formed in an urban setting with poor sanitation: Pau da Lima, in Salvador, Bahia, Brazil. All biofilm samples were negative for the presence of pathogenic leptospires by conventional PCR, but cultures containing saprophytic Leptospira were identified. Whole genomes were generated and analyzed for twenty isolates obtained from these biofilms. For species identification, we used digital DNA-DNA hybridization (dDDH) and average nucleotide identity (ANI) analysis. The isolates were classified into seven presumptive species from the saprophytic S1 clade. The ANI and dDDH analyses suggest that three of those seven species are new. Classical phenotypic tests confirmed the novel isolates as saprophytic Leptospira. The isolates presented typical morphology and ultrastructure under scanning electron microscopy and formed biofilms under in vitro conditions. Our data indicate that a diversity of saprophytic Leptospira species survives in a biofilm lifestyle in poorly sanitized urban environments in Brazil. We believe our results contribute to a better understanding of Leptospira biology and ecology, considering biofilms as natural environmental reservoirs for leptospires.
Subjects
Leptospira, Leptospirosis, Animals, Rats, Leptospira/genetics, Leptospirosis/microbiology, Brazil, Biofilms, DNA
ABSTRACT
Corynebacterium striatum, a common constituent of the human skin microbiome, is now considered an emerging multidrug-resistant pathogen of immunocompromised and chronically ill patients. However, little is known about the molecular mechanisms involved in the transition from colonization to the multidrug-resistant (MDR) invasive phenotype in clinical isolates. This study performed a comprehensive pan-genomic analysis of C. striatum, including isolates from "normal skin microbiome" and from MDR infections, to gain insights into genetic factors contributing to pathogenicity and multidrug resistance in this species. For this, three novel genome sequences were obtained from clinical isolates of C. striatum from patients in Brazil, and 24 other complete or draft C. striatum genomes were retrieved from GenBank, including the ATCC6940 isolate from the Human Microbiome Project. Analysis of the C. striatum strains demonstrated the presence of an open pan-genome (α = 0.852803) containing 3816 gene families, including 15 antimicrobial resistance (AMR) genes and 32 putative virulence factors. The core and accessory genomes included 1297 and 1307 genes, respectively. The identified AMR genes are primarily associated with resistance to aminoglycosides and tetracyclines. Of these, 66.6% are present in genomic islands, and four AMR genes, including aac(6')-ib7, are located in a class 1 integron. In conclusion, our data indicate that C. striatum possesses genomic characteristics favorable to the invasive phenotype, with high genomic plasticity, a robust genetic arsenal for iron acquisition, and important virulence determinants and AMR genes present in mobile genetic elements.
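The "open pan-genome (α = 0.852803)" statement can be illustrated with a short sketch. It assumes the commonly used power-law model in which the number of new genes found when the N-th genome is added follows n(N) = κ·N^(−α), with α ≤ 1 read as an open pan-genome and α > 1 as a closed one. The abstract does not say which model or software produced its α, so the function names `fit_alpha` and `is_open` and the synthetic input are illustrative assumptions only.

```python
import math


def fit_alpha(new_genes_per_genome):
    """Least-squares fit of n = kappa * N**(-alpha) in log-log space.

    new_genes_per_genome[i] is the (positive) number of new genes observed
    when genome number N = i + 2 is added. Returns the estimated alpha.
    """
    xs = [math.log(i + 2) for i in range(len(new_genes_per_genome))]
    ys = [math.log(n) for n in new_genes_per_genome]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope  # the log-log slope is -alpha


def is_open(alpha):
    """Assumed convention: alpha <= 1 means the pan-genome is open."""
    return alpha <= 1.0


# Synthetic data generated from the model itself, not taken from the study.
synthetic = [100 * N ** -0.85 for N in range(2, 12)]
print(round(fit_alpha(synthetic), 3), is_open(0.852803))
```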
Subjects
Anti-Bacterial Agents, Corynebacterium, Humans, Phenotype, Virulence Factors/genetics, Bacterial Multidrug Resistance/genetics, Microbial Sensitivity Tests
ABSTRACT
Computational biology has gained traction as an independent scientific discipline in South America in recent years. However, there is still a growing need for bioscientists from different backgrounds and at different levels to acquire programming skills, which could shorten the time from data to insights and bridge communication between life scientists and computer scientists. Python is a programming language used extensively in bioinformatics and data science, and it is particularly suitable for beginners. Here, we describe the conception, organization, and implementation of the Brazilian Python Workshop for Biological Data. This workshop has been organized by graduate and undergraduate students and supported, mostly in administrative matters, by experienced faculty members since 2017. The workshop was conceived to teach bioscientists, mainly students in Brazil, how to program in a biological context. The goal of this article was to share our experience with the 2020 edition of the workshop, held in a virtual format due to the Coronavirus Disease 2019 (COVID-19) pandemic, and to compare and contrast that year's experience with the previous in-person editions. We described a hands-on, live-coding workshop model for teaching introductory Python programming. We also highlighted the adaptations made in moving from the in-person to the online format in 2020, the participants' assessment of their learning progression, and general workshop management. Lastly, we provided a summary of and reflections on our personal experiences from the workshops of the last 4 years. Our takeaways included the benefits of learning from learners' feedback (LLF), which allowed us to improve the workshop in real time, in the short term, and likely in the long term.
We concluded that the Brazilian Python Workshop for Biological Data is a highly effective workshop model for teaching a programming language, one that allows bioscientists to go beyond an initial exploration of programming skills for data analysis in the medium to long term.
Subjects
Computational Biology/education, Curriculum, Programming Languages, Brazil, COVID-19, Distance Education, Humans, Pandemics, Physical Distancing
ABSTRACT
OBJECTIVE: The aim of this study was to evaluate and compare alterations in gene expression caused by two distinct immortalization methods (hTERT and HPV16-E6/E7) in ameloblastoma cell lines. MATERIALS AND METHODS: A primary cell culture derived from human ameloblastoma (AME-1) was established and immortalized by two different methods, using transfection with hTERT or with HPV16-E6/E7. RNA-seq was used to verify which immortalization method had less influence on gene expression. It was performed in four steps: extraction and collection of mRNA, PCR amplification, comparison with the human reference genome, and analysis of differential expression. The differentially expressed genes were identified and mapped. RESULTS: RNA-seq revealed genetic alterations in the ameloblastoma cell lines after the immortalization process, including increased expression of tumor genes such as MYC, E2F1, BRAF, HRAS, and hTERT, and decreased expression of tumor suppressor genes such as P53, P21, and Rb. CONCLUSIONS: Cell immortalization is not an inert method with regard to gene regulation mechanisms, and the hTERT method (AME-TERT) produced fewer changes in gene expression levels.
Subjects
Ameloblastoma, Viral Oncogene Proteins, Humans, Ameloblastoma/genetics, Cell Line, Viral Cell Transformation/genetics, Gene Expression, Viral Oncogene Proteins/genetics, Viral Oncogene Proteins/metabolism, Papillomaviridae/genetics, Papillomavirus E7 Proteins/genetics, Proto-Oncogene Proteins B-raf/genetics, Messenger RNA, Tumor Suppressor Protein p53/genetics, Tumor Suppressor Protein p53/metabolism
ABSTRACT
Mycoplasma genitalium is an obligate intracellular bacterium that is responsible for several sexually transmitted infections, including non-gonococcal urethritis in men and several inflammatory reproductive tract syndromes in women. Here, we applied subtractive genomics and reverse vaccinology approaches for in silico prediction of potential vaccine and drug targets against five strains of M. genitalium. We identified 403 genes shared by all five strains, from which 104 non-host-homologous proteins were selected, comprising 44 exposed/secreted/membrane proteins and 60 cytoplasmic proteins. Based on essentiality, functionality, and structure-based binding affinity, we finally predicted 19 (14 novel) putative vaccine targets and 7 (2 novel) candidate drug targets. The docking analysis identified six molecules from the ZINC database as promising drug candidates against the identified targets. Altogether, the vaccine candidates and drug targets identified here may contribute to the future development of therapeutic strategies to control the spread of M. genitalium worldwide.
Subjects
Mycoplasma genitalium, Vaccines, Female, Genomics, Humans, Male, Mycoplasma genitalium/genetics, Vaccinology
ABSTRACT
Corynebacterium ulcerans is an emerging pathogen able to transmit the acute infection diphtheria to humans. Although there is a well-established vaccine based on the toxin produced by Corynebacterium diphtheriae, another species of this genus known to cause the disease, there are still no vaccine formulations described for C. ulcerans; this contributes to the observed increase in cases of infection. In this study, we aimed to provide genome-level information about this bacterium in order to suggest proteins as possible vaccine targets. We carried out an in silico search for vaccine candidates through reverse vaccinology, looking for targets that exhibit antigenic potential against diphtheria. We found important virulence factors, such as adhesion-related ones, that are responsible for pathogen-host interaction after infection, but we did not find the diphtheria toxin, which is the main component of the currently available vaccine. This study provides detailed information about the exoproteome and hypothetical proteins from the core genome of C. ulcerans, suggesting vaccine targets to be further tested in vitro for the development of a new vaccine against diphtheria.
Subjects
Corynebacterium Infections, Diphtheria, Vaccines, Corynebacterium/genetics, Corynebacterium Infections/prevention & control, Diphtheria/prevention & control, Diphtheria Toxin/genetics, Humans, Virulence
ABSTRACT
Alzheimer's disease (AD) and Parkinson's disease (PD) are the most common neurodegenerative disorders related to aging. Though several risk factors are shared between these two diseases, the exact relationship between them is still unknown. In this paper, we analyzed how these two diseases relate to each other from the genomic, epigenomic, and transcriptomic viewpoints. Using extensive literature mining, we first compiled the list of genes from major genome-wide association studies (GWAS). Based on these studies, we observed that only one gene (HLA-DRB5) was shared between AD and PD. A subsequent literature search identified a few other genes involved in both diseases, among which SIRT1 seemed to be the most prominent. While we listed all the miRNAs that have previously been reported for AD and PD separately, we found only 15 different miRNAs that were reported in both diseases. To gain better insight, we predicted the gene co-expression network for both AD and PD using network analysis algorithms applied to two GEO datasets. The network analysis revealed six clusters of genes related to AD and four clusters of genes related to PD; however, there was very low functional similarity between these clusters, pointing to insignificant similarity between AD and PD even at the level of affected biological processes. Finally, we postulated the putative epigenetic regulator modules that are common to AD and PD.
Subjects
Alzheimer Disease/genetics, Genetic Predisposition to Disease, Parkinson Disease/genetics, Gene Regulatory Networks, HLA-DRB5 Chains/genetics, Humans, MicroRNAs/genetics, MicroRNAs/metabolism, Sirtuin 1/genetics
ABSTRACT
Biochemical tests are traditionally used for bacterial identification at the species level in clinical microbiology laboratories. While biochemical profiles are generally efficient for the identification of the most important corynebacterial pathogen, Corynebacterium diphtheriae, their ability to differentiate between biovars of this bacterium is still controversial. In addition, the unambiguous identification of emerging human pathogenic species of the genus Corynebacterium may be hampered by the highly variable biochemical profiles commonly reported for these species, including Corynebacterium striatum, Corynebacterium amycolatum, Corynebacterium minutissimum, and Corynebacterium xerosis. To identify the genomic basis contributing to the biochemical variability observed in phenotypic identification methods for these bacteria, we combined a comprehensive literature review with a bioinformatics approach based on the reconstruction of six specific biochemical reactions/pathways in 33 recently released whole genome sequences. We used data retrieved from curated databases (MetaCyc, PathoSystems Resource Integration Center (PATRIC), The SEED, TransportDB, UniProtKB) together with homology searches by BLAST and profile Hidden Markov Models (HMMs) to detect enzymes participating in the various pathways, and performed ab initio protein structure modeling and molecular docking to confirm specific results. We found a differential distribution among the various strains of genes that code for some important enzymes, such as beta-phosphoglucomutase and fructokinase, and also for individual components of carbohydrate transport systems, including the fructose-specific phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS) and the ribose-specific ATP-binding cassette (ABC) transporter. Horizontal gene transfer plays a role in the biochemical variability of the isolates, as some genes needed for sucrose fermentation were found in genomic islands.
Notably, using profile HMMs, we identified an enzyme with putative alpha-1,6-glycosidase activity only in some specific strains of C. diphtheriae, and this may aid in understanding the differential abilities of the biovars to utilize glycogen and starch.
Subjects
Bacterial Proteins/genetics, Bacterial Typing Techniques/methods, Corynebacterium/genetics, Bacterial Genome, ATP-Binding Cassette Transporters/genetics, Corynebacterium/classification, Corynebacterium/metabolism, Fructokinases/genetics, Phosphoenolpyruvate Sugar Phosphotransferase System/genetics, Phosphoglucomutase/genetics, Phylogeny, Genetic Polymorphism
ABSTRACT
Multidrug-resistant (MDR) Corynebacterium striatum has been cited with increasing frequency as a pathogen in nosocomial infections. In this study, we report the draft genome of a C. striatum strain isolated from a patient with a bloodstream infection in a hospital in Rio de Janeiro, Brazil. The isolate was susceptible only to tetracycline, vancomycin, and linezolid. The detection of various antibiotic resistance genes is fully consistent with the multidrug-resistance pattern previously observed in Corynebacterium spp. A large part of the pTP10 plasmid of MDR C. striatum M82B is present in the genome of our isolate. A SpaDEF cluster and seven CRISPR-Cas arrays were found.
Subjects
Bacteremia/microbiology, Corynebacterium Infections/microbiology, Corynebacterium/drug effects, Corynebacterium/genetics, Cross Infection/microbiology, Bacterial Multidrug Resistance/genetics, Bacterial Genome/genetics, Anti-Bacterial Agents/pharmacology, Brazil, Corynebacterium/isolation & purification, Disease Outbreaks, Bacterial Multidrug Resistance/drug effects, Pulsed-Field Gel Electrophoresis, Genotype, Humans, Microbial Sensitivity Tests, DNA Sequence Analysis
ABSTRACT
A previous study by our group reported the isolation and characterisation of Leptospira borgpetersenii serogroup Ballum strain 4E. This strain is of particular interest because it is highly virulent in the hamster model. In this study, we performed whole-genome shotgun sequencing of the strain using the SOLiD sequencing platform. By assembling and analysing the new genome, we were able to identify novel features that had previously been overlooked in genome annotations of other strains belonging to the same species.
Subjects
Leptospira/genetics, Leptospira/pathogenicity, Virulence/genetics, Animals, Leptospira/classification, Mice
ABSTRACT
BACKGROUND: The evolution of Next-Generation Sequencing (NGS) has considerably reduced the cost per sequenced base, allowing a significant rise in sequencing projects, mainly in prokaryotes. However, the range of available NGS platforms requires different strategies and software to correctly assemble genomes, in addition to the installation or modification of various software packages. This requires users to have significant expertise in these programs and command-line scripting experience on Unix platforms, besides basic expertise in the methodologies and techniques of genome assembly. These difficulties often delay the completion of genome assembly projects. RESULTS: To overcome this, we developed SIMBA (SImple Manager for Bacterial Assemblies), a freely available web tool that integrates several component tools for assembling and finishing bacterial genomes. SIMBA provides a friendly and intuitive user interface so that bioinformaticians, even those with low computational expertise, can work under a centralized administrative control system of assemblies managed by the assembly center head. SIMBA guides users through the assembly process with simple and interactive pages. The SIMBA workflow is divided into three modules: (i) projects: provides an overview of genome sequencing projects, in addition to data quality analysis and data format conversions; (ii) assemblies: allows de novo assemblies with the software Mira, Minia, Newbler and SPAdes, as well as assembly quality validation using the QUAST software; and (iii) curation: presents methods for finishing assemblies through tools for scaffolding contigs and closing gaps. We also present a case study that validated the efficacy of SIMBA in managing bacterial assembly projects sequenced using Ion Torrent PGM.
CONCLUSION: Besides being a web tool for genome assembly, SIMBA is a complete genome assembly project management system, which can be useful for managing several projects in laboratories. The SIMBA source code is available to download and install on local web servers at http://ufmg-simba.sourceforge.net .
Subjects
Bacteria/genetics, Computational Biology/methods, Data Mining/methods, Bacterial Genome, Bacteria/classification, Bacteria/isolation & purification, Base Sequence, Chromosome Mapping, Computational Biology/instrumentation, High-Throughput Nucleotide Sequencing, Internet, DNA Sequence Analysis, Software
ABSTRACT
BACKGROUND: Studies have detected mis-assemblies in genomes of the species Corynebacterium pseudotuberculosis. These new discoveries have been possible due to the evolution of Next-Generation Sequencing platforms, which have provided accurate sequencing at reduced cost. In addition, improved techniques for constructing high-accuracy genomic maps, for example Whole-genome mapping (WGM) (OpGen Inc), have allowed high-resolution assemblies that can detect large rearrangements. RESULTS: In this work, we present the resequencing of Corynebacterium pseudotuberculosis strain 1002 (Cp1002). Cp1002 was the first strain of this species sequenced in Brazil, and its genome has been used as a model for several in silico studies of caseous lymphadenitis. The sequencing was performed using the Ion PGM platform and a fragment library (200 bp kit). A restriction map was constructed using the WGM technique with the enzyme KpnI. After the new assembly process, using WGM as a scaffolder, we detected a large inversion spanning more than one-half of the genome. A specific analysis using BLAST and the NR database shows that the inversion occurs between two homologous ribosomal RNA regions. CONCLUSION: The results obtained with WGM show that it can be used to detect mis-assemblies, providing high-resolution genomic maps and allowing more accurate and complete assemblies. The new assembly of C. pseudotuberculosis was deposited in GenBank under the accession no. CP012837.
Subjects
Chromosome Mapping/methods, Corynebacterium pseudotuberculosis/genetics, Bacterial Genome, Genomics/methods, rRNA Operon/genetics, Bacterial DNA/genetics, Gene Library, High-Throughput Nucleotide Sequencing, DNA Sequence Analysis
ABSTRACT
BACKGROUND: Corynebacterium urealyticum is an opportunistic pathogen that normally lives on skin and mucous membranes in humans. This high-GC Gram-positive bacterium can cause acute or encrusted cystitis, encrusted pyelitis, and pyelonephritis in immunocompromised patients. The bacterium is multidrug resistant, and knowledge about the genes that contribute to its virulence is very limited. Two complete genome sequences were used in this comparative genomic study: C. urealyticum DSM 7109 and C. urealyticum DSM 7111. RESULTS: We used comparative genomics strategies to compare the two strains, DSM 7109 and DSM 7111, to analyze their metabolic pathways and genome plasticity, and to predict putative antigenic targets. The genomes of these two strains together encode 2,115 non-redundant coding sequences, 1,823 of which are common to both genomes. We identified 188 strain-specific genes in DSM 7109 and 104 strain-specific genes in DSM 7111. The high number of strain-specific genes may be a result of horizontal gene transfer triggered by the large number of transposons in the genomes of these two strains. Screening for virulence factors revealed the presence of the spaDEF operon, which encodes pilus-forming proteins. Therefore, spaDEF may play a pivotal role in facilitating the adhesion of the pathogen to host tissue. Application of the reverse vaccinology method revealed 19 putative antigenic proteins that may be used in future studies as candidate drug or vaccine targets. CONCLUSIONS: The genome features and the presence of virulence factors in genomic islands in the two strains of C. urealyticum provide insights into the lifestyle of this opportunistic pathogen and may be useful in developing future therapeutic strategies.
Subjects
Comparative Genomic Hybridization/methods, Computational Biology/methods, Corynebacterium/genetics, Bacterial Genome/genetics, Virulence Factors/genetics, Corynebacterium/classification, Corynebacterium/pathogenicity, Corynebacterium Infections/drug therapy, Corynebacterium Infections/microbiology, Bacterial Multidrug Resistance/genetics, Genomics/methods, Humans
ABSTRACT
BACKGROUND: MicroRNAs (miRNAs) have increasingly been found to regulate diseases at a significant level. The interaction of miRNAs and diseases is a complex web of multilevel interactions, given that a single miRNA regulates up to 50 or more diseases and that miRNAs and diseases work in clusters. Clear patterns of miRNA regulation in a disease are still elusive. METHODS: In this work, we approach miRNA-disease interactions from a network science perspective and devise two approaches to determine and prioritize the set of diseases most certainly impacted when a group of queried miRNAs is activated in a miRNA-disease network: a maximum weighted matching model (a graph-theoretical algorithm that solves an optimization problem to select the most prominent set of diseases) and motif-based analyses (which investigate the motifs of the miRNA-disease network and select the most prominent set of diseases based on their maximum participation in motifs, thereby revealing the miRNA-disease interaction dynamics). RESULTS AND CONCLUSION: Our tool, DISMIRA, implements the above-mentioned approaches and presents an interactive visualization that helps the user explore the network dynamics of miRNAs and diseases by analyzing their neighbors, paths, and topological features. A set of miRNAs can be submitted to obtain a ranked list of the diseases associated with the input group, and further analyses can be run to find key miRNAs or diseases, shortest paths, and so on. DISMIRA can be accessed online for free at http://bnet.egr.vcu.edu:8080/dismira.
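The maximum weighted matching idea can be illustrated on a toy bipartite miRNA-disease graph. The sketch below is an exhaustive search that is only practical for very small inputs; it is not DISMIRA's implementation, and the function name `best_matching`, the node labels, and the integer scores are all hypothetical.

```python
def best_matching(weights):
    """Exhaustively find a maximum-weight matching in a small bipartite graph.

    weights maps (mirna, disease) pairs to a non-negative association score.
    Each miRNA and each disease may appear in at most one selected pair.
    """
    edges = sorted(weights.items())
    best_total, best_pairs = 0, []
    used_mirnas, used_diseases, chosen = set(), set(), []

    def search(start, total):
        nonlocal best_total, best_pairs
        if total > best_total:
            best_total, best_pairs = total, list(chosen)
        for j in range(start, len(edges)):
            (mirna, disease), w = edges[j]
            if mirna in used_mirnas or disease in used_diseases:
                continue  # would reuse an already matched node
            used_mirnas.add(mirna)
            used_diseases.add(disease)
            chosen.append((mirna, disease))
            search(j + 1, total + w)
            # Backtrack before trying the next candidate edge.
            chosen.pop()
            used_diseases.remove(disease)
            used_mirnas.remove(mirna)

    search(0, 0)
    return best_total, best_pairs


# Hypothetical scores: greedily taking the heaviest edge (9) blocks both others,
# while the optimal matching takes the two lighter edges (8 + 7).
toy = {("miR-A", "disease-1"): 9, ("miR-A", "disease-2"): 8, ("miR-B", "disease-1"): 7}
print(best_matching(toy))
```

For realistically sized networks a polynomial-time matching algorithm would be needed instead of this brute-force search.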
Subjects
Gene Regulatory Networks/genetics, Genetic Predisposition to Disease/genetics, MicroRNAs/genetics, Algorithms, Factual Databases, Humans
ABSTRACT
The number of genomes deposited in databases has increased exponentially since the advent of Next-Generation Sequencing (NGS), which produces high-throughput sequence data; this has demanded the development of new bioinformatics software and the creation of new areas, such as comparative genomics. In comparative genomics, the genetic content of an organism is compared against that of other organisms, which helps in the prediction of gene function and coding region sequences, the identification of evolutionary events, and the determination of phylogenetic relationships. Moreover, by expanding comparative genomics to a large number of related bacteria, we can infer their lifestyles, gene repertoires, and minimal genome size. In this context, a powerful approach called pan-genome analysis has been initiated and developed. This approach involves the genomic comparison of different strains of the same species, or even genus. Its main goal is to establish the total number of non-redundant genes present in a given dataset. The pan-genome consists of three parts: the core genome; the accessory or dispensable genome; and species-specific or strain-specific genes. Furthermore, a pan-genome is considered "open" as long as each additional genome adds a significant number of new genes to the total repertoire, and "closed" when newly added genomes no longer significantly increase the total gene repertoire. To perform all of the required calculations, a substantial amount of software has been developed, based on the identification of orthologous and paralogous genes.
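The three-part division of the pan-genome described above can be sketched in a few lines of Python. The sketch assumes that ortholog clustering has already been done and that each genome is given as a set of gene-family identifiers; the function name `partition_pangenome` and the toy strain and family names are hypothetical, and real pan-genome software uses more elaborate definitions (for example, soft-core thresholds).

```python
from collections import Counter


def partition_pangenome(genomes):
    """Split gene families into core, accessory, and strain-specific sets.

    genomes maps a strain name to the set of gene-family IDs it carries.
    Assumes at least two genomes; with one genome, core and unique coincide.
    """
    # Count in how many genomes each family occurs.
    counts = Counter(fam for families in genomes.values() for fam in set(families))
    n = len(genomes)
    core = {fam for fam, c in counts.items() if c == n}    # present in every genome
    unique = {fam for fam, c in counts.items() if c == 1}  # present in exactly one
    accessory = set(counts) - core - unique                # everything in between
    return core, accessory, unique


toy = {
    "strain1": {"famA", "famB", "famC"},
    "strain2": {"famA", "famB", "famD"},
    "strain3": {"famA", "famE"},
}
core, accessory, unique = partition_pangenome(toy)
print(sorted(core), sorted(accessory), sorted(unique))
```

The pan-genome size is then simply the total number of distinct families, that is, the sum of the three set sizes.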
ABSTRACT
BACKGROUND: The completion of whole-genome sequencing for Corynebacterium pseudotuberculosis strain 1002 has contributed to major advances in research aimed at understanding the biology of this microorganism. This bacterium causes significant losses to goat and sheep farmers because it is the causal agent of the infectious disease caseous lymphadenitis, which may lead to outcomes ranging from skin injury to animal death. In the current study, we simulated the conditions experienced by the bacterium during host infection. By sequencing transcripts using the SOLiD™ 3 Plus platform, we identified new targets expected to potentiate the survival and replication of the pathogen in adverse environments. These results may also identify possible candidates useful for the development of vaccines, diagnostic kits or therapies aimed at the reduction of losses in agribusiness. RESULTS: Under the 3 simulated conditions (acid, osmotic and thermal shock stresses), 474 differentially expressed genes exhibiting at least a 2-fold change in expression levels were identified. Genes important to the infection process were induced, such as those involved in virulence, defence against oxidative stress, adhesion and regulation, and many of the genes encoded hypothetical proteins, indicating that further investigation of the bacterium is necessary. The data will contribute to a better understanding of the biology of C. pseudotuberculosis and to studies investigating strategies to control the disease. CONCLUSIONS: Despite the veterinary importance of C. pseudotuberculosis, the bacterium is poorly characterised; therefore, effective treatments for caseous lymphadenitis have been difficult to establish. Through the use of RNA-seq, these results provide a better biological understanding of this bacterium, shed light on the most likely survival mechanisms used by this microorganism in adverse environments and identify candidates that may help reduce or even eradicate the problems caused by this disease.
Subjects
Corynebacterium pseudotuberculosis/genetics , Bacterial Genes , Physiological Stress , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Corynebacterium pseudotuberculosis/metabolism , Down-Regulation , Hydrogen-Ion Concentration , Osmotic Pressure , Untranslated RNA/metabolism , DNA Sequence Analysis , Sigma Factor/genetics , Sigma Factor/metabolism , Temperature , Up-Regulation
ABSTRACT
BACKGROUND: Exiguobacterium antarcticum strain B7 is a Gram-positive psychrotrophic bacterium isolated in Antarctica. Although this bacterium has been poorly studied, its genome has already been sequenced. Therefore, it is an appropriate model for the study of thermal adaptation. In the present study, we analyzed the transcriptomes and proteomes of E. antarcticum B7 grown at 0°C and 37°C by SOLiD RNA-Seq, Ion Torrent RNA-Seq and two-dimensional difference gel electrophoresis tandem mass spectrometry (2D-DIGE-MS/MS). RESULTS: We found expression of 2,058 transcripts in all replicates from both platforms and differential expression of 564 genes (absolute log2FC ≥ 1, P-value < 0.001) when comparing the two temperatures by RNA-Seq. A total of 73 spots were differentially expressed between the two temperatures on 2D-DIGE, 25 of which were identified by MS/MS. Some proteins exhibited patterns of dispersion in the gel that are characteristic of post-translational modifications. CONCLUSIONS: Our findings suggest that the two sequencing platforms yielded similar results and that different omic approaches may be used to improve the understanding of gene expression. To adapt to low temperatures, E. antarcticum B7 expresses four of the six cold-shock proteins present in its genome. The cold-shock proteins were the most abundant in the bacterial proteome at 0°C. Some of the differentially expressed genes are required to preserve transcription and translation, while others encode proteins that contribute to the maintenance of the intracellular environment and appropriate protein folding. The results, which are based on two omic approaches, reflect the complexity intrinsic to the adaptation of psychrotrophic organisms to cold environments. They also unveil the lifestyle of a bacterial species isolated in Antarctica.
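The differential expression criterion quoted in this abstract (absolute log2 fold change of at least 1 and a P-value below 0.001) can be illustrated with a small filter. This is only a sketch of the thresholding step: the gene names, expression values and P-values are hypothetical, and the statistical test that would produce the P-values in a real RNA-Seq analysis is assumed to have been run already.

```python
import math
from typing import List, Tuple

# Hypothetical records: (gene, mean expression at 0 °C, mean expression at 37 °C, P-value).
RECORDS: List[Tuple[str, float, float, float]] = [
    ("gene_1", 200.0, 50.0, 0.0001),  # 4-fold higher in the cold, significant
    ("gene_2", 60.0, 50.0, 0.0001),   # significant, but the change is too small
    ("gene_3", 10.0, 80.0, 0.0005),   # 8-fold lower in the cold, significant
    ("gene_4", 400.0, 100.0, 0.02),   # large change, but not significant
]


def differentially_expressed(
    records: List[Tuple[str, float, float, float]],
    min_abs_log2fc: float = 1.0,
    max_p: float = 0.001,
) -> List[Tuple[str, float]]:
    """Keep genes with |log2(cold / warm)| >= min_abs_log2fc and P < max_p."""
    hits = []
    for gene, cold, warm, p_value in records:
        log2fc = math.log2(cold / warm)
        if abs(log2fc) >= min_abs_log2fc and p_value < max_p:
            hits.append((gene, log2fc))
    return hits


hits = differentially_expressed(RECORDS)
print(hits)
```

A log2 fold change threshold of 1 corresponds to a 2-fold change in either direction, so both induced and repressed genes pass the filter.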