1.
Comput Biol Med ; 178: 108789, 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38936077

ABSTRACT

Alternative splicing (AS) is an essential mechanism in eukaryotes. However, the consequences of deleting a single exon can be dramatic for the organism and can lead to cancer in humans. Additionally, alternative 5' and 3' splice sites, which define the boundaries of exons, also play key roles in human disorders. Therefore, investigating AS events is crucial for understanding the molecular basis of human diseases and developing therapeutic strategies. A typical workflow for AS event analysis consists of sampling, followed by bioinformatic data analysis to identify the different AS events in the control and case samples, data visualization for curation, and selection of relevant targets for experimental validation. The raw output of the analysis software is not well suited to the inspection of events, so bioinformaticians need custom scripts for data visualization. In this work, we propose the Geneapp application with three modules: GeneappScript, GeneappServer, and GeneappExplorer. GeneappScript is a wrapper that assists in identifying AS in the compared samples using two different approaches, while GeneappServer integrates data from AS analyses already performed by the user. In GeneappExplorer, the user visualizes the resulting dataset by exploring AS events in genes with functional annotation. The targeted screening that Geneapp makes possible helps in identifying targets for experimental validation to confirm the hypotheses under study. Geneapp is freely available for non-commercial use at https://geneapp.net to advance research on AS in bioinformatics.

2.
Sci Rep ; 14(1): 11576, 2024 May 21.
Article in English | MEDLINE | ID: mdl-38773133

ABSTRACT

Despite presenting a worse prognosis and being associated with highly aggressive tumors, triple-negative breast cancer (TNBC) is characterized by a higher frequency of tumor-infiltrating lymphocytes, which have been implicated in better overall survival and response to therapy. Though recent studies have reported the capacity of B lymphocytes to recognize overexpressed normal proteins and tumor-associated antigens, how tumor development potentially modifies the B cell response is yet to be elucidated. Our findings reveal distinct effects of 4T1 and E0771 murine tumor development on B cells in secondary lymphoid organs. Notably, we observe a significant expansion of total B cells and plasma cells in the tumor-draining lymph nodes (tDLNs) as early as 7 days after tumor challenge in both murine models, whereas changes in the spleen are less pronounced. Surprisingly, within the tumor microenvironment (TME) of both models, we detect distinct B cell subpopulations, but tumor development does not appear to cause major alterations in their frequency over time. Furthermore, our investigation into B cell regulatory phenotypes highlights that the B10 Breg phenotype remains unaffected in the evaluated tissues. Most importantly, we identified an increase in CD19+LAG-3+ cells in tDLNs of both murine models. Interestingly, although CD19+LAG-3+ cells represent a minor subset of total B cells (<3%) in all evaluated tissues, most of these cells exhibit elevated expression of IgD, suggesting that LAG-3 may serve as an activation marker for B cells. Corroborating these findings, we detected distinct cell cycle and proliferation genes alongside LAG-3 when analyzing scRNA-Seq data from a cohort of TNBC patients. More importantly, our study suggests that the presence of LAG-3+ B cells in breast tumors could be associated with a good prognosis, as patients with higher levels of LAG-3+ B cell transcripts had a longer progression-free interval (PFI).
This novel insight could pave the way for targeted therapies that harness the unique properties of LAG-3+ B cells, potentially offering new avenues for improving patient outcomes in TNBC. Further research is warranted to unravel the mechanistic pathways of these cells and to validate their prognostic value in larger, diverse patient cohorts.


Subjects
Triple Negative Breast Neoplasms , Tumor Microenvironment , Animals , Triple Negative Breast Neoplasms/pathology , Triple Negative Breast Neoplasms/immunology , Triple Negative Breast Neoplasms/metabolism , Triple Negative Breast Neoplasms/genetics , Female , Mice , Tumor Microenvironment/immunology , Lymphocytes, Tumor-Infiltrating/immunology , Lymphocytes, Tumor-Infiltrating/metabolism , Cell Line, Tumor , Lymphocyte Activation Gene 3 Protein , B-Lymphocyte Subsets/immunology , B-Lymphocyte Subsets/metabolism , Antigens, CD/metabolism , B-Lymphocytes/immunology , B-Lymphocytes/metabolism , Lymph Nodes/pathology , Spleen/immunology , Spleen/metabolism , Spleen/pathology , Mice, Inbred BALB C
3.
Phys Med ; 112: 102622, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37331081

ABSTRACT

PURPOSE: This study presents a treatment planning system for intraoperative low-energy photon radiotherapy based on photogrammetry from real images of the surgical site taken in the operating room. MATERIAL AND METHODS: The study population comprised 15 patients with soft-tissue sarcoma. The system obtains the images of the area to be irradiated with a smartphone or tablet, so that the absorbed doses in the tissue can be calculated from the reconstruction without the need for computed tomography. The system was commissioned using 3D printing of the reconstructions of the tumor beds. The absorbed doses at various points were verified using radiochromic films that were suitably calibrated for the corresponding energy and beam quality. RESULTS: The average reconstruction time of the 3D model from the video sequence in the 15 patients was 229.6 ± 7.0 s. The entire procedure, including video capture, reconstruction, planning, and dose calculation, took 520.6 ± 39.9 s. Absorbed doses were measured on the 3D-printed model with radiochromic film; the differences between these measurements and those calculated by the treatment planning system were 1.4% at the applicator surface, 2.6% at 1 cm, 3.9% at 2 cm and 6.2% at 3 cm. CONCLUSIONS: The study shows a photogrammetry-based low-energy photon IORT planning system capable of obtaining real-time images inside the operating room, immediately after removal of the tumor and immediately before irradiation. The system was commissioned with radiochromic film measurements in a 3D-printed model.


Subjects
Brachytherapy , Sarcoma , Humans , Radiotherapy Dosage , Brachytherapy/methods , Radiotherapy Planning, Computer-Assisted/methods , Phantoms, Imaging , Photogrammetry
4.
PeerJ ; 11: e14616, 2023.
Article in English | MEDLINE | ID: mdl-36643652

ABSTRACT

Background: In metabarcoding analyses, the taxonomic assignment is crucial to place sequencing data in biological and ecological contexts. This fundamental step depends on a reference database, which should have a good taxonomic coverage to avoid unassigned sequences. However, this goal is rarely achieved in many geographic regions and for several taxonomic groups. On the other hand, more is not necessarily better, as sequences in reference databases belonging to taxonomic groups out of the studied region/environment context might lead to false assignments. Methods: We investigated the effect of using several subsets of a cytochrome c oxidase subunit I (COI) reference database on taxonomic assignment. Published metabarcoding sequences from the Mediterranean Sea were assigned to taxa using COInr, which is a comprehensive, non-redundant and recent database of COI sequences obtained both from BOLD and NCBI, and two of its subsets: (i) all sequences except insects (COInr-WO-Insecta), which represent the overwhelming majority of COInr database, but are irrelevant for marine samples, and (ii) all sequences from taxonomic families present in the Mediterranean Sea (COInr-Med). Four different algorithms for taxonomic assignment were employed in parallel to evaluate differences in their output and data consistency. Results: The reduction of the database to more specific custom subsets increased the number of unassigned sequences. Nevertheless, since most of them were incorrectly assigned by the less specific databases, this is a positive outcome. Moreover, the taxonomic resolution (the lowest taxonomic level to which a sequence is attributed) of several sequences tended to increase when using customized databases. These findings clearly indicated the need for customized databases adapted to each study. 
However, the very high proportion of unassigned sequences points to the need to enrich the local database with new barcodes specifically obtained from the studied region and/or taxonomic group. Adding novel local barcodes to the COI database proved very worthwhile: by adding only 116 new barcodes sequenced in our laboratory, thus increasing the reference database by only 0.04%, we were able to improve the resolution for ca. 0.6-1% of the Amplicon Sequence Variants (ASVs).


Subjects
Aquatic Organisms , DNA Barcoding, Taxonomic , Databases, Factual , Mediterranean Sea , Aquatic Organisms/genetics
5.
Front Bioinform ; 1: 711463, 2021.
Article in English | MEDLINE | ID: mdl-36303729

ABSTRACT

Bioinformatics is a fast-evolving research field, requiring effective educational initiatives to bring computational knowledge to the Life Sciences. Since 2017, an organizing committee composed of graduate students and postdoctoral researchers from the Universidade Federal de Minas Gerais (Brazil) has promoted a week-long event named Summer Course in Bioinformatics (CVBioinfo). This event aims to disseminate bioinformatics principles, news, and methods, mainly to audiences of undergraduate students. Furthermore, as the COVID-19 global pandemic precluded in-person events, we offered the event online, using free video transmission platforms. Herein, we present and discuss the insights obtained from promoting the Online Workshop in Bioinformatics (WOB) organized in November 2020, comparing it to our experience in previous in-person editions of the same event.

6.
Biochim Biophys Acta Gene Regul Mech ; 1863(6): 194472, 2020 Jun.
Article in English | MEDLINE | ID: mdl-31825805

ABSTRACT

Eukaryotic regulons are regulatory units formed by a set of genes under the control of the same transcription factor (TF). Despite the functional plasticity, TFs are highly conserved and recognize the same DNA sequences in different organisms. One of the main factors that confer regulatory specificity is the distribution of the binding sites of the TFs along the genome, allowing the configuration of different transcriptional regulatory networks (TRNs) from the same regulator. A similar scenario occurs between tissues of the same organism, where a TRN can be rewired by epigenetic factors, modulating the accessibility of the TF to its binding sites. In this article we discuss concepts that can help to formulate testable hypotheses about the construction of regulons, exploring the presence and absence of the elements that form a TRN throughout the evolution of an ancestral lineage. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.


Subjects
Eukaryota/genetics , Evolution, Molecular , Gene Regulatory Networks , Regulon , Transcription Factors/metabolism
7.
Gene ; 726: 144168, 2020 Feb 05.
Article in English | MEDLINE | ID: mdl-31759986

ABSTRACT

Methods based on statistics and linear algebra have been increasingly used in attempts to address emerging questions in the microarray literature. Microarray technology is a long-used tool in the global analysis of gene expression, allowing for the simultaneous investigation of hundreds or thousands of genes in a sample. Microarray data are characterized by a small sample size and a large number of features, which produce a non-square data matrix of incomplete rank that can admit countless solutions in classifiers. To avoid the 'curse of dimensionality', many authors have performed feature selection or reduced the size of the data matrix. In this work, we introduce a new logistic regression-based model to classify breast cancer tumor samples based on microarray expression data, including all features of gene expression and without reducing the microarray data matrix. If the user still deems it necessary to perform feature reduction, it can be done after the application of the methodology, still maintaining a good classification. This methodology allowed the correct classification of breast cancer sample data sets from Gene Expression Omnibus (GEO) data series GSE65194, GSE20711, and GSE25055, which contain the microarray data of said breast cancer samples. Classification had a minimum performance of 80% (sensitivity and specificity), and explored all possible data combinations, including breast cancer subtypes. This methodology highlighted genes not yet studied in breast cancer, some of which have been observed in Gene Regulatory Networks (GRNs). In this work we examine the patterns and features of a GRN composed of transcription factors (TFs) in MCF-7 breast cancer cell lines, providing valuable information regarding breast cancer. In particular, some genes had associated αi* parameters with extreme positive or negative values and, as such, can be identified as breast cancer prediction genes.
We indicate that the PKN2, MKL1, MED23, CUL5 and GLI genes demonstrate a tumor suppressor profile, and that the MTR, ITGA2B, TELO2, MRPL9, MTTL1, WIPI1, KLHL20, PI4KB, FOLR1 and SHC1 genes demonstrate an oncogenic profile. We propose that these may serve as potential breast cancer prediction genes, and should be prioritized for further clinical studies on breast cancer. This new model allows for the assignment of values to the αi* parameters associated with gene expression. It was noted that some αi* parameters are associated with genes previously described as breast cancer biomarkers, as well as other genes not yet studied in relation to this disease.


Subjects
Breast Neoplasms/genetics , Gene Expression Regulation, Neoplastic/genetics , Gene Regulatory Networks/genetics , Biomarkers, Tumor/genetics , Cell Line, Tumor , Disease Progression , Female , Gene Expression Profiling/methods , Humans , Logistic Models , MCF-7 Cells , Oligonucleotide Array Sequence Analysis/methods , Transcription Factors/genetics
8.
BMC Vet Res ; 13(1): 177, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28619055

ABSTRACT

BACKGROUND: Leptospirosis is caused by pathogenic spirochetes of the genus Leptospira. This zoonotic disease is distributed globally and affects domestic animals, including cattle. Leptospira interrogans serogroup Sejroe serovar Hardjo and Leptospira borgpetersenii serogroup Sejroe serovar Hardjo remain important species associated with this reproductive disease in livestock production. Previous studies on Brazilian livestock have reported that L. interrogans serovar Hardjo is the most prevalent leptospiral agent in this country and is related to clinical signs of leptospirosis, which lead to economic losses in production. Here, we describe the isolation of three clinical strains (Norma, Lagoa and Bolivia) obtained from leptospirosis outbreaks that occurred in Minas Gerais state in 1994 and 2008. RESULTS: Serological and molecular typing using housekeeping (secY and 16S rRNA) and rfb locus (ORF22 and ORF36) genes was applied for the identification and comparative analysis of Leptospira spp. Our results identified the three isolates as L. interrogans serogroup Sejroe serovar Hardjo and confirmed the occurrence of this bacterial strain in Brazilian livestock. Genetic analysis using ORF22 and ORF36 grouped the Leptospira into serogroup Sejroe and subtype Hardjoprajitno. Genetic approaches were also applied to compare distinct serovars of L. interrogans strains by verifying the copy numbers of the IS1500 and IS1533 insertion sequences (ISs). The IS1500 copy number varied among the analyzed L. interrogans strains. CONCLUSION: This study provides evidence that L. interrogans serogroup Sejroe serovar Hardjo subtype Hardjoprajitno causes bovine leptospirosis in Brazilian livestock production. The molecular results suggested that the rfb locus (ORF22 and ORF36) could improve epidemiological studies by allowing the identification of Leptospira spp. at the serogroup level.
Additionally, the IS1500 and IS1533 IS copy number analysis suggested distinct genomic features among closely related leptospiral strains.


Subjects
Cattle Diseases/microbiology , Disease Outbreaks/veterinary , Leptospira interrogans/isolation & purification , Leptospirosis/veterinary , Animals , Brazil/epidemiology , Cattle , Cattle Diseases/epidemiology , DNA Transposable Elements , DNA, Bacterial , DNA, Ribosomal , Genes, Bacterial , Genetic Loci , Leptospira interrogans/classification , Leptospira interrogans/genetics , Leptospirosis/epidemiology , Leptospirosis/microbiology , Molecular Typing , Open Reading Frames
9.
PLoS One ; 12(4): e0175041, 2017.
Article in English | MEDLINE | ID: mdl-28376104

ABSTRACT

Increases in nuclear calcium concentration generate specific biological outcomes that differ from those resulting from increased cytoplasmic calcium. Nuclear calcium effects on tumor cell proliferation are widely appreciated; nevertheless, its involvement in other steps of tumor progression is not well understood. Therefore, we evaluated whether nuclear calcium is essential in additional stages of tumor progression, including key steps associated with the formation of the primary tumor or with the metastatic cascade. We found that nuclear calcium buffering impaired 4T1 triple-negative breast cancer growth not just by decreasing tumor cell proliferation, but also by enhancing tumor necrosis. Moreover, nuclear calcium regulates tumor angiogenesis through a mechanism that involves the upregulation of the anti-angiogenic C-X-C motif chemokine 10 (CXCL10-IP10). In addition, nuclear calcium buffering regulates breast tumor cell motility, culminating in less cell invasion, likely due to enhanced expression of vinculin, a focal adhesion structural protein. Together, our results show that nuclear calcium is essential for triple-negative breast cancer angiogenesis and cell migration and can be considered a promising strategic target for triple-negative breast cancer therapy.


Subjects
Calcium Signaling , Inositol 1,4,5-Trisphosphate/metabolism , Triple Negative Breast Neoplasms/metabolism , Animals , Cell Line, Tumor , Cell Movement , Cell Nucleus/metabolism , Cell Proliferation , Chemokine CXCL10/metabolism , Female , Gene Expression Regulation, Neoplastic , Heterografts , Humans , Mice , Mice, Inbred BALB C , Neovascularization, Pathologic/genetics , Triple Negative Breast Neoplasms/blood supply , Triple Negative Breast Neoplasms/pathology
10.
BMC Genomics ; 17(Suppl 8): 725, 2016 Oct 25.
Article in English | MEDLINE | ID: mdl-27801289

ABSTRACT

BACKGROUND: The development of large-scale technologies for quantitative transcriptomics has enabled comprehensive analysis of gene expression profiles in complete genomes. RNA-Seq allows the measurement of gene expression levels in a manner far more precise and global than previous methods. Studies using this technology are altering our view of the extent and complexity of eukaryotic transcriptomes. In this respect, multiple efforts have been made to determine and analyse the gene expression patterns of human cell types in different conditions, in either normal or pathological states. However, until recently, little has been reported about the evolutionary marks present in human protein-coding genes, particularly from the combined perspective of gene expression and protein evolution. RESULTS: We present a combined analysis of human protein-coding gene expression profiling and time-scale ancestry mapping that places the genes in taxonomy clades and reveals eight major evolutionary steps ("hallmarks") that include clusters of functionally coherent proteins. The human expressed genes are analysed using an RNA-Seq dataset of 116 samples from 32 tissues. The evolutionary analysis of the human proteins is performed by combining the information from: (i) a database of orthologous proteins (OMA), (ii) the taxonomy mapping of genes to lineage clades (from NCBI Taxonomy) and (iii) the evolution time-scale mapping provided by TimeTree (Timescale of Life). The human protein-coding genes are also placed in a relational context based on the construction of a robust gene coexpression network that reveals tighter links between age-related protein-coding genes and finds functionally coherent gene modules. CONCLUSIONS: Understanding the relational landscape of the human protein-coding genes is essential for interpreting the functional elements and modules of our active genome.
Moreover, decoding the evolutionary history of the human genes can provide very valuable information to reveal or uncover their origin and function.
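
As an illustration only of how a gene coexpression network of the kind mentioned in this abstract can be built, the Python sketch below links two genes when the absolute Pearson correlation of their expression profiles reaches a threshold. The abstract does not state the correlation measure or cutoff the authors used, so the function names, the 0.9 threshold, and the toy expression values are all assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Return (gene1, gene2, r) for every gene pair with |r| >= threshold.

    `expression` maps a gene name to its expression values across samples.
    The threshold is a hypothetical choice, not a value from the abstract.
    """
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Made-up expression values for three hypothetical genes across four samples.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.2, 7.9],
    "geneC": [5.0, 1.0, 4.0, 2.0],
}
toy_edges = coexpression_edges(toy_expression)
for g1, g2, r in toy_edges:
    print(f"{g1} -- {g2} (r = {r:.3f})")
```

In this toy input only geneA and geneB are linked, because their profiles rise together while geneC follows neither of them. Real analyses usually work on normalized expression values and choose the cutoff with more care.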


Subjects
Evolution, Molecular , Proteome , Proteomics , Cluster Analysis , Computational Biology/methods , Gene Expression Profiling , Gene Expression Regulation , Gene Regulatory Networks , Genomics/methods , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Open Reading Frames , Organ Specificity/genetics , Proteomics/methods , Transcriptome
11.
Nutrients ; 7(1): 1-16, 2014 Dec 24.
Article in English | MEDLINE | ID: mdl-25545100

ABSTRACT

Essential amino acids (EAA) consist of a group of nine amino acids that animals are unable to synthesize via de novo pathways. Recently, it has been found that most metazoans lack the same set of enzymes responsible for de novo EAA biosynthesis. Here we investigate the sequence conservation and evolution of all the remaining metazoan genes for EAA pathways. Initially, the set of all 49 enzymes responsible for de novo EAA biosynthesis in yeast was retrieved. These enzymes were used as BLAST queries to search for similar sequences in a database containing 10 complete metazoan genomes. Eight enzymes typically attributed to EAA pathways were found to be ubiquitous in metazoan genomes, suggesting a conserved functional role. In this study, we address the question of how these genes evolved after losing their pathway partners. To do this, we compared metazoan genes with their fungal and plant orthologs. Using phylogenetic analysis with maximum likelihood, we found that acetolactate synthase (ALS) and betaine-homocysteine S-methyltransferase (BHMT) diverged from the expected Tree of Life (ToL) relationships. High sequence conservation in the paraphyletic group Plant-Fungi was identified for these two genes using a newly developed Python algorithm. Selective pressure analysis of ALS and BHMT protein sequences showed higher non-synonymous mutation ratios in comparisons between metazoans/fungi and metazoans/plants, supporting the hypothesis that these two genes have undergone non-ToL evolution in animals.


Subjects
Amino Acids, Essential/biosynthesis , Conserved Sequence/genetics , Acetolactate Synthase/genetics , Acetolactate Synthase/metabolism , Amino Acid Sequence , Animals , Betaine-Homocysteine S-Methyltransferase/genetics , Betaine-Homocysteine S-Methyltransferase/metabolism , Biological Evolution , Fungi/enzymology , Fungi/genetics , Humans , Phylogeny , Plants/enzymology , Plants/genetics , Saccharopine Dehydrogenases/genetics , Saccharopine Dehydrogenases/metabolism
12.
Microb Ecol ; 67(2): 237-41, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24173537

ABSTRACT

The Brazilian Microbiome Project (BMP) aims to assemble a Brazilian Metagenomic Consortium/Database. At present, many metagenomic projects underway in Brazil are widely known. Our goal in this initiative is to co-ordinate and standardize these together with new projects to come. It is estimated that Brazil hosts approximately 20% of the entire world's macroorganism biological diversity. It is 1 of the 17 countries that share nearly 70% of the world's catalogued animal and plant species, and is recognized as one of the most megadiverse countries. At the end of 2012, Brazil joined GBIF (Global Biodiversity Information Facility) as an associate member, to improve free and open access to Brazilian biodiversity data. This was an important step toward increasing international collaboration and clearly shows the commitment of the Brazilian government to directing national policies toward sustainable development. Despite its importance, Brazilian microbial diversity is still considered to be largely unknown, and it is clear that to maintain ecosystem dynamics and to sustainably manage land use, it is crucial to understand the biological and functional diversity of the system. This is the first attempt to collect and collate information about Brazilian microbial genetic and functional diversity in a systematic and holistic manner. The success of the BMP depends on a massive collaborative effort of both the Brazilian and international scientific communities, and therefore, we invite all colleagues to participate in this project.


Subjects
Advisory Committees/organization & administration , Biodiversity , Metagenome , Microbiota , Animals , Brazil , Databases, Factual , Plants/microbiology , Soil Microbiology
13.
PLoS One ; 8(11): e79240, 2013.
Article in English | MEDLINE | ID: mdl-24223912

ABSTRACT

Sphingomyelinases D (SMases D) or dermonecrotic toxins are well characterized in Loxosceles spider venoms and have been described in some strains of pathogenic microorganisms, such as Corynebacterium sp. After spider bites, the SMase D molecules cause skin necrosis and occasional severe systemic manifestations, such as acute renal failure. In this paper, we identified new SMase D amino acid sequences from various organisms belonging to 24 distinct genera, of which 19 are new. These SMases D share a conserved active site and a C-terminal motif. We suggest that the C-terminal tail is responsible for stabilizing the entire internal structure of the SMase D TIM barrel and that it can be considered an SMase D hallmark in combination with the amino acid residues from the active site. Most of these enzyme sequences were found in fungi, and SMase D activity was experimentally confirmed in the fungus Aspergillus flavus. Because most of these novel SMases D are from organisms that are endowed with pathogenic properties similar to those evoked by these enzymes alone, they might be associated with their pathogenic mechanisms.


Subjects
Corynebacterium pseudotuberculosis/enzymology , Fungi/enzymology , Ixodes/enzymology , Phosphoric Diester Hydrolases/metabolism , Spiders/enzymology , Amino Acid Motifs/genetics , Amino Acid Sequence , Animals , Arthropod Proteins/chemistry , Arthropod Proteins/genetics , Arthropod Proteins/metabolism , Aspergillus flavus/enzymology , Aspergillus flavus/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Biocatalysis , Catalytic Domain , Corynebacterium pseudotuberculosis/classification , Corynebacterium pseudotuberculosis/genetics , Fungal Proteins/chemistry , Fungal Proteins/genetics , Fungal Proteins/metabolism , Fungi/classification , Fungi/genetics , Ixodes/classification , Ixodes/genetics , Models, Molecular , Molecular Sequence Data , Phosphoric Diester Hydrolases/chemistry , Phosphoric Diester Hydrolases/genetics , Phylogeny , Protein Structure, Secondary , Protein Structure, Tertiary , Sequence Homology, Amino Acid , Sphingomyelins/chemistry , Sphingomyelins/metabolism , Spiders/classification , Spiders/genetics
14.
PLoS One ; 6(4): e18551, 2011 Apr 18.
Article in English | MEDLINE | ID: mdl-21533164

ABSTRACT

BACKGROUND: Corynebacterium pseudotuberculosis, a gram-positive, facultative intracellular pathogen, is the etiologic agent of the disease known as caseous lymphadenitis (CL). CL mainly affects small ruminants, such as goats and sheep; it also causes infections in humans, though rarely. This species is distributed worldwide, but it has the most serious economic impact in Oceania, Africa and South America. Although C. pseudotuberculosis causes major health and productivity problems for livestock, little is known about the molecular basis of its pathogenicity. METHODOLOGY AND FINDINGS: We characterized two C. pseudotuberculosis genomes (Cp1002, isolated from goats; and CpC231, isolated from sheep). Analysis of the predicted genomes showed high similarity in genomic architecture, gene content and genetic order. When C. pseudotuberculosis was compared with other Corynebacterium species, it became evident that this pathogenic species has lost numerous genes, resulting in one of the smallest genomes in the genus. Other differences that could be part of the adaptation to pathogenicity include a lower GC content, of about 52%, and a reduced gene repertoire. The C. pseudotuberculosis genome also includes seven putative pathogenicity islands, which contain several classical virulence factors, including genes for fimbrial subunits, adhesion factors, iron uptake and secreted toxins. Additionally, all of the virulence factors in the islands have characteristics that indicate horizontal transfer. CONCLUSIONS: These particular genome characteristics of C. pseudotuberculosis, as well as its acquired virulence factors in pathogenicity islands, provide evidence of its lifestyle and of the pathogenicity pathways used by this pathogen in the infection process. All genomes cited in this study are available in the NCBI Genbank database (http://www.ncbi.nlm.nih.gov/genbank/) under accession numbers CP001809 and CP001829.
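
The GC content figure quoted in this abstract (about 52%) is the share of G and C bases in a genome sequence. A minimal Python sketch of that calculation follows; the function name and the toy sequence are assumptions for illustration, and the sequence is not taken from either genome described above.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among the unambiguous bases A, C, G, T.

    Ambiguous symbols such as N are ignored, which is one common convention;
    other tools may count them in the denominator.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total


# Made-up 16-base toy sequence, used only to exercise the function.
toy_sequence = "ATGCGCGTTAACCGGA"
print(f"GC content: {gc_content(toy_sequence):.2f}%")
```

For a whole genome, the same function could be applied to the concatenated sequence of a FASTA record, for example one downloaded under the accession numbers listed in the abstract.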


Subjects
Corynebacterium pseudotuberculosis/pathogenicity , Evolution, Molecular , Genome, Bacterial , Virulence/genetics , Corynebacterium pseudotuberculosis/genetics
15.
BMC Genomics ; 12 Suppl 4: S9, 2011 Dec 22.
Article in English | MEDLINE | ID: mdl-22369295

ABSTRACT

BACKGROUND: Accurate prediction of translation initiation in mRNA sequences is important for genome annotation. However, obtaining an accurate prediction is not always simple; the task can be modeled as a classification problem between positive sequences (protein-coding) and negative sequences (non-coding). The problem is highly imbalanced because each mRNA molecule has a single translation initiation site (TIS) and many other sites that are not initiators. This study therefore approaches the problem from the perspective of class balancing and presents M-Clus, an undersampling balancing method based on clustering. The method also adds features to the sequences and improves classifier performance through the inclusion of knowledge obtained by the model, called InAKnow. RESULTS: With this methodology, the performance measures used (accuracy, sensitivity, specificity and adjusted accuracy) are greater than 93% for Mus musculus and Rattus norvegicus, and vary between 72.97% and 97.43% for the other organisms evaluated: Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens and Nasonia vitripennis. Precision increases significantly, by 39% and 22.9% for Mus musculus and Rattus norvegicus, respectively, when the knowledge obtained by the model is included. For the other organisms, precision increases by between 37.10% and 59.49%. Including certain features during training, for example the presence of ATG in the region upstream of the TIS, improves sensitivity by approximately 7%. Using the M-Clus balancing method significantly increases sensitivity, from 51.39% to 91.55% (Mus musculus) and from 47.45% to 88.09% (Rattus norvegicus).
CONCLUSIONS: The results indicate that the methodology proposed in this work is adequate for TIS prediction, particularly when using the concept of acquired knowledge, which increased accuracy in all databases evaluated.
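
The abstract describes the balancing method only as clustering-based undersampling, so the following is a generic sketch of that idea and not the authors' M-Clus algorithm. It assumes each candidate site has already been encoded as a numeric feature tuple; the function names, the simple k-means routine, and the choice to keep one majority example per cluster are all illustrative assumptions.

```python
import random


def _dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on a list of numeric tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct positions
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: _dist2(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids


def cluster_undersample(majority, minority, seed=0):
    """Reduce the majority class to the size of the minority class.

    Hypothetical scheme: cluster the majority (negative) examples into as
    many groups as there are minority (positive) examples, then keep the
    majority example closest to each centroid.
    """
    k = min(len(minority), len(majority))
    centroids = kmeans(majority, k, seed=seed)
    remaining = list(majority)
    kept = []
    for c in centroids:
        best = min(remaining, key=lambda p: _dist2(p, c))
        kept.append(best)
        remaining.remove(best)  # never pick the same example twice
    return kept
```

Keeping one representative per cluster is only one of several reasonable choices; sampling several examples per cluster, or sampling in proportion to cluster size, would fit the same description.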


Subjects
Algorithms , Translational Peptide Chain Initiation , Messenger RNA/chemistry , Animals , Arabidopsis/genetics , Base Sequence , Caenorhabditis elegans/genetics , Humans , Hymenoptera/genetics , Mice , Messenger RNA/metabolism , ROC Curve , Rats , Software
16.
Microbiol Res ; 165(4): 312-20, 2010 May 30.
Article in English | MEDLINE | ID: mdl-19720513

ABSTRACT

Corynebacterium pseudotuberculosis is an intracellular pathogen that causes caseous lymphadenitis (CLA) in sheep and goats. The widespread occurrence and the economic importance of this pathogen have prompted investigation of its pathogenesis. We used a genomic library of C. pseudotuberculosis to generate 1440 genomic survey sequences (GSSs); these were analyzed in silico with bioinformatics tools, using public databases for comparative analyses. We employed non-redundant unique sequences as queries for BLAST searches against the genome, the translated genome and the proteome of four other Corynebacterium species that have been completely sequenced. We were able to characterize approximately 8% of the genome of C. pseudotuberculosis, including previously undescribed functional group genes, based on the COG database; classification of the GSSs into categories gave 13% information storage and processing, 14% cellular processes and 23% metabolism. We found a close relationship between C. pseudotuberculosis and C. diphtheriae in conserved-gene synteny among Corynebacterium species.


Subjects
Corynebacterium pseudotuberculosis/genetics , Bacterial Genes , Bacterial Genome , Base Sequence , Corynebacterium pseudotuberculosis/classification , Corynebacterium pseudotuberculosis/pathogenicity , Bacterial DNA , Molecular Sequence Data , DNA Sequence Analysis
17.
Genet Mol Res ; 5(1): 242-53, 2006 Mar 31.
Article in English | MEDLINE | ID: mdl-16755515

ABSTRACT

The expressed sequence tag (EST) is an instrument of gene discovery. When available in large numbers, ESTs may be used to estimate gene expression. We analyzed gene expression by EST sampling, using the KOG database, which includes 24,154 proteins from Arabidopsis thaliana (Ath), 17,101 from Caenorhabditis elegans (Cel), 10,517 from Drosophila melanogaster (Dme), and 26,324 from Homo sapiens (Hsa), and 178,538 ESTs for Ath, 215,200 for Cel, 261,404 for Dme, and 1,941,556 for Hsa. BLAST similarity searches were performed to assign KOG annotation to all ESTs. We determined the amount of gene sampling or expression dedicated to each KOG functional category by each model organism. We found that the 25% most-expressed genes are frequently shared among these organisms. The KOG protein classification allowed the EST sampling calculation throughout the glycolysis pathway. We calculated the KOG cluster coverage and inferred that 50 to 80 K ESTs would efficiently cover 80-85% of the KOG database clusters in a transcriptome project. Since KOG is a database biased towards housekeeping genes, this is probably the number of ESTs needed to include the more commonly expressed genes in these organisms. We also examined a still unaddressed question: what is the minimum number of ESTs that should be produced in a transcriptome project?
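
The cluster-coverage calculation mentioned in this abstract can be illustrated with a small sketch. It assumes each EST has already been assigned a KOG cluster identifier by a similarity search (or `None` when there is no hit); the function name, the input layout, and the identifiers in the usage note are hypothetical and not taken from the paper.

```python
def coverage_curve(est_clusters, total_clusters, step):
    """Fraction of KOG clusters hit at least once as more ESTs are sampled.

    est_clusters: cluster ID per EST, in sampling order (None = no hit).
    total_clusters: number of clusters in the reference database.
    step: report coverage every `step` ESTs, and once more at the end.
    """
    seen = set()
    curve = []
    for i, cluster in enumerate(est_clusters, start=1):
        if cluster is not None:
            seen.add(cluster)
        if i % step == 0 or i == len(est_clusters):
            curve.append((i, len(seen) / total_clusters))
    return curve
```

For example, `coverage_curve(["KOG0001", "KOG0002", "KOG0001", None, "KOG0003", "KOG0002"], 4, 2)` returns `[(2, 0.5), (4, 0.5), (6, 0.75)]`; the sample size at which such a curve flattens is the kind of estimate the abstract reports.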


Subjects
Arabidopsis Proteins/genetics , Caenorhabditis elegans Proteins/genetics , Drosophila Proteins/genetics , Expressed Sequence Tags , Gene Expression/genetics , Animals , Arabidopsis Proteins/chemistry , Caenorhabditis elegans Proteins/chemistry , Cluster Analysis , Genetic Databases , Protein Databases , Drosophila Proteins/chemistry , Humans , Genetic Models , Protein Sequence Analysis , Genetic Transcription
18.
Genet Mol Res ; 3(4): 483-92, 2004 Dec 30.
Article in English | MEDLINE | ID: mdl-15688315

ABSTRACT

When analyzing sequencing reads, it is important to distinguish between putatively correct and incorrect bases. An open question is how well a PHRED quality value identifies miscalled bases and whether there is a quality cutoff that allows mapping of most errors. Considering that a low quality value does not necessarily indicate a miscalled position, we decided to investigate whether window-based analyses of quality values might better predict errors. There are many reasons to look for a perfect window in DNA sequences, such as when using the SAGE technique, seeding BLAST searches, and clustering sequences. Thus, we set out to find a quality cutoff value that would distinguish non-perfect windows from perfect ones. We produced 846 reads of pUC18 and compared them with the published pUC consensus by local alignment. We then generated a database containing all mismatches, insertions and gaps in order to map real perfect windows. We investigated the potential to predict perfect windows when all bases in the window show quality values over a given cutoff. We conclude that, in window-based applications, a PHRED quality value cutoff of 7 masks most of the errors without masking real correct windows. We suggest that the putative wrong bases be indicated in lower case, increasing the information in the sequence databases without increasing the size of the files.
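
The window-based prediction and the lower-case marking proposed in this abstract can be sketched as follows. This is an illustrative reading, not the authors' code: the function names and the window length are assumptions, and "over a given cutoff" is interpreted here as strictly greater than the cutoff.

```python
def predicted_perfect_windows(quals, window, cutoff=7):
    """Start indices of windows in which every quality value exceeds cutoff."""
    starts = []
    for start in range(len(quals) - window + 1):
        if all(q > cutoff for q in quals[start:start + window]):
            starts.append(start)
    return starts


def mark_low_quality(seq, quals, cutoff=7):
    """Write putatively wrong bases (quality <= cutoff) in lower case.

    The string keeps its length, so the extra information costs no space.
    """
    return "".join(
        base.upper() if q > cutoff else base.lower()
        for base, q in zip(seq, quals)
    )
```

With the qualities `[30, 30, 5, 30, 30, 30]` and a window of 3, only the window starting at index 3 is predicted perfect, and `mark_low_quality("ACGTAC", [30, 30, 5, 30, 30, 30])` gives `"ACgTAC"`.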


Subjects
Algorithms , Genetic Databases/standards , Human Genome , Quality Control , DNA Sequence Analysis/standards , Base Sequence , Humans , Molecular Sequence Data , Open Reading Frames , Sequence Alignment , DNA Sequence Analysis/methods , Software
19.
Genet. mol. res. (Online) ; 3(4): 483-492, 2004. tab, graf
Article in English | LILACS | ID: lil-410893

ABSTRACT

When analyzing sequencing reads, it is important to distinguish between putatively correct and incorrect bases. An open question is how well a PHRED quality value identifies miscalled bases and whether there is a quality cutoff that allows mapping of most errors. Considering that a low quality value does not necessarily indicate a miscalled position, we decided to investigate whether window-based analyses of quality values might better predict errors. There are many reasons to look for a perfect window in DNA sequences, such as when using the SAGE technique, seeding BLAST searches, and clustering sequences. Thus, we set out to find a quality cutoff value that would distinguish non-perfect windows from perfect ones. We produced 846 reads of pUC18 and compared them with the published pUC consensus by local alignment. We then generated a database containing all mismatches, insertions and gaps in order to map real perfect windows. We investigated the potential to predict perfect windows when all bases in the window show quality values over a given cutoff. We conclude that, in window-based applications, a PHRED quality value cutoff of 7 masks most of the errors without masking real correct windows. We suggest that the putative wrong bases be indicated in lower case, increasing the information in the sequence databases without increasing the size of the files.


Subjects
Humans , Algorithms , Genetic Databases/standards , Human Genome , Quality Control , DNA Sequence Analysis/standards , Base Sequence , Molecular Sequence Data , Open Reading Frames , Sequence Alignment , Software , DNA Sequence Analysis/methods
20.
Genet Mol Res ; 2(1): 169-77, 2003 Mar 31.
Article in English | MEDLINE | ID: mdl-12917813

ABSTRACT

Microorganisms with large genomes are commonly the subjects of single-round partial sequencing of cDNA, generating expressed sequence tags (ESTs). Usually there is a large gap between gene discovery by EST projects and submission of amino acid sequences to public databases. We analyzed the relationship between available ESTs and protein sequences and used the sequences available in the secondary database Clusters of Orthologous Groups (COG) to investigate ESTs from eight microorganisms of medical and/or economic relevance, selecting candidate ESTs that may be further pursued for protein characterization. The organisms chosen were Paracoccidioides brasiliensis, Dictyostelium discoideum, Fusarium graminearum, Plasmodium yoelii, Magnaporthe grisea, Emericella nidulans, Chlamydomonas reinhardtii and Eimeria tenella, which have more than 10,000 ESTs available in dbEST. A total of 77,114 protein sequences from COG were used, corresponding to 3,201 distinct genes. At least 212 of these were capable of identifying candidate ESTs for further studies (E. tenella). This number was extended to over 700 candidate ESTs (C. reinhardtii, F. graminearum). Remarkably, even the organism that presents the highest number of ESTs corresponding to known proteins, P. yoelii, showed a considerable number of candidate ESTs for protein characterization (477). For some organisms, such as P. brasiliensis, M. grisea and F. graminearum, bioinformatics has allowed for automatic annotation of up to about 20% of the ESTs that did not correspond to proteins already characterized in the organism. In conclusion, 4,093 ESTs from these eight organisms that are homologous to COG genes were selected as candidates for protein characterization.


Subjects
Protein Databases , Expressed Sequence Tags , Protein Sequence Analysis , Animals , Chlamydomonas reinhardtii/genetics , Dictyostelium/genetics , Eimeria tenella/genetics , Emericella/genetics , Fusarium/genetics , Genome , Magnaporthe/genetics , Paracoccidioides/genetics , Plasmodium yoelii/genetics , Proteins/genetics , Amino Acid Sequence Homology