Search | Biblioteca Virtual em Saúde Fiocruz

1.

Alignment-free clustering of large data sets of unannotated protein conserved regions using minhashing.

Abnousi, Armen; Broschat, Shira L; Kalyanaraman, Ananth.

BMC Bioinformatics ; 19(1): 83, 2018 03 05.

Article in English | MEDLINE | ID: mdl-29506470

ABSTRACT

BACKGROUND: Clustering of protein sequences is of key importance in predicting the structure and function of newly sequenced proteins and is also of use for their annotation. With the advent of multiple high-throughput sequencing technologies, new protein sequences are becoming available at an extraordinary rate. The rapid growth rate has impeded deployment of existing protein clustering/annotation tools which depend largely on pairwise sequence alignment. RESULTS: In this paper, we propose an alignment-free clustering approach, coreClust, for annotating protein sequences using detected conserved regions. The proposed algorithm uses Min-Wise Independent Hashing for identifying similar conserved regions. Min-Wise Independent Hashing works by generating a (w,c)-sketch for each document and comparing these sketches. Our algorithm fits well within the MapReduce framework, permitting scalability. We show that coreClust generates results comparable to existing known methods. In particular, we show that the clusters generated by our algorithm capture the subfamilies of the Pfam domain families for which the sequences in a cluster have a similar domain architecture. We show that for a data set of 90,000 sequences (about 250,000 domain regions), the clusters generated by our algorithm give a 75% average weighted F1 score, our accuracy metric, when compared to the clusters generated by a semi-exhaustive pairwise alignment algorithm. CONCLUSIONS: The new clustering algorithm can be used to generate meaningful clusters of conserved regions. It is a scalable method that when paired with our prior work, NADDA for detecting conserved regions, provides a complete end-to-end pipeline for annotating protein sequences.

Subjects

Algorithms; Databases, Protein; Molecular Sequence Annotation; Sequence Alignment/methods; Amino Acid Sequence; Cluster Analysis; Phylogeny; Protein Domains; Rickettsia/classification
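The abstract above describes comparing hashed sketches instead of aligning sequences. The following is a minimal sketch of the general minhashing idea only, not of coreClust itself: the k-mer length, the number of hash functions, the salted BLAKE2 hashing scheme, and the toy sequences are all assumptions made for illustration.

```python
import hashlib


def kmers(seq, k=3):
    """Return the set of overlapping k-mers (shingles) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(token, seed):
    """Deterministic 64-bit hash of a token, salted with an integer seed."""
    digest = hashlib.blake2b(
        token.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, num_hashes=128):
    """For each seeded hash function, keep the minimum hash over all tokens."""
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which two signatures agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity of two sets, for comparison."""
    return len(a & b) / len(a | b)


# Hypothetical toy sequences: seq_b differs from seq_a by one substitution,
# seq_c shares no residues with seq_a.
seq_a = "MKTAYIAKQRQISFVKSHFSRQ"
seq_b = "MKTAYIAKQRQLSFVKSHFSRQ"
seq_c = "GGGGPPPPWWWWCCCCDDDDEE"

sig_a = minhash_signature(kmers(seq_a))
sig_b = minhash_signature(kmers(seq_b))
sig_c = minhash_signature(kmers(seq_c))

print("a vs b:", estimate_jaccard(sig_a, sig_b), exact_jaccard(kmers(seq_a), kmers(seq_b)))
print("a vs c:", estimate_jaccard(sig_a, sig_c), exact_jaccard(kmers(seq_a), kmers(seq_c)))
```

Because each signature has a fixed length regardless of sequence length, signatures can be computed independently per sequence and compared cheaply, which is one reason this family of methods suits a map-and-reduce style of computation.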

2.

Comparative genomics reveals multiple pathways to mutualism for tick-borne pathogens.

Lockwood, Svetlana; Brayton, Kelly A; Broschat, Shira L.

BMC Genomics ; 17: 481, 2016 07 02.

Article in English | MEDLINE | ID: mdl-27368698

ABSTRACT

BACKGROUND: Multiple important human and livestock pathogens employ ticks as their primary host vectors. It is not currently known whether this means of infecting a host arose once or many times during evolution. RESULTS: In order to address this question, we conducted a comparative genomics analysis on a set of bacterial pathogens from seven genera - Borrelia, Rickettsia, Anaplasma, Ehrlichia, Francisella, Coxiella, and Bartonella, including species from three different host vectors - ticks, lice, and fleas. The final set of 102 genomes used in the study encoded a total of 120,046 protein sequences. We found that no genes or metabolic pathways were present in all tick-borne bacteria. However, we found some genes and pathways were present in subsets of tick-transmitted organisms while absent from bacteria transmitted by lice or fleas. CONCLUSION: Our analysis suggests that the ability of pathogens to be transmitted by ticks arose multiple times over the course of evolution. To our knowledge, this is the most comprehensive study of tick transmissibility to date.

Subjects

Metagenome; Metagenomics; Tick-Borne Diseases/microbiology; Animals; Bacteria/classification; Bacteria/genetics; Bacteria/metabolism; Cluster Analysis; Computational Biology/methods; Humans; Metabolic Networks and Pathways; Metagenomics/methods; Phthiraptera/microbiology; Phylogeny; Siphonaptera/microbiology; Tick-Borne Diseases/transmission

3.

Primary Structural Variation in Anaplasma marginale Msp2 Efficiently Generates Immune Escape Variants.

Graça, Telmo; Paradiso, Lydia; Broschat, Shira L; Noh, Susan M; Palmer, Guy H.

Infect Immun ; 83(11): 4178-84, 2015 Nov.

Article in English | MEDLINE | ID: mdl-26259814

ABSTRACT

Antigenic variation allows microbial pathogens to evade immune clearance and establish persistent infection. Anaplasma marginale utilizes gene conversion of a repertoire of silent msp2 alleles into a single active expression site to encode unique Msp2 variants. As the genomic complement of msp2 alleles alone is insufficient to generate the number of variants required for persistence, A. marginale uses segmental gene conversion, in which oligonucleotide segments from multiple alleles are recombined into the expression site to generate a novel msp2 mosaic not represented elsewhere in the genome. Whether these segmental changes are sufficient to evade a broad antibody response is unknown. We addressed this question by identifying Msp2 variants that differed in primary structure within the immunogenic hypervariable region microdomains and tested whether they represented true antigenic variants. The minimal primary structural difference between variants was a single amino acid resulting from a codon insertion, and overall, the amino acid identity among paired microdomains ranged from 18 to 92%. Collectively, 89% of the expressed structural variants were also antigenic variants across all biological replicates, independent of a specific host major histocompatibility complex haplotype. Biological relevance is supported by the following: (i) all structural variants were expressed during infection of a natural host, (ii) the structural variation observed in the microdomains corresponded to the mean length of variants generated by segmental gene conversion, and (iii) antigenic variants were identified using a broad antibody response that developed during infection of a natural host. The findings demonstrate that segmental gene conversion efficiently generates Msp2 antigenic variants.

Subjects

Anaplasma marginale/immunology; Anaplasmosis/immunology; Antigenic Variation; Antigens, Bacterial/chemistry; Antigens, Bacterial/immunology; Bacterial Outer Membrane Proteins/chemistry; Bacterial Outer Membrane Proteins/immunology; Amino Acid Sequence; Anaplasma marginale/chemistry; Anaplasma marginale/genetics; Anaplasmosis/microbiology; Antibodies, Bacterial/immunology; Antigens, Bacterial/genetics; Bacterial Outer Membrane Proteins/genetics; Humans; Immune Evasion; Molecular Sequence Data; Protein Structure, Tertiary; Sequence Alignment

4.

Expansion of variant diversity associated with a high prevalence of pathogen strain superinfection under conditions of natural transmission.

Ueti, Massaro W; Tan, Yunbing; Broschat, Shira L; Castañeda Ortiz, Elizabeth J; Camacho-Nuez, Minerva; Mosqueda, Juan J; Scoles, Glen A; Grimes, Matthew; Brayton, Kelly A; Palmer, Guy H.

Infect Immun ; 80(7): 2354-60, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22585962

ABSTRACT

Superinfection occurs when a second, genetically distinct pathogen strain infects a host that has already mounted an immune response to a primary strain. For antigenically variant pathogens, the primary strain itself expresses a broad diversity of variants over time. Thus, successful superinfection would require that the secondary strain express a unique set of variants. We tested this hypothesis under conditions of natural transmission in both temperate and tropical regions where, respectively, single-strain infections and strain superinfections of the tick-borne pathogen Anaplasma marginale predominate. Our conclusion that strain superinfection is associated with a significant increase in variant diversity is supported by progressive analysis of variant composition: (i) animals with naturally acquired superinfection had a statistically significantly greater number of unique variant sequences than animals either experimentally infected with single strains or infected with a single strain naturally, (ii) the greater number of unique sequences reflected a statistically significant increase in primary structural diversity in the superinfected animals, and (iii) the increase in primary structural diversity reflected increased combinations of the newly identified hypervariable microdomains. The role of population immunity in establishing temporal and spatial patterns of infection and disease has been well established. The results of the present study, which examined strain structure under conditions of natural transmission and population immunity, support that high levels of endemicity also drive pathogen divergence toward greater strain diversity.

Subjects

Anaplasma marginale/immunology; Anaplasmosis/epidemiology; Anaplasmosis/microbiology; Antigenic Variation/immunology; Genetic Variation; Superinfection; Anaplasma marginale/genetics; Anaplasmosis/immunology; Animals; Antigenic Variation/genetics; Antigens, Bacterial/genetics; Antigens, Bacterial/immunology; DNA, Bacterial/chemistry; DNA, Bacterial/genetics; Humans; Molecular Sequence Data; Prevalence; Sequence Analysis, DNA

5.

Genetic relationships among 527 Gram-negative bacterial plasmids.

Zhou, Yunyun; Call, Douglas R; Broschat, Shira L.

Plasmid ; 68(2): 133-41, 2012 Sep.

Article in English | MEDLINE | ID: mdl-22587825

ABSTRACT

Plasmids are mosaic in composition with a maintenance "backbone" as well as "accessory" genes obtained via horizontal gene transfer. This horizontal gene transfer complicates the study of their genetic relationships. We describe a method for relating a large number of Gram-negative (GN) bacterial plasmids based on their genetic sequences. Complete coding gene sequences of 527 GN bacterial plasmids were obtained from NCBI. Initial classification of their genetic relationships was accomplished using a computational approach analogous to hybridization of "mixed-genome microarrays." Because of this similarity, the phrase "virtual hybridization" is used to describe this approach. Protein sequences generated from the gene sequences were randomly chosen to serve as "probes" for the virtual arrays, and virtual hybridization for each GN plasmid was achieved using BLASTp. Each resulting intensity matrix was used to generate a distance matrix from which an initial tree was constructed. Relationships were refined for several clusters by identifying conserved proteins within a cluster. Multiple-sequence alignment was applied to the concatenated conserved proteins, and maximum likelihood was used to generate relationships from the results of the alignment. While it is not possible to prove that the genetic relationships among the 527 GN bacterial plasmids obtained in this study are correct, replication of identical results produced in a separate study for a small group of IncA/C plasmids provides evidence that the approach used can correctly predict genetic relationships. In addition, results obtained for clusters of Borrelia plasmids are consistent with the expected exclusivity for plasmids from this genus. Finally, the 527-plasmid tree was used to study the distribution of four common antibiotic resistance genes.

Subjects

Gram-Negative Bacteria/classification; Gram-Negative Bacteria/genetics; Plasmids/genetics; Phylogeny
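The abstract above mentions turning each intensity matrix into a distance matrix before building an initial tree. The step can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' procedure: the plasmid names and intensity values are hypothetical, and Euclidean distance is an assumed metric, since the abstract does not say which distance was used.

```python
import math

# Hypothetical intensity matrix: one row per plasmid, one column per probe.
# Values stand in for normalized probe "hybridization" scores.
intensity = {
    "pA": [1.0, 0.9, 0.0],
    "pB": [0.9, 1.0, 0.1],
    "pC": [0.0, 0.1, 1.0],
}


def distance_matrix(rows):
    """Pairwise Euclidean distances between the rows of an intensity matrix."""
    names = sorted(rows)
    return {(a, b): math.dist(rows[a], rows[b]) for a in names for b in names}


def closest_pair(dist):
    """Return the pair of distinct items separated by the smallest distance."""
    return min((pair for pair in dist if pair[0] < pair[1]), key=dist.get)


dist = distance_matrix(intensity)
print(closest_pair(dist))
```

Finding the closest pair is the first merge an agglomerative tree-building method would make; repeating the merge on an updated distance matrix would yield a full tree.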

6.

Sequence determinants of human-cell entry identified in ACE2-independent bat sarbecoviruses: A combined laboratory and computational network science approach.

Khaledian, Ehdieh; Ulusan, Sinem; Erickson, Jeffery; Fawcett, Stephen; Letko, Michael C; Broschat, Shira L.

EBioMedicine ; 79: 103990, 2022 May.

Article in English | MEDLINE | ID: mdl-35405384

ABSTRACT

BACKGROUND: The sarbecovirus subgenus of betacoronaviruses is widely distributed throughout bats and other mammals globally and includes human pathogens, SARS-CoV and SARS-CoV-2. The most studied sarbecoviruses use the host protein, ACE2, to infect cells. Curiously, the majority of sarbecoviruses identified to date do not use ACE2 and cannot readily acquire ACE2 binding through point mutations. We previously screened a broad panel of sarbecovirus spikes for cell entry and observed bat-derived viruses that could infect human cells, independent of ACE2. Here we further investigate the sequence determinants of cell entry for ACE2-independent bat sarbecoviruses. METHODS: We employed a network science-based approach to visualize sequence and entry phenotype similarities across the diversity of sarbecovirus spike protein sequences. We then verified these computational results and mapped determinants of viral entry into human cells using recombinant chimeric spike proteins within an established viral pseudotype assay. FINDINGS: We show ACE2-independent viruses that can infect human and bat cells in culture have a similar putative receptor binding motif, which can impart human cell entry into other bat sarbecovirus spikes that cannot otherwise infect human cells. These sequence determinants of human cell entry map to a surface-exposed protrusion from the predicted bat sarbecovirus spike receptor binding domain structure. INTERPRETATION: Our findings provide further evidence of a group of bat-derived sarbecoviruses with zoonotic potential and demonstrate the utility in applying network science to phenotypic mapping and prediction. FUNDING: This work was supported by Washington State University and the Paul G. Allen School for Global Health.

Subjects

COVID-19; Chiroptera; Severe acute respiratory syndrome-related coronavirus; Angiotensin-Converting Enzyme 2/genetics; Animals; Humans; SARS-CoV-2; Spike Glycoprotein, Coronavirus/metabolism; Virus Internalization

7.

Genotypic-phenotypic discrepancies between antibiotic resistance characteristics of Escherichia coli isolates from calves in management settings with high and low antibiotic use.

Davis, Margaret A; Besser, Thomas E; Orfe, Lisa H; Baker, Katherine N K; Lanier, Amelia S; Broschat, Shira L; New, Daniel; Call, Douglas R.

Appl Environ Microbiol ; 77(10): 3293-9, 2011 May.

Article in English | MEDLINE | ID: mdl-21421795

ABSTRACT

We hypothesized that bacterial populations growing in the absence of antibiotics will accumulate more resistance gene mutations than bacterial populations growing in the presence of antibiotics. If this is so, the prevalence of dysfunctional resistance genes (resistance pseudogenes) could provide a measure of the level of antibiotic exposure present in a given environment. As a proof-of-concept test, we assayed field strains of Escherichia coli for their resistance genotypes using a resistance gene microarray and further characterized isolates that had resistance phenotype-genotype discrepancies. We found a small but significant association between the prevalence of isolates with resistance pseudogenes and the lower antibiotic use environment of a beef cow-calf operation versus a higher antibiotic use dairy calf ranch (Fisher's exact test, P = 0.044). Other significant findings include a very strong association between the dairy calf ranch isolates and phenotypes unexplained by well-known resistance genes (Fisher's exact test, P < 0.0001). Two novel resistance genes were discovered in E. coli isolates from the dairy calf ranch, one associated with resistance to aminoglycosides and one associated with resistance to trimethoprim. In addition, isolates resistant to expanded-spectrum cephalosporins but negative for bla(CMY-2) had mutations in the promoter regions of the chromosomal E. coli ampC gene consistent with reported overexpression of native AmpC beta-lactamase. Similar mutations in hospital E. coli isolates have been reported worldwide. Prevalence or rates of E. coli ampC promoter mutations may be used as a marker for high expanded-spectrum cephalosporin use environments.

Subjects

Anti-Bacterial Agents/pharmacology; Cattle Diseases/drug therapy; Cattle Diseases/microbiology; Drug Resistance, Bacterial; Escherichia coli Infections/veterinary; Escherichia coli/drug effects; Animals; Anti-Bacterial Agents/therapeutic use; Cattle; DNA, Bacterial/chemistry; DNA, Bacterial/genetics; Escherichia coli/isolation & purification; Escherichia coli Infections/drug therapy; Escherichia coli Infections/microbiology; Escherichia coli Proteins/genetics; Genes, Bacterial; Genotype; Molecular Sequence Data; Mutation; Sequence Analysis, DNA
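The associations in the abstract above are reported with Fisher's exact test. As a reminder of how that test works on a 2x2 table, here is a small from-scratch sketch. The counts are hypothetical, because the abstract does not give the underlying tables, and the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention among several.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table at least as extreme as the
    # observed one; the small tolerance guards against rounding error.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts: isolates with / without a resistance pseudogene
# in a low-use setting (first row) and a high-use setting (second row).
p_value = fisher_exact_two_sided(8, 42, 1, 49)
print(f"two-sided p = {p_value:.4f}")
```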

8.

PASS: Protein Annotation Surveillance Site for Protein Annotation Using Homologous Clusters, NLP, and Sequence Similarity Networks.

Tao, Jin; Brayton, Kelly A; Broschat, Shira L.

Front Bioinform ; 1: 749008, 2021.

Article in English | MEDLINE | ID: mdl-36303767

ABSTRACT

Advances in genome sequencing have accelerated the growth of sequenced genomes but at a cost in the quality of genome annotation. At the same time, computational analysis is widely used for protein annotation, but a dearth of experimental verification has contributed to inaccurate annotation as well as to annotation error propagation. Thus, a tool to help life scientists with accurate protein annotation would be useful. In this work we describe a website we have developed, the Protein Annotation Surveillance Site (PASS), which provides such a tool. This website consists of three major components: a database of homologous clusters of more than eight million protein sequences deduced from the representative genomes of bacteria, archaea, eukarya, and viruses, together with sequence information; a machine-learning software tool which periodically queries the UniprotKB database to determine whether protein function has been experimentally verified; and a query-able webpage where the FASTA headers of sequences from the cluster best matching an input sequence are returned. The user can choose from these sequences to create a sequence similarity network to assist in annotation or else use their expert knowledge to choose an annotation from the cluster sequences. Illustrations demonstrating use of this website are presented.
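The abstract above refers to building a sequence similarity network from selected cluster sequences. A sequence similarity network can be sketched as a graph in which two sequences are linked whenever their pairwise similarity reaches a cutoff. The example below is illustrative only and does not reflect how PASS is implemented: the sequence identifiers, the scores, and the 0.5 threshold are all hypothetical.

```python
# Hypothetical pairwise similarity scores between sequences (0 to 1).
scores = {
    ("s1", "s2"): 0.90,
    ("s2", "s3"): 0.80,
    ("s3", "s4"): 0.20,
    ("s4", "s5"): 0.95,
}


def build_network(pair_scores, threshold):
    """Link two sequences when their similarity meets the threshold."""
    graph = {}
    for (a, b), score in pair_scores.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group nodes into connected components with a depth-first search."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


network = build_network(scores, threshold=0.5)
groups = sorted(sorted(c) for c in connected_components(network))
print(groups)
```

Raising the threshold splits the network into more, tighter groups; lowering it merges them. That is why the choice of cutoff matters when such a network is used to support an annotation decision.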

9.

blaCMY-2-positive IncA/C plasmids from Escherichia coli and Salmonella enterica are a distinct component of a larger lineage of plasmids.

Call, Douglas R; Singer, Randall S; Meng, Da; Broschat, Shira L; Orfe, Lisa H; Anderson, Janet M; Herndon, David R; Kappmeyer, Lowell S; Daniels, Joshua B; Besser, Thomas E.

Antimicrob Agents Chemother ; 54(2): 590-6, 2010 Feb.

Article in English | MEDLINE | ID: mdl-19949054

ABSTRACT

Large multidrug resistance plasmids of the A/C incompatibility complex (IncA/C) have been found in a diverse group of Gram-negative commensal and pathogenic bacteria. We present three completed sequences from IncA/C plasmids that originated from Escherichia coli (cattle) and Salmonella enterica serovar Newport (human) and that carry the cephamycinase gene blaCMY-2. These large plasmids (148 to 166 kbp) share extensive sequence identity and synteny. The most divergent plasmid, peH4H, has lost several conjugation-related genes and has gained a kanamycin resistance region. Two of the plasmids (pAM04528 and peH4H) harbor two copies of blaCMY-2, while the third plasmid (pAR060302) harbors a single copy of the gene. The majority of single-nucleotide polymorphisms comprise nonsynonymous mutations in floR. A comparative analysis of these plasmids with five other published IncA/C plasmids showed that the blaCMY-2 plasmids from E. coli and S. enterica are genetically distinct from those originating from Yersinia pestis and Photobacterium damselae and distal to one originating from Yersinia ruckeri. While the overall similarity of these plasmids supports the likelihood of recent movements among E. coli and S. enterica hosts, their greater divergence from Y. pestis or Y. ruckeri suggests less recent plasmid transfer among these pathogen groups.

Subjects

Escherichia coli/genetics; Plasmids/genetics; Salmonella enterica/genetics; beta-Lactamases/genetics; Drug Resistance, Multiple, Bacterial/genetics; Photobacterium/genetics; Phylogeny; Plasmids/classification; Polymorphism, Single Nucleotide/genetics; Yersinia pestis/genetics

10.

A Systematic Approach to Bacterial Phylogeny Using Order Level Sampling and Identification of HGT Using Network Science.

Khaledian, Ehdieh; Brayton, Kelly A; Broschat, Shira L.

Microorganisms ; 8(2), 2020 Feb 24.

Article in English | MEDLINE | ID: mdl-32102454

ABSTRACT

Reconstructing and visualizing phylogenetic relationships among living organisms is a fundamental challenge because not all organisms share the same genes. As a result, the first phylogenetic visualizations employed a single gene, e.g., rRNA genes, sufficiently conserved to be present in all organisms but divergent enough to provide discrimination between groups. As more genome data became available, researchers began concatenating different combinations of genes or proteins to construct phylogenetic trees believed to be more robust because they incorporated more information. However, the genes or proteins chosen were based on ad hoc approaches. The large number of complete genome sequences available today allows the use of whole genomes to analyze relationships among organisms rather than using an ad hoc set of genes. We present a systematic approach for constructing a phylogenetic tree based on simultaneously clustering the complete proteomes of 360 bacterial species. From the homologous clusters, we identify 49 protein sequences shared by 99% of the organisms to build a tree. Of the 49 sequences, 47 have homologous sequences in both archaea and eukarya. The clusters are also used to create a network from which bacterial species with horizontally-transferred genes from other phyla are identified.
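The abstract above describes selecting, from the homologous clusters, the proteins shared by 99% of the organisms. That selection step can be sketched as a coverage filter over cluster membership. This is a toy illustration with hypothetical cluster and organism names, not the authors' pipeline; integer arithmetic is used for the percentage comparison to avoid floating-point edge cases at the cutoff.

```python
def core_clusters(clusters, organisms, min_percent=99):
    """Return IDs of clusters present in at least min_percent of organisms.

    `clusters` maps a cluster ID to the set of organisms that have at
    least one protein in that cluster.
    """
    total = len(organisms)
    return sorted(
        cid
        for cid, members in clusters.items()
        if 100 * len(members & organisms) >= min_percent * total
    )


# Hypothetical data: 100 organisms and three clusters with different coverage.
organisms = {f"org{i}" for i in range(100)}
clusters = {
    "c1": set(organisms),                     # present in all 100 organisms
    "c2": organisms - {"org0"},               # present in 99 organisms
    "c3": {f"org{i}" for i in range(50)},     # present in 50 organisms
}

print(core_clusters(clusters, organisms))
```

Allowing a small fraction of organisms to lack a protein, rather than requiring presence in all of them, keeps a near-universal protein from being discarded because of a few missing or misannotated sequences.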

11.

PARGT: a software tool for predicting antimicrobial resistance in bacteria.

Chowdhury, Abu Sayed; Call, Douglas R; Broschat, Shira L.

Sci Rep ; 10(1): 11033, 2020 07 03.

Article in English | MEDLINE | ID: mdl-32620856

ABSTRACT

With the ever-increasing availability of whole-genome sequences, machine-learning approaches can be used as an alternative to traditional alignment-based methods for identifying new antimicrobial-resistance genes. Such approaches are especially helpful when pathogens cannot be cultured in the lab. In previous work, we proposed a game-theory-based feature evaluation algorithm. When using the protein characteristics identified by this algorithm, called 'features' in machine learning, our model accurately identified antimicrobial resistance (AMR) genes in Gram-negative bacteria. Here we extend our study to Gram-positive bacteria showing that coupling game-theory-identified features with machine learning achieved classification accuracies between 87% and 90% for genes encoding resistance to the antibiotics bacitracin and vancomycin. Importantly, we present a standalone software tool that implements the game-theory algorithm and machine-learning model used in these studies.

Subjects

Anti-Bacterial Agents/pharmacology; Bacteria/genetics; Computational Biology/methods; Drug Resistance, Bacterial; Bacitracin/pharmacology; Bacteria/drug effects; Game Theory; Machine Learning; Microbial Sensitivity Tests; Software; Vancomycin/pharmacology; Whole Genome Sequencing

12.

Publisher Correction: Antimicrobial Resistance Prediction for Gram-Negative Bacteria via Game Theory-Based Feature Evaluation.

Chowdhury, Abu Sayed; Call, Douglas R; Broschat, Shira L.

Sci Rep ; 10(1): 1846, 2020 Jan 30.

Article in English | MEDLINE | ID: mdl-31996773

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

13.

Prediction of T4SS Effector Proteins for Anaplasma phagocytophilum Using OPT4e, A New Software Tool.

Esna Ashari, Zhila; Brayton, Kelly A; Broschat, Shira L.

Front Microbiol ; 10: 1391, 2019.

Article in English | MEDLINE | ID: mdl-31293540

ABSTRACT

Type IV secretion systems (T4SS) are used by a number of bacterial pathogens to attack the host cell. The complex protein structure of the T4SS is used to directly translocate effector proteins into host cells, often causing fatal diseases in humans and animals. Identification of effector proteins is the first step in understanding how they function to cause virulence and pathogenicity. Accurate prediction of effector proteins via a machine learning approach can assist in the process of their identification. The main goal of this study is to predict a set of candidate effectors for the tick-borne pathogen Anaplasma phagocytophilum, the causative agent of anaplasmosis in humans. To our knowledge, we present the first computational study for effector prediction with a focus on A. phagocytophilum. In a previous study, we systematically selected a set of optimal features from more than 1,000 possible protein characteristics for predicting T4SS effector candidates. This was followed by a study of the features using the proteome of Legionella pneumophila strain Philadelphia deduced from its complete genome. In this manuscript we introduce the OPT4e software package for Optimal-features Predictor for T4SS Effector proteins. An earlier version of OPT4e was verified using cross-validation tests, accuracy tests, and comparison with previous results for L. pneumophila. We use OPT4e to predict candidate effectors from the proteomes of A. phagocytophilum strains HZ and HGE-1 and predict 48 and 46 candidates, respectively, with 16 and 18 deemed most probable as effectors. These latter include the three known validated effectors for A. phagocytophilum.

14.

Antimicrobial Resistance Prediction for Gram-Negative Bacteria via Game Theory-Based Feature Evaluation.

Chowdhury, Abu Sayed; Call, Douglas R; Broschat, Shira L.

Sci Rep ; 9(1): 14487, 2019 10 09.

Article in English | MEDLINE | ID: mdl-31597945

ABSTRACT

The increasing prevalence of antimicrobial-resistant bacteria drives the need for advanced methods to identify antimicrobial-resistance (AMR) genes in bacterial pathogens. With the availability of whole genome sequences, best-hit methods can be used to identify AMR genes by differentiating unknown sequences with known AMR sequences in existing online repositories. Nevertheless, these methods may not perform well when identifying resistance genes with sequences having low sequence identity with known sequences. We present a machine learning approach that uses protein sequences, with sequence identity ranging between 10% and 90%, as an alternative to conventional DNA sequence alignment-based approaches to identify putative AMR genes in Gram-negative bacteria. By using game theory to choose which protein characteristics to use in our machine learning model, we can predict AMR protein sequences for Gram-negative bacteria with an accuracy ranging from 93% to 99%. In order to obtain similar classification results, identity thresholds as low as 53% were required when using BLASTp.

Subjects

Drug Resistance, Bacterial/genetics; Genes, Bacterial; Gram-Negative Bacteria/drug effects; Gram-Negative Bacteria/genetics; Algorithms; Amino Acid Sequence; Anti-Bacterial Agents/pharmacology; Bacterial Proteins/chemistry; Bacterial Proteins/genetics; Enterobacter/drug effects; Enterobacter/genetics; Game Theory; Gram-Negative Bacteria/pathogenicity; Humans; Machine Learning; Pseudomonas/drug effects; Pseudomonas/genetics; Support Vector Machine; Vibrio/drug effects; Vibrio/genetics; Whole Genome Sequencing

15.

Using an optimal set of features with a machine learning-based approach to predict effector proteins for Legionella pneumophila.

Esna Ashari, Zhila; Brayton, Kelly A; Broschat, Shira L.

PLoS One ; 14(1): e0202312, 2019.

Article in English | MEDLINE | ID: mdl-30682021

ABSTRACT

Type IV secretion systems exist in a number of bacterial pathogens and are used to secrete effector proteins directly into host cells in order to change their environment making the environment hospitable for the bacteria. In recent years, several machine learning algorithms have been developed to predict effector proteins, potentially facilitating experimental verification. However, inconsistencies exist between their results. Previously we analysed the disparate sets of predictive features used in these algorithms to determine an optimal set of 370 features for effector prediction. This study focuses on the best way to use these optimal features by designing three machine learning classifiers, comparing our results with those of others, and obtaining de novo results. We chose the pathogen Legionella pneumophila strain Philadelphia-1, a cause of Legionnaires' disease, because it has many validated effector proteins and others have developed machine learning prediction tools for it. While all of our models give good results indicating that our optimal features are quite robust, Model 1, which uses all 370 features with a support vector machine, has slightly better accuracy. Moreover, Model 1 predicted 472 effector proteins that are deemed highly probable to be effectors and include 94% of known effectors. Although the results of our three models agree well with those of other researchers, their models only predicted 126 and 311 candidate effectors.

Subjects

Bacterial Proteins/genetics; Legionella pneumophila/genetics; Models, Genetic; Support Vector Machine; Type IV Secretion Systems/genetics; Virulence Factors/genetics; Humans; Legionnaires' Disease/genetics

16.

Whole Proteome Clustering of 2,307 Proteobacterial Genomes Reveals Conserved Proteins and Significant Annotation Issues.

Lockwood, Svetlana; Brayton, Kelly A; Daily, Jeff A; Broschat, Shira L.

Front Microbiol ; 10: 383, 2019.

Article in English | MEDLINE | ID: mdl-30873148

ABSTRACT

We clustered 8.76 M protein sequences deduced from 2,307 completely sequenced Proteobacterial genomes resulting in 707,311 clusters of one or more sequences of which 224,442 ranged in size from 2 to 2,894 sequences. To our knowledge this is the first study of this scale. We were surprised to find that no single cluster contained a representative sequence from all the organisms in the study. Given the minimal genome concept, we expected to find a shared set of proteins. To determine why the clusters did not have universal representation we chose four essential proteins, the chaperonin GroEL, DNA dependent RNA polymerase subunits beta and beta' (RpoB/RpoB'), and DNA polymerase I (PolA), representing fundamental cellular functions, and examined their cluster distribution. We found these proteins to be remarkably conserved with certain caveats. Although the groEL gene was universally conserved in all the organisms in the study, the protein was not represented in all the deduced proteomes. The genes for RpoB and RpoB' were missing from two genomes and merged in 88, and the sequences were sufficiently divergent that they formed separate clusters for 18 RpoB proteins (seven clusters) and 14 RpoB' proteins (three clusters). For PolA, 52 organisms lacked an identifiable sequence, and seven sequences were sufficiently divergent that they formed five separate clusters. Interestingly, organisms lacking an identifiable PolA and those with divergent RpoB/RpoB' were predominantly endosymbionts. Furthermore, we present a range of examples of annotation issues that caused the deduced proteins to be incorrectly represented in the proteome. These annotation issues made our task of determining protein conservation more difficult than expected and also represent a significant obstacle for high-throughput analyses.

17.

Editorial: Machine learning approaches to antimicrobials: discovery and resistance.

Broschat, Shira L; Siu, Shirley W I; de la Fuente-Nunez, Cesar.

Front Bioinform ; 4: 1458237, 2024.

Article in English | MEDLINE | ID: mdl-39184338

18.

Dynamics of repeat-associated plasticity in the aaap gene family in Anaplasma marginale.

Fallquist, Heather M; Tao, Jin; Cheng, Xiaoya; Pierlé, Sebastian Aguilar; Broschat, Shira L; Brayton, Kelly A.

Gene ; 721S: 100010, 2019.

Article in English | MEDLINE | ID: mdl-34530992

ABSTRACT

Anaplasmosis, the most prevalent tick-transmitted disease of cattle, is caused by the rickettsial intracellular parasite Anaplasma marginale. The pathogen replicates within a parasitophorous vacuole formed from the invagination of the erythrocyte membrane. Several strains of A. marginale form "tails" or "appendages" which are attached to, and extend out from, the cytoplasmic side of the parasitophorous vacuole. Genomic analysis of the parasite antigen distributed along the appendage led to the discovery of the aaap (Anaplasma appendage associated protein) gene family located within a highly plastic region in the genome. The aaap gene family consists of aaap and several alps (for aaap-like proteins), depending on the strain. These genes/proteins are characterized by repeat sequences. To investigate locus plasticity, different versions of the locus were cloned from the same strain as well as from different strains, sequenced and aligned to identify changes. Our findings show that repeat sequences both within and between genes facilitated rearrangement events within the locus. Structural variation of the locus in the St. Maries strain was further investigated during infection of different cellular environments, i.e., bovine erythrocytes and tick cells, with a reduction in subpopulations of the aaap locus within the tick as compared to erythrocytes. Interestingly, subpopulations bearing alternative locus structures began to arise again when the pathogen was transferred from the tick environment into a naïve calf. Additionally, the Aaap protein expression profile between blood and tick samples showed a regulatory shift, indicating a host-specific response. Alignment of the protein sequences from different species of Anaplasma reveals six similar repeating motifs that appear to be unique to a few species of Anaplasma. 
The role the aaap locus may play in the pathogenesis of the bovine host or in tick infection/transmission remains unknown; however, the changes in aaap locus subpopulations, locus structure, and protein expression indicate that these genes have a role in strain diversification.

19.

Dynamics of repeat-associated plasticity in the aaap gene family in Anaplasma marginale.

Fallquist, Heather M; Tao, Jin; Cheng, Xiaoya; Pierlé, Sebastian Aguilar; Broschat, Shira L; Brayton, Kelly A.

Gene X ; 2, 2019 Jun.

Article in English | MEDLINE | ID: mdl-32099970


20.

A Java-based tool for the design of classification microarrays.

Meng, Da; Broschat, Shira L; Call, Douglas R.

BMC Bioinformatics ; 9: 328, 2008 Aug 04.

Article in English | MEDLINE | ID: mdl-18680597

ABSTRACT

BACKGROUND: Classification microarrays are used for purposes such as identifying strains of bacteria and determining genetic relationships to understand the epidemiology of an infectious disease. For these cases, mixed microarrays, which are composed of DNA from more than one organism, are more effective than conventional microarrays composed of DNA from a single organism. Selection of probes is a key factor in designing successful mixed microarrays because redundant sequences are inefficient and limited representation of diversity can restrict application of the microarray. We have developed a Java-based software tool, called PLASMID, for use in selecting the minimum set of probe sequences needed to classify different groups of plasmids or bacteria. RESULTS: The software program was successfully applied to several different sets of data. The utility of PLASMID was illustrated using existing mixed-plasmid microarray data as well as data from a virtual mixed-genome microarray constructed from different strains of Streptococcus. Moreover, use of data from expression microarray experiments demonstrated the generality of PLASMID. CONCLUSION: In this paper we describe a new software tool for selecting a set of probes for a classification microarray. While the tool was developed for the design of mixed microarrays, and mixed-plasmid microarrays in particular, it can also be used to design expression arrays. The user can choose from several clustering methods (including hierarchical, non-hierarchical, and a model-based genetic algorithm), several probe ranking methods, and several different display methods. A novel approach is used for probe redundancy reduction, and probe selection is accomplished via stepwise discriminant analysis. Data can be entered in different formats (including Excel and comma-delimited text), and dendrogram, heat map, and scatter plot images can be saved in several different formats (including JPEG and TIFF). Weights generated using stepwise discriminant analysis can be stored for analysis of subsequent experimental data. Additionally, PLASMID can be used to construct virtual microarrays with genomes from public databases, which can then be used to identify an optimal set of probes.
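To give a feel for what probe redundancy reduction means in practice, here is a minimal sketch of a generic correlation filter. It is not PLASMID's approach, which the abstract describes only as novel and does not specify. The function names, the 0.95 threshold, and the toy hybridization profiles are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def drop_redundant_probes(profiles, threshold=0.95):
    """Greedily keep probes whose profile is not strongly correlated with any kept probe.

    profiles:  dict mapping probe name -> signal across samples (assumed input format).
    threshold: absolute correlation at or above which a probe counts as redundant
               (hypothetical value; anti-correlated probes are also treated as redundant).
    """
    kept = []
    for name, values in profiles.items():
        if all(abs(pearson(values, profiles[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy profiles over four samples; probe_2 nearly duplicates probe_1.
profiles = {
    "probe_1": [1.0, 0.0, 1.0, 0.0],
    "probe_2": [0.9, 0.1, 1.0, 0.0],
    "probe_3": [0.0, 1.0, 1.0, 0.0],
}
kept = drop_redundant_probes(profiles)
print(kept)
```

A greedy filter like this depends on probe order, so in a real design one would likely rank probes first (the abstract mentions several ranking methods) and then filter in rank order.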

Subjects

Bacteria/classification , Genes, Bacterial , Oligonucleotide Array Sequence Analysis/methods , Plasmids/analysis , Bacteria/genetics , Bacteria/isolation & purification , Cluster Analysis , Gene Expression Profiling/methods , Oligonucleotide Probes , Programming Languages
