Results 1 - 20 of 60
1.
Bioinform Adv ; 4(1): vbae089, 2024.
Article in English | MEDLINE | ID: mdl-38911822

ABSTRACT

Motivation: Genomic islands (GEIs) are clusters of genes in bacterial genomes that are typically acquired by horizontal gene transfer. GEIs play a crucial role in the evolution of bacteria by rapidly introducing genetic diversity and thus helping them adapt to changing environments. Specifically of interest to human health, many GEIs contain pathogenicity and antimicrobial resistance genes. Detecting GEIs is, therefore, an important problem in biomedical and environmental research. There have been many previous studies for computationally identifying GEIs. Still, most of these studies rely on detecting anomalies in the unannotated nucleotide sequences or on a fixed set of known features on annotated nucleotide sequences. Results: Here, we present TreasureIsland, which uses a new unsupervised representation of DNA sequences to predict GEIs. We developed a high-precision boundary detection method featuring an incremental fine-tuning of GEI borders, and we evaluated the accuracy of this framework using a new comprehensive reference dataset, Benbow. We show that TreasureIsland's accuracy rivals other GEI predictors, enabling efficient and faster identification of GEIs in unannotated bacterial genomes. Availability and implementation: TreasureIsland is available under an MIT license at: https://github.com/FriedbergLab/GenomicIslandPrediction.

2.
bioRxiv ; 2024 Mar 11.
Article in English | MEDLINE | ID: mdl-38559275

ABSTRACT

Epitope tagging is an invaluable technique enabling the identification, tracking, and purification of proteins in vivo. We developed a tool, EpicTope, to facilitate this method by identifying amino acid positions suitable for epitope insertion. Our method uses a scoring function that considers multiple protein sequence and structural features to determine locations least disruptive to the protein's function. We validated our approach on the zebrafish Smad5 protein, showing that multiple predicted internally tagged Smad5 proteins rescue zebrafish smad5 mutant embryos, while the N- and C-terminal tagged variants do not, also as predicted. We further show that the internally tagged Smad5 proteins are accessible to antibodies in wholemount zebrafish embryo immunohistochemistry and by western blot. Our work demonstrates that EpicTope is an accessible and effective tool for designing epitope tag insertion sites. EpicTope is available under a GPL-3 license from: https://github.com/FriedbergLab/Epictope.

3.
Bioinform Adv ; 4(1): vbae043, 2024.
Article in English | MEDLINE | ID: mdl-38545087

ABSTRACT

We present CAFA-evaluator, a powerful Python program designed to evaluate the performance of prediction methods on targets with hierarchical concept dependencies. It generalizes multi-label evaluation to modern ontologies where the prediction targets are drawn from a directed acyclic graph and achieves high efficiency by leveraging matrix computation and topological sorting. The program requirements include a small number of standard Python libraries, making CAFA-evaluator easy to maintain. The code replicates the Critical Assessment of protein Function Annotation (CAFA) benchmarking, which evaluates predictions of the consistent subgraphs in Gene Ontology. Owing to its reliability and accuracy, the organizers have selected CAFA-evaluator as the official CAFA evaluation software. Availability and implementation: https://pypi.org/project/cafaeval.
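
The idea of evaluating predictions over a directed acyclic graph can be illustrated with a minimal sketch: scores are propagated from each term to its ancestors in topological order, and precision and recall are then computed at a threshold. The toy ontology, the function names, and the max-propagation rule below are illustrative assumptions and are not taken from the CAFA-evaluator code base.

```python
from graphlib import TopologicalSorter

# Toy ontology (hypothetical): child term -> set of parent terms.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
}

def propagate(scores, parents=PARENTS):
    """Give each ancestor at least the highest score of its descendants."""
    out = dict(scores)
    # static_order() lists parents before children; reverse it to visit children first.
    order = list(TopologicalSorter(parents).static_order())
    for term in reversed(order):
        for parent in parents[term]:
            out[parent] = max(out.get(parent, 0.0), out.get(term, 0.0))
    return out

def precision_recall(pred_scores, true_terms, threshold):
    """Precision and recall of the propagated terms scoring at or above a positive threshold."""
    predicted = {t for t, s in propagate(pred_scores).items() if s >= threshold}
    truth = {t for t, s in propagate({t: 1.0 for t in true_terms}).items() if s >= 1.0}
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall
```

Propagating before scoring keeps both the predictions and the ground truth consistent with the graph, so a correct specific term is also credited for its ancestors.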

4.
Nucleic Acids Res ; 51(19): 10162-10175, 2023 10 27.
Article in English | MEDLINE | ID: mdl-37739408

ABSTRACT

Determining the repertoire of a microbe's molecular functions is a central question in microbial biology. Modern techniques achieve this goal by comparing microbial genetic material against reference databases of functionally annotated genes/proteins or known taxonomic markers such as 16S rRNA. Here, we describe a novel approach to exploring bacterial functional repertoires without reference databases. Our Fusion scheme establishes functional relationships between bacteria and assigns organisms to Fusion-taxa that differ from otherwise defined taxonomic clades. Three key findings of our work stand out. First, bacterial functional comparisons outperform marker genes in assigning taxonomic clades. Fusion profiles are also better for this task than other functional annotation schemes. Second, Fusion-taxa are robust to addition of novel organisms and are, arguably, able to capture the environment-driven bacterial diversity. Finally, our alignment-free nucleic acid-based Siamese Neural Network model, created using Fusion functions, enables finding shared functionality of very distant, possibly structurally different, microbial homologs. Our work can thus help annotate functional repertoires of bacterial organisms and further guide our understanding of microbial communities.


Subjects
Bacteria, Bacteria/cytology, Bacteria/genetics, Factual Databases, Microbiota, Phylogeny, 16S Ribosomal RNA/genetics, Bacterial Physiological Phenomena
5.
PLoS One ; 18(8): e0290473, 2023.
Article in English | MEDLINE | ID: mdl-37616210

ABSTRACT

Understanding the microbial genomic contributors to antimicrobial resistance (AMR) is essential for early detection of emerging AMR infections, a pressing global health threat in human and veterinary medicine. Here we used whole genome sequencing and antibiotic susceptibility test data from 980 disease-causing Escherichia coli isolated from companion and farm animals to model AMR genotypes and phenotypes for 24 antibiotics. We determined the strength of genotype-to-phenotype relationships for 197 AMR genes with elastic net logistic regression. Model predictors were designed to evaluate different potential modes of AMR genotype translation into resistance phenotypes. Our results show that a model that considers the presence of individual AMR genes and the total number of AMR genes present from a set of genes known to confer resistance was able to accurately predict isolate resistance on average (mean F1 score = 98.0%, SD = 2.3%, mean accuracy = 98.2%, SD = 2.7%). However, fitted models sometimes varied for antibiotics in the same class and for the same antibiotic across animal hosts, suggesting heterogeneity in the genetic determinants of AMR. We conclude that an interpretable AMR prediction model can be used to accurately predict resistance phenotypes across multiple host species and reveal testable hypotheses about how the mechanism of resistance may vary across antibiotics within the same class and across animal hosts for the same antibiotic.
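
As a rough illustration of the kind of predictors described above (presence of individual AMR genes plus the total number of AMR genes present) and of the F1 score used to report performance, consider the following sketch. The function names and the example gene panel are assumptions made for illustration; the elastic net fitting itself is not shown and would normally be delegated to a statistics library.

```python
def build_features(isolate_genes, gene_panel):
    """Return one 0/1 indicator per panel gene, followed by the total gene count."""
    present = [1 if gene in isolate_genes else 0 for gene in gene_panel]
    return present + [sum(present)]

def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative panel only; not the 197-gene set used in the study.
panel = ["blaTEM", "tetA", "sul1"]
features = build_features({"tetA", "sul1"}, panel)
```

Keeping the total count as a separate column lets a linear model weigh an overall gene burden alongside the individual gene effects.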


Subjects
Anti-Bacterial Agents, Livestock, Animals, Humans, Anti-Bacterial Agents/pharmacology, Pets, Bacterial Drug Resistance/genetics, Escherichia coli/genetics
6.
Bioinformatics ; 39(1), 2023 01 01.
Article in English | MEDLINE | ID: mdl-36688705

ABSTRACT

MOTIVATION: Advances in sequencing technologies have led to a surge in genomic data, although the functions of many gene products coded by these genes remain unknown. While in-depth, targeted experiments that determine the functions of these gene products are crucial and routinely performed, they fail to keep up with the inflow of novel genomic data. In an attempt to address this gap, high-throughput experiments are being conducted in which a large number of genes are investigated in a single study. The annotations generated as a result of these experiments are generally biased towards a small subset of less informative Gene Ontology (GO) terms. Identifying and removing biases from protein function annotation databases is important since biases impact our understanding of protein function by providing a poor picture of the annotation landscape. Additionally, as machine learning methods for predicting protein function are becoming increasingly prevalent, it is essential that they are trained on unbiased datasets. Therefore, it is not only crucial to be aware of biases, but also to judiciously remove them from annotation datasets. RESULTS: We introduce GOThresher, a Python tool that identifies and removes biases in function annotations from protein function annotation databases. AVAILABILITY AND IMPLEMENTATION: GOThresher is written in Python and released via PyPI https://pypi.org/project/gothresher/ and on the Bioconda Anaconda channel https://anaconda.org/bioconda/gothresher. The source code is hosted on GitHub https://github.com/FriedbergLab/GOThresher and distributed under the GPL 3.0 license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
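
One way to picture the filtering of high-throughput annotations is to drop every annotation whose source reference annotates more than a chosen number of distinct proteins. The sketch below is only an assumption about what such a filter could look like; the function name, the tuple layout, and the cutoff are hypothetical and do not describe GOThresher's actual interface.

```python
from collections import defaultdict

def drop_high_throughput(annotations, max_proteins_per_ref):
    """Keep annotations whose reference annotates at most max_proteins_per_ref proteins.

    annotations: iterable of (protein_id, go_term, reference_id) tuples.
    """
    annotations = list(annotations)
    proteins_by_ref = defaultdict(set)
    for protein, _term, ref in annotations:
        proteins_by_ref[ref].add(protein)
    return [
        ann for ann in annotations
        if len(proteins_by_ref[ann[2]]) <= max_proteins_per_ref
    ]

# Placeholder data: "PMID:1" annotates three proteins, "PMID:2" only one.
example = [
    ("P1", "GO:0005515", "PMID:1"),
    ("P2", "GO:0005515", "PMID:1"),
    ("P3", "GO:0005515", "PMID:1"),
    ("P1", "GO:0003677", "PMID:2"),
]
kept = drop_high_throughput(example, max_proteins_per_ref=2)
```

The cutoff trades coverage for specificity: a low value removes more broad, repeated terms but also discards more annotations overall.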


Subjects
Computational Biology, Genomics, Computational Biology/methods, Molecular Sequence Annotation, Software, Proteins/genetics, Proteins/metabolism, Protein Databases
7.
Database (Oxford) ; 2022, 2022 08 12.
Article in English | MEDLINE | ID: mdl-35961013

ABSTRACT

Over the last 25 years, biology has entered the genomic era and is becoming a science of 'big data'. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages. Such a large gap in knowledge hampers all aspects of biological enterprise and, thereby, is standing in the way of genomic biology reaching its full potential. A brainstorming meeting to address this issue funded by the National Science Foundation was held during 3-4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Further, major issues that were obstructing the field were identified and discussed, which ultimately allowed for the proposal of solutions on how to move forward.


Subjects
Genomics, Proteins, Base Sequence, Computational Biology, Genome, Molecular Sequence Annotation
8.
Bioinformatics ; 38(Suppl 1): i19-i27, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758800

ABSTRACT

MOTIVATION: Wikipedia is one of the most important channels for the public communication of science and is frequently accessed as an educational resource in computational biology. Joint efforts between the International Society for Computational Biology (ISCB) and the Computational Biology taskforce of WikiProject Molecular Biology (a group of expert Wikipedia editors) have considerably improved computational biology representation on Wikipedia in recent years. However, there is still an urgent need for further improvement in quality, especially when compared to related scientific fields such as genetics and medicine. Facilitating involvement of members from ISCB Communities of Special Interest (COSIs) would improve a vital open education resource in computational biology, additionally allowing COSIs to provide a quality educational resource highly specific to their subfield. RESULTS: We generate a list of around 1500 English Wikipedia articles relating to computational biology and describe the development of a binary COSI-Article matrix, linking COSIs to relevant articles and thereby defining domain-specific open educational resources. Our analysis of the COSI-Article matrix data provides a quantitative assessment of computational biology representation on Wikipedia against other fields and at a COSI-specific level. Furthermore, we conducted similarity analysis and subsequent clustering of COSI-Article data to provide insight into potential relationships between COSIs. Finally, based on our analysis, we suggest courses of action to improve the quality of computational biology representation on Wikipedia.
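
A binary COSI-Article matrix and a similarity analysis over it can be sketched as follows. The abstract does not say which similarity measure was used, so Jaccard similarity is an assumption here, as are the function names and the toy data.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def to_binary_matrix(cosi_articles, articles):
    """Rows are COSIs, columns are articles; 1 marks an article relevant to the COSI."""
    return {
        cosi: [1 if article in linked else 0 for article in articles]
        for cosi, linked in cosi_articles.items()
    }

def pairwise_similarity(cosi_articles):
    """Jaccard similarity for every unordered pair of COSIs."""
    names = sorted(cosi_articles)
    return {
        (x, y): jaccard(cosi_articles[x], cosi_articles[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }

# Toy input: COSI labels and article identifiers are placeholders.
links = {"A": {"a1", "a2"}, "B": {"a2", "a3"}, "C": set()}
matrix = to_binary_matrix(links, ["a1", "a2", "a3"])
similarity = pairwise_similarity(links)
```

The resulting pairwise scores could then be fed to any standard clustering routine to group COSIs with overlapping article sets.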


Subjects
Computational Biology, Cluster Analysis
9.
PLoS One ; 17(4): e0266005, 2022.
Article in English | MEDLINE | ID: mdl-35381031

ABSTRACT

The gastrointestinal microbiota begins to be acquired at birth and continually matures through early adolescence. Despite the relevance for gut health, few studies have evaluated the impact of pathobiont colonization of neonates on the severity of colitis later in life. LF82 is an adherent invasive E. coli strain associated with ileal Crohn's disease. The aim of this study was to evaluate the severity of dextran sodium sulfate (DSS)-induced colitis in mice following E. coli LF82 colonization. Gnotobiotic mice harboring the altered Schaedler flora (ASF) were used as the model. While E. coli LF82 is neither adherent nor invasive, it has been demonstrated that adult ASF mice colonized with E. coli LF82 develop more severe DSS-induced colitis compared to control ASF mice treated with DSS. Therefore, we hypothesized that E. coli LF82 colonization of neonatal ASF mice would reduce the severity of DSS-induced inflammation compared to adult ASF mice colonized with E. coli LF82. To test this hypothesis, adult ASF mice were colonized with E. coli LF82 and bred to produce offspring (LF82N) that were vertically colonized with LF82. LF82N and adult-colonized (LF82A) mice were given 2.0% DSS in drinking water for seven days to trigger colitis. More severe inflammatory lesions were observed in the LF82N + DSS mice when compared to LF82A + DSS mice, and were characterized as transmural in most of the LF82N + DSS mice. Colitis was accompanied by secretion of proinflammatory cytokines (IFNγ, IL-17) and specific mRNA transcripts within the colonic mucosa. Using 16S rRNA gene amplicon sequencing, LF82 colonization did not induce significant changes in the ASF community; however, minimal changes in spatial redistribution by fluorescent in situ hybridization were observed. These results suggest that the age at which mice were colonized with E. coli LF82 pathobiont differentially impacted severity of subsequent colitic events.


Subjects
Colitis, Escherichia coli, Animals, Newborn Animals, Colitis/chemically induced, Colitis/pathology, Dextran Sulfate/toxicity, Fluorescence In Situ Hybridization, Intestinal Mucosa/pathology, Mice, 16S Ribosomal RNA
10.
Bioinform Adv ; 2(1): vbac057, 2022.
Article in English | MEDLINE | ID: mdl-36699361

ABSTRACT

Motivation: Experimental biologists, biocurators, and computational biologists all play a role in characterizing a protein's function. The discovery of protein function in the laboratory by experimental scientists is the foundation of our knowledge about proteins. Experimental findings are compiled in knowledgebases by biocurators to provide standardized, readily accessible, and computationally amenable information. Computational biologists train their methods using these data to predict protein function and guide subsequent experiments. To understand the state of affairs in this ecosystem, centered here around protein function prediction, we surveyed scientists from these three constituent communities. Results: We show that the three communities have common but also idiosyncratic perspectives on the field. Most strikingly, experimentalists rarely use state-of-the-art prediction software, but when presented with predictions, report many to be surprising and useful. Ontologies appear to be highly valued by biocurators, less so by experimentalists and computational biologists, yet controlled vocabularies bridge the communities and simplify the prediction task. Additionally, many software tools are not readily accessible and the predictions presented to the users can be broad and uninformative. We conclude that to meet both the social and technical challenges in the field, a more productive and meaningful interaction between members of the core communities is necessary. Availability and implementation: Data cannot be shared for ethical/privacy reasons. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

11.
PLoS Comput Biol ; 17(10): e1009463, 2021 10.
Article in English | MEDLINE | ID: mdl-34710081

ABSTRACT

Experimental data about gene functions curated from the primary literature have enormous value for research scientists in understanding biology. Using the Gene Ontology (GO), manual curation by experts has provided an important resource for studying gene function, especially within model organisms. Unprecedented expansion of the scientific literature and validation of the predicted proteins have increased both data value and the challenges of keeping pace. Capturing literature-based functional annotations is limited by the ability of biocurators to handle the massive and rapidly growing scientific literature. Within the community-oriented wiki framework for GO annotation called the Gene Ontology Normal Usage Tracking System (GONUTS), we describe an approach to expand biocuration through crowdsourcing with undergraduates. This multiplies the number of high-quality annotations in international databases, enriches our coverage of the literature on normal gene function, and pushes the field in new directions. From an intercollegiate competition judged by experienced biocurators, Community Assessment of Community Annotation with Ontologies (CACAO), we have contributed nearly 5,000 literature-based annotations. Many of those annotations are to organisms not currently well-represented within GO. Over a 10-year history, our community contributors have spurred changes to the ontology not traditionally covered by professional biocurators. The CACAO principle of relying on community members to participate in and shape the future of biocuration in GO is a powerful and scalable model used to promote the scientific enterprise. It also provides undergraduate students with a unique and enriching introduction to critical reading of primary literature and acquisition of marketable skills.


Subjects
Crowdsourcing/methods, Gene Ontology, Molecular Sequence Annotation/methods, Computational Biology, Genetic Databases, Humans, Proteins/genetics, Proteins/physiology
12.
Pac Symp Biocomput ; 26: 341-345, 2021.
Article in English | MEDLINE | ID: mdl-33691031

ABSTRACT

As rich biomedical data streams are accumulating across people and time, they provide a powerful opportunity to address limitations in our existing scientific knowledge and to overcome operational challenges in healthcare and life sciences. Yet the relative weighting of insights vs. methodologies in our current research ecosystem tends to skew the computational community away from algorithm evaluation and operationalization, resulting in a well-reported trend towards the proliferation of scientific outcomes of unknown reliability. Algorithm selection and use is hindered by several problems that persist across our field. One is the impact of the self-assessment bias, which can lead to misrepresentations in the accuracy of research results. A second challenge is the impact of data context on algorithm performance. Biology and medicine are dynamic and heterogeneous. Data is collected under varying conditions. For algorithms, this means that performance is not universal and needs to be evaluated across a range of contexts. These issues are increasingly difficult as algorithms are trained and used on data collected in the real world, outside of the traditional clinical research lab. In these cases, data collection is neither supervised nor well controlled and data access may be limited by privacy or proprietary reasons. Therefore, there is a risk that algorithms will be applied to data that are outside of the scope of the intent of the original training data provided. This workshop will focus on approaches that are emerging across the researcher community to quantify the accuracy of algorithms and the reliability of their outputs.


Subjects
Computational Biology, Ecosystem, Algorithms, Data Collection, Reproducibility of Results
13.
Nucleic Acids Res ; 49(1): 67-78, 2021 01 11.
Article in English | MEDLINE | ID: mdl-33305328

ABSTRACT

Gene-editing experiments commonly elicit the error-prone non-homologous end joining for DNA double-strand break (DSB) repair. Microhomology-mediated end joining (MMEJ) can generate more predictable outcomes for functional genomic and somatic therapeutic applications. We compared three DSB repair prediction algorithms - MENTHU, inDelphi, and Lindel - in identifying MMEJ-repaired, homogeneous genotypes (PreMAs) in an independent dataset of 5,885 distinct Cas9-mediated mouse embryonic stem cell DSB repair events. MENTHU correctly identified 46% of all PreMAs available, a ∼2- and ∼60-fold sensitivity increase compared to inDelphi and Lindel, respectively. In contrast, only Lindel correctly predicted predominant single-base insertions. We report the new algorithm MENdel, a combination of MENTHU and Lindel, that achieves the most predictive coverage of homogeneous out-of-frame mutations in this large dataset. We then estimated the frequency of Cas9-targetable homogeneous frameshift-inducing DSBs in vertebrate coding regions for gene discovery using MENdel. 47 out of 54 genes (87%) contained at least one early frameshift-inducing DSB and 49 out of 54 (91%) did so when also considering Cas12a-mediated deletions. We suggest that the use of MENdel helps researchers use MMEJ at scale for reverse genetics screenings and with sufficient intra-gene density rates to be viable for nearly all loss-of-function based gene editing therapeutic applications.


Subjects
Algorithms, Double-Stranded DNA Breaks, DNA End-Joining Repair, Frameshift Mutation, Gene Editing/methods, Genetic Therapy/methods, Genomics/methods, INDEL Mutation, Loss of Function Mutation, Reverse Genetics/methods, Animals, Bacterial Proteins/metabolism, Caspase 9/metabolism, Datasets as Topic, Embryonic Stem Cells/metabolism, Humans, Mice, ROC Curve, Streptococcus pyogenes/enzymology, Zebrafish/genetics
14.
Bioinformatics ; 36(Suppl_2): i668-i674, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381825

ABSTRACT

MOTIVATION: The evolution of complexity is one of the most fascinating and challenging problems in modern biology, and tracing the evolution of complex traits is an open problem. In bacteria, operons and gene blocks provide a model of tractable evolutionary complexity at the genomic level. Gene blocks are structures of co-located genes with related functions, and operons are gene blocks whose genes are co-transcribed on a single mRNA molecule. The genes in operons and gene blocks typically work together in the same system or molecular complex. Previously, we proposed a method that explains the evolution of orthologous gene blocks (orthoblocks) as a combination of a small set of events that take place in vertical evolution from common ancestors. A heuristic method was proposed to solve this problem. However, no study was done to identify the complexity of the problem. RESULTS: Here, we establish that solving the homologous gene block problem is NP-hard and APX-hard. We have developed a greedy algorithm that runs in polynomial time and guarantees an O(ln n) approximation. In addition, we formalize our problem as an integer linear program and solve it using the PuLP package and the standard CPLEX algorithm. Our exploration of several candidate operons reveals that our new method provides results closer to optimal than those of the heuristic approach, and is significantly faster. AVAILABILITY AND IMPLEMENTATION: The software and data accompanying this paper are available under the GPLv3 and CC0 licenses, respectively, at: https://github.com/nguyenngochuy91/Relevant-Operon.
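
For readers unfamiliar with logarithmic approximation guarantees, the classic greedy set-cover heuristic is a standard example: it runs in polynomial time, and the cover it returns is within a factor of roughly ln n of the optimum, where n is the number of elements to cover. It is shown here only as an analogy. The abstract does not state that the paper's algorithm is set cover, and the function name and toy input are assumptions.

```python
def greedy_set_cover(universe, subsets):
    """Repeatedly pick the subset that covers the most still-uncovered elements.

    Returns the indices of the chosen subsets, in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(subsets)), key=lambda i: len(subsets[i] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen

# Toy instance: subset 0 covers three elements, then subset 3 covers the rest.
cover = greedy_set_cover({1, 2, 3, 4, 5}, [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}])
```

An integer linear program over the same instance would return a provably minimal cover, at the cost of worst-case exponential running time.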


Subjects
Genomics, Software, Algorithms, Bacteria, Computational Biology, Hardness
15.
mSphere ; 5(3), 2020 06 03.
Article in English | MEDLINE | ID: mdl-32493722

ABSTRACT

Gibberellin (GA) phytohormones are ubiquitous regulators of growth and developmental processes in vascular plants. The convergent evolution of GA production by plant-associated bacteria, including both symbiotic nitrogen-fixing rhizobia and phytopathogens, suggests that manipulation of GA signaling is a powerful mechanism for microbes to gain an advantage in these interactions. Although orthologous operons encode GA biosynthetic enzymes in both rhizobia and phytopathogens, notable genetic heterogeneity and scattered operon distribution in these lineages, including loss of the gene for the final biosynthetic step in most rhizobia, suggest varied functions for GA in these distinct plant-microbe interactions. Therefore, deciphering GA operon evolutionary history should provide crucial evidence toward understanding the distinct biological roles for bacterial GA production. To further establish the genetic composition of the GA operon, two operon-associated genes that exhibit limited distribution among rhizobia were biochemically characterized, verifying their roles in GA biosynthesis. This enabled employment of a maximum parsimony ancestral gene block reconstruction algorithm to characterize loss, gain, and horizontal gene transfer (HGT) of GA operon genes within alphaproteobacterial rhizobia, which exhibit the most heterogeneity among the bacteria containing this biosynthetic gene cluster. Collectively, this evolutionary analysis reveals a complex history for HGT of the entire GA operon, as well as the individual genes therein, and ultimately provides a basis for linking genetic content to bacterial GA functions in diverse plant-microbe interactions, including insight into the subtleties of the coevolving molecular interactions between rhizobia and their leguminous host plants. IMPORTANCE: While production of phytohormones by plant-associated microbes has long been appreciated, identification of the gibberellin (GA) biosynthetic operon in plant-associated bacteria has revealed surprising genetic heterogeneity. Notably, this heterogeneity seems to be associated with the lifestyle of the microbe; while the GA operon in phytopathogenic bacteria does not seem to vary to any significant degree, thus enabling production of bioactive GA, symbiotic rhizobia exhibit a number of GA operon gene loss and gain events. This suggests that a unique set of selective pressures are exerted on this biosynthetic gene cluster in rhizobia. Through analysis of the evolutionary history of the GA operon in alphaproteobacterial rhizobia, which display substantial diversity in their GA operon structure and gene content, we provide insight into the effect of lifestyle and host interactions on the production of this phytohormone by plant-associated bacteria.


Subjects
Bacteria/genetics, Bacteria/metabolism, Molecular Evolution, Gibberellins/metabolism, Operon, Biosynthetic Pathways, Multigene Family, Plant Growth Regulators/biosynthesis, Plants/microbiology, Symbiosis
16.
Drug Dev Res ; 81(1): 43-51, 2020 02.
Article in English | MEDLINE | ID: mdl-31483516

ABSTRACT

Bacteriocins, the ribosomally produced antimicrobial peptides of bacteria, represent an untapped source of promising antibiotic alternatives. However, bacteriocins display diverse mechanisms of action, a narrow spectrum of activity, and inherent challenges in natural product isolation making in vitro verification of putative bacteriocins difficult. A subset of bacteriocins exert their antimicrobial effects through favorable biophysical interactions with the bacterial membrane mediated by the charge, hydrophobicity, and conformation of the peptide. We have developed a pipeline for bacteriocin-derived compound design and testing that combines sequence-free prediction of bacteriocins using machine learning and a simple biophysical trait filter to generate 20 amino acid peptides that can be synthesized and evaluated for activity. We generated 28,895 total 20-mer candidate peptides and scored them for charge, α-helicity, and hydrophobic moment. Of those, we selected 16 sequences for synthesis and evaluated their antimicrobial, cytotoxicity, and hemolytic activities. Peptides with the overall highest scores for our biophysical parameters exhibited significant antimicrobial activity against Escherichia coli and Pseudomonas aeruginosa. Our combined method incorporates machine learning and biophysical-based minimal region determination to create an original approach to swiftly discover bacteriocin candidates amenable to rapid synthesis and evaluation for therapeutic use.
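
The candidate-generation step described above (enumerating 20-residue peptides and scoring them on biophysical traits) can be sketched for a single trait. Only net charge is computed here, using a simple residue-counting approximation. The α-helicity and hydrophobic-moment scores mentioned in the abstract are omitted, and the function names and ranking rule are assumptions.

```python
POSITIVE = set("KR")  # lysine, arginine
NEGATIVE = set("DE")  # aspartate, glutamate

def windows(sequence, length=20):
    """All contiguous subsequences of the given length (empty if the sequence is shorter)."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]

def net_charge(peptide):
    """Approximate net charge at neutral pH: +1 per K/R, -1 per D/E.

    Histidine and the peptide termini are ignored in this simplification.
    """
    return sum((aa in POSITIVE) - (aa in NEGATIVE) for aa in peptide)

def top_candidates(sequence, n=3, length=20):
    """The n windows with the highest net charge, best first."""
    return sorted(windows(sequence, length), key=net_charge, reverse=True)[:n]
```

A sequence of length L yields L - 19 overlapping 20-mers, so even a modest set of predicted bacteriocins can produce tens of thousands of candidates to rank.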


Subjects
Anti-Bacterial Agents/chemical synthesis, Antimicrobial Cationic Peptides/chemical synthesis, Bacteriocins/chemistry, Computational Biology/methods, Anti-Bacterial Agents/chemistry, Anti-Bacterial Agents/pharmacology, Antimicrobial Cationic Peptides/chemistry, Antimicrobial Cationic Peptides/pharmacology, Drug Design, Escherichia coli/drug effects, Escherichia coli/growth & development, Hydrophobic and Hydrophilic Interactions, Machine Learning, Microbial Sensitivity Tests, Protein Domains, Secondary Protein Structure, Pseudomonas aeruginosa/drug effects, Pseudomonas aeruginosa/growth & development, Staphylococcus aureus/drug effects, Staphylococcus aureus/growth & development, Structure-Activity Relationship
17.
CRISPR J ; 2(6): 417-433, 2019 12.
Article in English | MEDLINE | ID: mdl-31742435

ABSTRACT

CRISPR and CRISPR-Cas effector proteins enable the targeting of DNA double-strand breaks to defined loci based on a variable length RNA guide specific to each effector. The guide RNAs are generally similar in size and form, consisting of a ∼20 nucleotide sequence complementary to the DNA target and an RNA secondary structure recognized by the effector. However, the effector proteins vary in protospacer adjacent motif requirements, nuclease activities, and DNA binding kinetics. Recently, ErCas12a, a new member of the Cas12a family, was identified in Eubacterium rectale. Here, we report the first characterization of ErCas12a activity in zebrafish and expand on previously reported activity in human cells. Using a fluorescent reporter system, we show that CRISPR-ErCas12a elicits strand annealing mediated DNA repair more efficiently than CRISPR-Cas9. Further, using our previously reported gene targeting method that utilizes short homology, GeneWeld, we demonstrate the use of CRISPR-ErCas12a to integrate reporter alleles into the genomes of both zebrafish and human cells. Together, this work provides methods for deploying an additional CRISPR-Cas system, thus increasing the flexibility researchers have in applying genome engineering technologies.


Subjects
CRISPR-Cas Systems/genetics, Clustered Regularly Interspaced Short Palindromic Repeats/genetics, Gene Editing/methods, Animals, Base Sequence, CRISPR-Associated Proteins/genetics, DNA/chemistry, Gene Targeting/methods, Genetic Engineering/methods, Genome/genetics, Humans, RNA/chemistry, Kinetoplastida Guide RNA/chemistry, Zebrafish/genetics
18.
Cell Syst ; 9(6): 600-608.e4, 2019 12 18.
Article in English | MEDLINE | ID: mdl-31629686

ABSTRACT

Ribosomally synthesized and post-translationally modified peptides (RiPPs) are an important class of natural products that contain antibiotics and a variety of other bioactive compounds. The existing methods for discovery of RiPPs by combining genome mining and computational mass spectrometry are limited to discovering specific classes of RiPPs from small datasets, and these methods fail to handle unknown post-translational modifications. Here, we present MetaMiner, a software tool for addressing these challenges that is compatible with large-scale screening platforms for natural product discovery. After searching millions of spectra in the Global Natural Products Social (GNPS) molecular networking infrastructure against just eight genomic and metagenomic datasets, MetaMiner discovered 31 known and seven unknown RiPPs from diverse microbial communities, including human microbiome and lichen microbiome, and microorganisms isolated from the International Space Station.


Subjects
Computational Biology/methods , Microbiota/genetics , Protein Processing, Post-Translational/genetics , Genomics/methods , Humans , Peptides/chemistry , Ribosomes/genetics , Software
19.
Gigascience ; 8(10)2019 10 01.
Article in English | MEDLINE | ID: mdl-31648300

ABSTRACT

BACKGROUND: Gene homology type classification is required for many types of genome analyses, including comparative genomics, phylogenetics, and protein function annotation. Consequently, a large variety of tools have been developed to perform homology classification across genomes of different species. However, when applied to large genomic data sets, these tools require high memory and CPU usage, typically available only in computational clusters. FINDINGS: Here we present a new graph-based orthology analysis tool, SwiftOrtho, which is optimized for speed and memory usage when applied to large-scale data. SwiftOrtho uses long k-mers to speed up homology search, while using a reduced amino acid alphabet and spaced seeds to compensate for the loss of sensitivity due to long k-mers. In addition, it uses an affinity propagation algorithm to reduce the memory usage when clustering large-scale orthology relationships into orthologous groups. In our tests, SwiftOrtho was the only tool that completed orthology analysis of proteins from 1,760 bacterial genomes on a computer with only 4 GB RAM. Using various standard orthology data sets, we also show that SwiftOrtho has high accuracy. CONCLUSIONS: SwiftOrtho enables accurate comparative genomic analysis of thousands of genomes using low-memory computers. SwiftOrtho is available at https://github.com/Rinoahu/SwiftOrtho.
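
The seeding idea mentioned in the abstract above (a reduced amino acid alphabet combined with spaced seeds) can be illustrated with a minimal Python sketch. This is not SwiftOrtho's implementation: the amino acid grouping, the seed pattern `1101101`, and the function names `reduce_sequence`, `spaced_seeds`, and `shared_seeds` are all assumptions chosen for illustration only.

```python
# Illustrative sketch only; NOT SwiftOrtho's code.
# Hypothetical reduced alphabet: each group of similar residues is
# represented by its first letter. The grouping is an assumption.
GROUPS = ["AST", "CFILMVWY", "DN", "EQ", "G", "H", "KR", "P"]
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_sequence(seq):
    """Map a protein sequence onto the reduced alphabet ('X' if unknown)."""
    return "".join(REDUCE.get(aa, "X") for aa in seq.upper())


def spaced_seeds(seq, pattern="1101101"):
    """Index spaced seeds of a sequence.

    '1' in the pattern marks a position that must match; '0' marks a
    don't-care position that is skipped when building the seed key.
    Returns a dict mapping seed key -> list of start offsets.
    """
    reduced = reduce_sequence(seq)
    keep = [i for i, flag in enumerate(pattern) if flag == "1"]
    span = len(pattern)
    seeds = {}
    for start in range(len(reduced) - span + 1):
        window = reduced[start:start + span]
        key = "".join(window[i] for i in keep)
        seeds.setdefault(key, []).append(start)
    return seeds


def shared_seeds(seq_a, seq_b, pattern="1101101"):
    """Return seed keys present in both sequences (candidate hits)."""
    return set(spaced_seeds(seq_a, pattern)) & set(spaced_seeds(seq_b, pattern))


if __name__ == "__main__":
    # The two peptides differ at four of seven positions, yet still share
    # a seed: three substitutions are conservative under the reduced
    # alphabet and one falls on a don't-care position.
    print(shared_seeds("MKTLLVA", "MRGLIPA"))
```

Under these assumptions, two sequences that share no exact 7-mer can still produce a common seed, which is the kind of sensitivity gain the abstract attributes to reduced alphabets and spaced seeds.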


Subjects
Genomics/methods , Algorithms , Bacterial Proteins/genetics , Cluster Analysis , Computer Storage Devices , Genome, Bacterial
20.
Front Plant Sci ; 10: 1050, 2019.
Article in English | MEDLINE | ID: mdl-31555312

ABSTRACT

Background: An organism can be described by its observable features (phenotypes) and the genes and genomic information (genotypes) that cause these phenotypes. For many decades, researchers have tried to find relationships between genotypes and phenotypes, and great strides have been made. However, improved methods and tools for discovering and visualizing these relationships are still needed. The maize genetics and genomics database (MaizeGDB, www.maizegdb.org) provides an array of useful resources for diverse data types, including thousands of images related to mutant phenotypes in Zea mays ssp. mays (maize). To integrate mutant phenotype images with genomics information, we implemented and enhanced the web-based software package BioDIG (Biological Database of Images and Genomes). Findings: We developed a genotype-phenotype database for maize called MaizeDIG. MaizeDIG has several enhancements over the original BioDIG package. MaizeDIG, which supports multiple reference genome assemblies, is seamlessly integrated with genome browsers to accommodate custom tracks showing tagged mutant phenotype images in their genomic context, and allows for custom tagging of images to highlight the phenotype. This is accomplished through an updated interface that allows users to create image-to-gene links and is accessible via the image search tool. Conclusions: We have created a user-friendly and extensible web-based resource called MaizeDIG. MaizeDIG is preloaded with 2,396 images that are available on genome browsers for 10 different maize reference genomes. Approximately 90 images of classically defined maize genes have been manually annotated. MaizeDIG is available at http://maizedig.maizegdb.org/. The code is free and open source and can be found at https://github.com/Maize-Genetics-and-Genomics-Database/maizedig.
