1.
Nucleic Acids Res ; 51(21): 11504-11517, 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-37897345

ABSTRACT

Large regions of prokaryotic genomes are currently without any annotation, in part due to well-established limitations of annotation tools. For example, it is routine for genes using alternative start codons to be misreported or completely omitted. Therefore, we present StORF-Reporter, a tool that takes an annotated genome and returns regions that may contain missing CDS genes from unannotated regions. StORF-Reporter consists of two parts. The first begins with the extraction of unannotated regions from an annotated genome. Next, Stop-ORFs (StORFs) are identified in these unannotated regions. StORFs are open reading frames that are delimited by stop codons and thus can capture those genes most often missing in genome annotations. We show this methodology recovers genes missing from canonical genome annotations. We inspect the results of the genomes of model organisms, the pangenome of Escherichia coli, and a set of 5109 prokaryotic genomes of 247 genera from the Ensembl Bacteria database. StORF-Reporter extended the core, soft-core and accessory gene collections, identified novel gene families and extended families into additional genera. The high levels of sequence conservation observed between genera suggest that many of these StORFs are likely to be functional genes that should now be considered for inclusion in canonical annotations.
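
The stop-to-stop idea described in this abstract can be illustrated with a minimal sketch. It is not the StORF-Reporter implementation: the function name `find_storfs`, the `min_len` parameter, and the restriction to the forward strand are assumptions made for illustration only.

```python
# Hypothetical sketch: find regions bounded by two in-frame stop codons
# (Stop-ORFs) on the forward strand of an unannotated sequence.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_storfs(seq, min_len=30):
    """Return (start, end, frame) tuples for stop-to-stop regions.

    start/end are 0-based, end-exclusive, and exclude both stop codons.
    min_len is an assumed length filter in nucleotides.
    """
    seq = seq.upper()
    storfs = []
    for frame in range(3):
        prev_stop = None  # index of the previous in-frame stop codon
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if prev_stop is not None:
                    start, end = prev_stop + 3, i
                    if end - start >= min_len:
                        storfs.append((start, end, frame))
                prev_stop = i
    return storfs

print(find_storfs("TAAATGAAACCCTAG", min_len=9))
```

Because the region is delimited by stops rather than by a start codon, a gene using an alternative start codon would still fall inside a reported region. A fuller version would also scan the reverse complement.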


Subjects
Escherichia coli, Bacterial Genome, Open Reading Frames/genetics, Factual Databases, Escherichia coli/genetics, Molecular Sequence Annotation
2.
Bioinformatics ; 38(5): 1198-1207, 2022 Feb 07.
Article in English | MEDLINE | ID: mdl-34875010

ABSTRACT

MOTIVATION: The biases in CoDing Sequence (CDS) prediction tools, which have been based on historic genomic annotations from model organisms, impact our understanding of novel genomes and metagenomes. This hinders the discovery of new genomic information as it results in predictions being biased towards existing knowledge. To date, users have lacked a systematic and replicable approach to identify the strengths and weaknesses of any CDS prediction tool and allow them to choose the right tool for their analysis. RESULTS: We present an evaluation framework (ORForise) based on a comprehensive set of 12 primary and 60 secondary metrics that facilitate the assessment of the performance of CDS prediction tools. This makes it possible to identify which performs better for specific use-cases. We use this to assess 15 ab initio- and model-based tools representing those most widely used (historically and currently) to generate the knowledge in genomic databases. We find that the performance of any tool is dependent on the genome being analysed, and no individual tool ranked as the most accurate across all genomes or metrics analysed. Even the top-ranked tools produced conflicting gene collections, which could not be resolved by aggregation. The ORForise evaluation framework provides users with a replicable, data-led approach to make informed tool choices for novel genome annotations and for refining historical annotations. AVAILABILITY AND IMPLEMENTATION: Code and datasets for reproduction and customisation are available at https://github.com/NickJD/ORForise. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics, Software, Prokaryotic Cells, Molecular Sequence Annotation, Metagenome
3.
Bioinformatics ; 37(10): 1360-1366, 2021 Jun 16.
Article in English | MEDLINE | ID: mdl-33444437

ABSTRACT

MOTIVATION: Population-level genetic variation enables competitiveness and niche specialization in microbial communities. Despite the difficulty in culturing many microbes from an environment, we can still study these communities by isolating and sequencing DNA directly from an environment (metagenomics). Recovering the genomic sequences of all isoforms of a given gene across all organisms in a metagenomic sample would aid evolutionary and ecological insights into microbial ecosystems with potential benefits for medicine and biotechnology. A significant obstacle to this goal arises from the lack of a computationally tractable solution that can recover these sequences from sequenced read fragments. This poses a problem analogous to reconstructing the two sequences that make up the genome of a diploid organism (i.e. haplotypes) but for an unknown number of individuals and haplotypes. RESULTS: The problem of single individual haplotyping was first formalized by Lancia et al. in 2001. Now, nearly two decades later, we discuss the complexity of 'haplotyping' metagenomic samples, with a new formalization of Lancia et al.'s data structure that allows us to effectively extend the single individual haplotype problem to microbial communities. This work describes and formalizes the problem of recovering genes (and other genomic subsequences) from all individuals within a complex community sample, which we term the metagenomic individual haplotyping problem. We also provide software implementations for a pairwise single nucleotide variant (SNV) co-occurrence matrix and greedy graph traversal algorithm. AVAILABILITY AND IMPLEMENTATION: Our reference implementation of the described pairwise SNV matrix (Hansel) and greedy haplotype path traversal algorithm (Gretel) is open source, MIT licensed and freely available online at github.com/samstudio8/hansel and github.com/samstudio8/gretel, respectively.
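
A pairwise SNV co-occurrence matrix with a greedy traversal, as named in this abstract, might be sketched as follows. This is a simplified illustration and not the Hansel or Gretel API: the read representation (a dict mapping SNV position to observed base), the function names, and the rule of choosing each base only from its co-occurrence with the previously chosen base are all assumptions.

```python
# Hypothetical sketch of a pairwise SNV co-occurrence count and a greedy
# path through it. Not the Hansel/Gretel implementation.
from collections import Counter
from itertools import combinations

def cooccurrence(reads):
    """Count how often base b1 at site p1 and base b2 at site p2 share a read."""
    counts = Counter()
    for read in reads:
        for (p1, b1), (p2, b2) in combinations(sorted(read.items()), 2):
            counts[(p1, b1, p2, b2)] += 1
    return counts

def greedy_haplotype(counts, sites, alphabet="ACGT"):
    """Walk the SNV sites in order, taking the best-supported next base."""
    if len(sites) < 2:
        raise ValueError("need at least two SNV sites")
    first, second = sites[0], sites[1]
    # Seed the path with the best-supported pair over the first two sites.
    b0, b1 = max(
        ((a, b) for a in alphabet for b in alphabet),
        key=lambda ab: counts[(first, ab[0], second, ab[1])],
    )
    path = [b0, b1]
    for prev_site, site in zip(sites[1:], sites[2:]):
        prev_base = path[-1]
        path.append(
            max(alphabet, key=lambda b: counts[(prev_site, prev_base, site, b)])
        )
    return "".join(path)

reads = [
    {1: "A", 5: "C", 9: "G"},
    {1: "A", 5: "C", 9: "G"},
    {1: "T", 5: "G", 9: "A"},
]
print(greedy_haplotype(cooccurrence(reads), [1, 5, 9]))
```

Recovering several haplotypes would need some way to discount evidence already used by a recovered path before traversing again; that step is omitted here.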

4.
Bioinformatics ; 24(13): i295-303, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18586727

ABSTRACT

MOTIVATION: Many published manuscripts contain experiment protocols which are poorly described or deficient in information. This means that the published results are very hard or impossible to repeat. This problem is being made worse by the increasing complexity of high-throughput/automated methods. There is therefore a growing need to represent experiment protocols in an efficient and unambiguous way. RESULTS: We have developed the Experiment ACTions (EXACT) ontology as the basis of a method of representing biological laboratory protocols. We provide example protocols that have been formalized using EXACT, and demonstrate the advantages and opportunities created by using this formalization. We argue that the use of EXACT will result in the publication of protocols with increased clarity and usefulness to the scientific community. AVAILABILITY: The ontology, examples and code can be downloaded from http://www.aber.ac.uk/compsci/Research/bio/dss/EXACT/.


Subjects
Database Management Systems, Factual Databases, Documentation/methods, Information Storage and Retrieval/methods, Internet, Research/classification, Research/standards
5.
PLoS One ; 10(12): e0142494, 2015.
Article in English | MEDLINE | ID: mdl-26630677

ABSTRACT

Many advances in synthetic biology require the removal of a large number of genomic elements from a genome. Most existing deletion methods leave behind markers, and as there are a limited number of markers, such methods can only be applied a fixed number of times. Deletion methods that recycle markers generally are either imprecise (remove untargeted sequences), or leave scar sequences which can cause genome instability and rearrangements. No existing marker recycling method is automation-friendly. We have developed a novel openly available deletion tool that consists of: 1) a method for deleting genomic elements that can be repeatedly used without limit, is precise, scar-free, and suitable for automation; and 2) software to design the method's primers. Our tool is sequence agnostic and could be used to delete large numbers of coding sequences, promoter regions, transcription factor binding sites, terminators, etc in a single genome. We have validated our tool on the deletion of non-essential open reading frames (ORFs) from S. cerevisiae. The tool is applicable to arbitrary genomes, and we provide primer sequences for the deletion of: 90% of the ORFs from the S. cerevisiae genome, 88% of the ORFs from S. pombe genome, and 85% of the ORFs from the L. lactis genome.


Subjects
Cicatrix/genetics, Computational Biology/methods, Fungal Genome, Open Reading Frames/genetics, Saccharomyces cerevisiae/genetics, Schizosaccharomyces/genetics, Sequence Deletion, Automation, Genomics/methods, Polymerase Chain Reaction, Software
6.
J R Soc Interface ; 12(104): 20141289, 2015 Mar 06.
Article in English | MEDLINE | ID: mdl-25652463

ABSTRACT

There is an urgent need to make drug discovery cheaper and faster. This will enable the development of treatments for diseases currently neglected for economic reasons, such as tropical and orphan diseases, and generally increase the supply of new drugs. Here, we report the Robot Scientist 'Eve' designed to make drug discovery more economical. A Robot Scientist is a laboratory automation system that uses artificial intelligence (AI) techniques to discover scientific knowledge through cycles of experimentation. Eve integrates and automates library-screening, hit-confirmation, and lead generation through cycles of quantitative structure activity relationship learning and testing. Using econometric modelling we demonstrate that the use of AI to select compounds economically outperforms standard drug screening. For further efficiency Eve uses a standardized form of assay to compute Boolean functions of compound properties. These assays can be quickly and cheaply engineered using synthetic biology, enabling more targets to be assayed for a given budget. Eve has repositioned several drugs against specific targets in parasites that cause tropical diseases. One validated discovery is that the anti-cancer compound TNP-470 is a potent inhibitor of dihydrofolate reductase from the malaria-causing parasite Plasmodium vivax.


Subjects
Drug Design, Drug Repositioning, Rare Diseases/drug therapy, Pharmaceutical Technology/trends, Algorithms, Antineoplastic Agents/therapeutic use, Automation, Preclinical Drug Evaluation, Humans, Vivax Malaria/drug therapy, Statistical Models, Plasmodium vivax/drug effects, Quantitative Structure-Activity Relationship, Regression Analysis, Reproducibility of Results, Software, Tropical Medicine
7.
PLoS One ; 8(11): e80156, 2013.
Article in English | MEDLINE | ID: mdl-24278254

ABSTRACT

BACKGROUND: Complex PCR applications for large genome-scale projects require fast, reliable and often highly sophisticated primer design software applications. Presently, such applications use pipelining methods to utilise many third party applications and this involves file parsing, interfacing and data conversion, which is slow and prone to error. A fully integrated suite of software tools for primer design would considerably improve the development time, the processing speed, and the reliability of bespoke primer design software applications. RESULTS: The PD5 software library is an open-source collection of classes and utilities, providing a complete collection of software building blocks for primer design and analysis. It is written in object-oriented C++ with an emphasis on classes suitable for efficient and rapid development of bespoke primer design programs. The modular design of the software library simplifies the development of specific applications and also integration with existing third party software where necessary. We demonstrate several applications created using this software library that have already proved to be effective, but we view the project as a dynamic environment for building primer design software and it is open for future development by the bioinformatics community. Therefore, the PD5 software library is published under the terms of the GNU General Public License, which guarantee access to source-code and allow redistribution and modification. CONCLUSIONS: The PD5 software library is downloadable from Google Code and the accompanying Wiki includes instructions and examples: http://code.google.com/p/primer-design.


Subjects
DNA Primers, Polymerase Chain Reaction, Software, Programming Languages
8.
Open Biol ; 3(2): 120158, 2013 Feb 27.
Article in English | MEDLINE | ID: mdl-23446112

ABSTRACT

We have developed a robust, fully automated anti-parasitic drug-screening method that selects compounds specifically targeting parasite enzymes and not their host counterparts, thus allowing the early elimination of compounds with potential side effects. Our yeast system permits multiple parasite targets to be assayed in parallel owing to the strains' expression of different fluorescent proteins. A strain expressing the human target is included in the multiplexed screen to exclude compounds that do not discriminate between host and parasite enzymes. This form of assay has the advantages of using known targets and not requiring the in vitro culture of parasites. We performed automated screens for inhibitors of parasite dihydrofolate reductases, N-myristoyltransferases and phosphoglycerate kinases, finding specific inhibitors of parasite targets. We found that our 'hits' have significant structural similarities to compounds with in vitro anti-parasitic activity, validating our screens and suggesting targets for hits identified in parasite-based assays. Finally, we demonstrate a 60 per cent success rate for our hit compounds in killing or severely inhibiting the growth of Trypanosoma brucei, the causative agent of African sleeping sickness.


Subjects
Antiparasitic Agents/pharmacology, Lead/chemistry, Small Molecule Libraries/chemistry, African Trypanosomiasis/drug therapy, Antiparasitic Agents/chemistry, Drug Discovery, High-Throughput Screening Assays, Humans, Lead/pharmacology, Trypanosoma brucei brucei/drug effects, African Trypanosomiasis/pathology, Yeasts/drug effects
9.
Autom Exp ; 2: 1, 2010 Jan 04.
Article in English | MEDLINE | ID: mdl-20119518

ABSTRACT

We review the main components of autonomous scientific discovery, and how they lead to the concept of a Robot Scientist. This is a system which uses techniques from artificial intelligence to automate all aspects of the scientific discovery process: it generates hypotheses from a computer model of the domain, designs experiments to test these hypotheses, runs the physical experiments using robotic systems, analyses and interprets the resulting data, and repeats the cycle. We describe our two prototype Robot Scientists: Adam and Eve. Adam has recently proven the potential of such systems by identifying twelve genes responsible for catalysing specific reactions in the metabolic pathways of the yeast Saccharomyces cerevisiae. This work has been formally recorded in great detail using logic. We argue that the reporting of science needs to become fully formalised and that Robot Scientists can help achieve this. This will make scientific information more reproducible and reusable, and promote the integration of computers in scientific reasoning. We believe the greater automation of both the physical and intellectual aspects of scientific investigations to be essential to the future of science. Greater automation improves the accuracy and reliability of experiments, increases the pace of discovery and, in common with conventional laboratory automation, removes tedious and repetitive tasks from the human scientist.

10.
Science ; 324(5923): 85-9, 2009 Apr 03.
Article in English | MEDLINE | ID: mdl-19342587

ABSTRACT

The basis of science is the hypothetico-deductive method and the recording of experiments in sufficient detail to enable reproducibility. We report the development of Robot Scientist "Adam," which advances the automation of both. Adam has autonomously generated functional genomics hypotheses about the yeast Saccharomyces cerevisiae and experimentally tested these hypotheses by using laboratory automation. We have confirmed Adam's conclusions through manual experiments. To describe Adam's research, we have developed an ontology and logical language. The resulting formalization involves over 10,000 different research units in a nested treelike structure, 10 levels deep, that relates the 6.6 million biomass measurements to their logical description. This formalization describes how a machine contributed to scientific knowledge.


Subjects
Artificial Intelligence, Automation, Computational Biology, Enzymes/genetics, Fungal Genes, Saccharomyces cerevisiae/genetics, Computers, Genomics, Programming Languages, Robotics, Saccharomyces cerevisiae/enzymology, Saccharomyces cerevisiae/growth & development, Saccharomyces cerevisiae/metabolism, Software