Results 1 - 20 of 29
1.
Rheumatology (Oxford) ; 57(7): 1222-1227, 2018 Jul 01.
Article in English | MEDLINE | ID: mdl-29608774

ABSTRACT

OBJECTIVES: B-cell activating factor (BAFF), β2-microglobulin (β2M) and serum free light chains (FLCs) are elevated in primary Sjögren's syndrome (pSS) and associated with disease activity. We aimed to investigate their association with the individual disease activity domains of the EULAR Sjögren's Syndrome Disease Activity Index (ESSDAI) in a large, well-characterized pSS cohort. METHODS: Sera from pSS patients enrolled in the UK Primary Sjögren's Syndrome Registry (UKPSSR) (n = 553) and healthy controls (n = 286) were analysed for FLCs (κ and λ), BAFF and β2M. Pearson correlation coefficients were calculated for patient clinical characteristics, including salivary flow, Schirmer's test, the EULAR Sjögren's Syndrome Patient Reported Index and serum IgG levels. Poisson regression was performed to identify independent predictors of total ESSDAI and ClinESSDAI (the validated ESSDAI minus the biological domain) scores and their domains. RESULTS: Levels of BAFF, β2M and FLCs were higher in pSS patients than in controls. All three biomarkers were significantly associated with the ESSDAI and the ClinESSDAI. BAFF was associated with the peripheral nervous system domain of the ESSDAI, whereas β2M and FLCs were associated with the cutaneous, biological and renal domains. Multivariate analysis showed BAFF, β2M and their interaction to be independent predictors of ESSDAI/ClinESSDAI. FLCs were also associated with the ESSDAI/ClinESSDAI, but not independently of serum IgG. CONCLUSION: All biomarkers were associated with total ESSDAI scores, but with differing domain associations. These findings should encourage further investigation of these biomarkers in longitudinal studies and against other disease activity measures.

2.
PLoS One ; 19(5): e0303231, 2024.
Article in English | MEDLINE | ID: mdl-38771886

ABSTRACT

Extracting biological interactions from published literature helps us understand complex biological systems, accelerate research, and support decision-making in drug or treatment development. Despite efforts to automate the extraction of biological relations using text mining tools and machine learning pipelines, manual curation continues to serve as the gold standard. However, the rapidly increasing volume of literature pertaining to biological relations poses challenges in its manual curation and refinement. These challenges are further compounded because only a small fraction of the published literature is relevant to biological relation extraction, and the sentences in relevant sections have complex structures, which can lead to incorrect inference of relationships. To overcome these challenges, we propose GIX, an automated and robust Gene Interaction Extraction framework based on pre-trained large language models, fine-tuned and extensively evaluated on various gene/protein interaction corpora, including LLL and RegulonDB. GIX identifies relevant publications with minimal keywords, optimises sentence selection to reduce computational overhead, simplifies sentence structure while preserving meaning, and provides a confidence factor indicating the reliability of extracted relations. GIX's Stage-2 relation extraction method performed well on benchmark protein/gene interaction datasets, assessed using 10-fold cross-validation, surpassing state-of-the-art approaches. We demonstrated that the proposed method, although fully automated, performs as well as manual relation extraction, with enhanced robustness. We also observed GIX's capability to augment existing datasets with new sentences, incorporating newly discovered biological terms and processes. Further, we demonstrated GIX's real-world applicability in inferring E. coli gene circuits.


Subject(s)
Data Mining; Data Mining/methods; Natural Language Processing; Machine Learning; Computational Biology/methods; Humans; Algorithms
3.
Bioinformatics ; 28(11): 1495-500, 2012 Jun 01.
Article in English | MEDLINE | ID: mdl-22492647

ABSTRACT

MOTIVATION: Biological experiments give insight into networks of processes inside a cell, but are subject to error and uncertainty. However, due to the overlap between the large number of experiments reported in public databases it is possible to assess the chances of individual observations being correct. In order to do so, existing methods rely on high-quality 'gold standard' reference networks, but such reference networks are not always available. RESULTS: We present a novel algorithm for computing the probability of network interactions that operates without gold standard reference data. We show that our algorithm outperforms existing gold standard-based methods. Finally, we apply the new algorithm to a large collection of genetic interaction and protein-protein interaction experiments. AVAILABILITY: The integrated dataset and a reference implementation of the algorithm as a plug-in for the Ondex data integration framework are available for download at http://bio-nexus.ncl.ac.uk/projects/nogold/


Subject(s)
Algorithms; Bayes Theorem; Epistasis, Genetic; Protein Interaction Mapping/standards; Likelihood Functions; Protein Interaction Mapping/methods; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism
4.
PLoS One ; 18(7): e0288174, 2023.
Article in English | MEDLINE | ID: mdl-37418430

ABSTRACT

In systems biology, the accurate reconstruction of Gene Regulatory Networks (GRNs) is crucial, since these networks can help to solve complex biological problems. Amongst the plethora of methods available for GRN reconstruction, methods based on information theory and fuzzy concepts remain popular. However, most of these methods are not only complex, incurring a high computational burden, but may also produce a high number of false positives, leading to inaccurate inferred networks. In this paper, we propose a novel hybrid fuzzy GRN inference model called MICFuzzy, which aggregates the effects of the Maximal Information Coefficient (MIC). This model has an information theory-based pre-processing stage, the output of which is applied as the input to the novel fuzzy model. In this pre-processing stage, the MIC component filters relevant genes for each target gene, significantly reducing the computational burden of the fuzzy model when selecting the regulatory genes from these filtered gene lists. The novel fuzzy model uses the regulatory effect of the identified activator-repressor gene pairs to determine target gene expression levels. This approach facilitates accurate network inference by generating a high number of true regulatory interactions while significantly reducing false regulatory predictions. The performance of MICFuzzy was evaluated using DREAM3 and DREAM4 challenge data and the SOS real gene expression dataset. MICFuzzy outperformed the other state-of-the-art methods in terms of F-score, Matthews Correlation Coefficient, Structural Accuracy and SS_mean, and outperformed most of them in terms of efficiency. MICFuzzy was also more efficient than the classical fuzzy model, since its design reduces combinatorial computation.


Subject(s)
Algorithms; Computational Biology; Computational Biology/methods; Gene Regulatory Networks; Systems Biology; Genes, Regulator; Models, Genetic
5.
Bioinformatics ; 27(7): 973-9, 2011 Apr 01.
Article in English | MEDLINE | ID: mdl-21296753

ABSTRACT

MOTIVATION: The need for the automated computational design of genetic circuits is becoming increasingly apparent with the advent of ever more complex and ambitious synthetic biology projects. Currently, most circuits are designed through the assembly of models of individual parts such as promoters, ribosome binding sites and coding sequences. These low level models are combined to produce a dynamic model of a larger device that exhibits a desired behaviour. The larger model then acts as a blueprint for physical implementation at the DNA level. However, the conversion of models of complex genetic circuits into DNA sequences is a non-trivial undertaking due to the complexity of mapping the model parts to their physical manifestation. Automating this process is further hampered by the lack of computationally tractable information in most models. RESULTS: We describe a method for automatically generating DNA sequences from dynamic models implemented in CellML and Systems Biology Markup Language (SBML). We also identify the metadata needed to annotate models to facilitate automated conversion, and propose and demonstrate a method for the markup of these models using RDF. Our algorithm has been implemented in a software tool called MoSeC. AVAILABILITY: The software is available from the authors' web site http://research.ncl.ac.uk/synthetic_biology/downloads.html.


Subject(s)
Models, Genetic; Molecular Sequence Annotation/methods; Synthetic Biology/methods; Algorithms; Base Sequence; DNA/chemistry; Software; Systems Biology/methods
6.
Bioinformatics ; 27(9): 1299-306, 2011 May 01.
Article in English | MEDLINE | ID: mdl-21414991

ABSTRACT

MOTIVATION: The rise of high-throughput technologies in the post-genomic era has led to the production of large amounts of biological data. Many of these datasets are freely available on the Internet. Making optimal use of these data is a significant challenge for bioinformaticians. Various strategies for integrating data have been proposed to address this challenge. One of the most promising approaches is the development of semantically rich integrated datasets. Although well suited to computational manipulation, such integrated datasets are typically too large and complex for easy visualization and interactive exploration. RESULTS: We have created an integrated dataset for Saccharomyces cerevisiae using the semantic data integration tool Ondex, and have developed a view-based visualization technique that allows for concise graphical representations of the integrated data. The technique was implemented in a plug-in for Cytoscape, called OndexView. We used OndexView to investigate telomere maintenance in S. cerevisiae. AVAILABILITY: The Ondex yeast dataset and the OndexView plug-in for Cytoscape are accessible at http://bsu.ncl.ac.uk/ondexview.


Subject(s)
Computational Biology/methods; Databases, Genetic; Information Storage and Retrieval/methods; Systems Biology/methods; Internet; Saccharomyces cerevisiae/genetics; Telomere/genetics
7.
Biosystems ; 220: 104736, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35863700

ABSTRACT

S-System models, which are non-linear differential equation models, are widely used to reconstruct gene regulatory networks from temporal gene expression data. An S-System model has two terms, production (generation) and degradation, and uses the kinetic parameters gij and hij to represent the direction, nature, and intensity of the genetic interactions. The need to learn a large number of model parameters increases the computational expense. Previously, we improved the performance of the algorithm by dynamically allocating the maximum in-degree for each gene. Although this method was effective for smaller networks, larger networks still required a large amount of computation. This problem arose mainly from the increased occurrence of invalid networks during optimization, primarily because the two kinetic parameters (gij and hij) of the S-System model converge independently. Being independent, these two parameters can converge to values that indicate contradictory gene interactions, that is, one parameter indicating activation and the other inhibition. In this study, to address this major challenge in S-System modelling, we developed a novel method with two features: a penalty term that penalizes networks with invalid kinetic orders, and a parameter, wij, derived by combining the kinetic parameters gij and hij. The novel penalty term was used for candidate selection during optimization in the DRNI (Dynamically Regulated Network Initialization) algorithm. Rather than remaining constant, the penalty is dynamic, with its magnitude dependent on the number of invalid interactions in the given network. This approach encourages the generation of valid candidate solutions and eliminates invalid networks systematically.
The previous DRNI method, a two-stage approach that dynamically allocates the maximum in-degree for each gene, was further improved by adding a third stage, which applies the proposed wij to handle any invalid regulations that may remain in the candidate solutions. The method was tested on different gene expression datasets and reduced the number of iterations while improving network accuracy. For a 20-gene network, the number of generations required for convergence was reduced by 300, and the F-score improved by 0.05 compared with our previously reported DRNI approach. For the well-known DREAM4 10-gene networks, our method improved the average area under the ROC curve.
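The S-System form referred to above is usually written as dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij. The Python sketch below, which is not the authors' DRNI implementation, evaluates that right-hand side and counts gene pairs whose two kinetic orders point in opposite directions. It assumes that a positive g_ij reads as activation of gene i by gene j and a positive h_ij as inhibition (through faster degradation), so a pair is flagged as contradictory when g_ij and h_ij are both non-zero and share a sign. The function names, the exclusion of self-terms and the linear penalty weight are hypothetical choices made for illustration.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def s_system_rhs(x: Sequence[float], alpha: Sequence[float], beta: Sequence[float],
                 g: Matrix, h: Matrix) -> list[float]:
    """dX_i/dt = alpha_i * prod_j X_j**g_ij - beta_i * prod_j X_j**h_ij (requires X_j > 0)."""
    n = len(x)
    rates = []
    for i in range(n):
        production = alpha[i] * math.prod(x[j] ** g[i][j] for j in range(n))
        degradation = beta[i] * math.prod(x[j] ** h[i][j] for j in range(n))
        rates.append(production - degradation)
    return rates


def count_invalid(g: Matrix, h: Matrix, tol: float = 1e-9) -> int:
    """Count off-diagonal (i, j) pairs whose g_ij and h_ij imply opposite regulation.

    Assumption: g_ij > 0 or h_ij < 0 means activation of i by j, and
    g_ij < 0 or h_ij > 0 means inhibition, so same-sign non-zero pairs conflict.
    """
    n = len(g)
    invalid = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # hypothetical choice: self-terms (e.g. self-degradation) are not checked
            if abs(g[i][j]) > tol and abs(h[i][j]) > tol and g[i][j] * h[i][j] > 0:
                invalid += 1
    return invalid


def penalized_fitness(error: float, g: Matrix, h: Matrix, weight: float = 1.0) -> float:
    """Hypothetical dynamic penalty: grows with the number of invalid interactions."""
    return error + weight * count_invalid(g, h)
```

For example, with x = [1.0, 2.0], unit rate constants, g = [[0, 1], [0, 0]] and h = [[1, 0], [0, 1]], `s_system_rhs` returns [1.0, -1.0]: gene 2 activates the production of gene 1, while each gene degrades in proportion to its own level.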


Subject(s)
Computational Biology; Gene Regulatory Networks; Algorithms; Computational Biology/methods; Gene Regulatory Networks/genetics; Kinetics
8.
Biosystems ; 219: 104730, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35772570

ABSTRACT

The use of microorganisms for the production of industrially important compounds and enzymes is becoming increasingly important. Eukaryotes have been less widely used than prokaryotes in biotechnology because of the complexity of their genomic structure and biology. The Yeast2.0 project is an international effort to engineer the yeast Saccharomyces cerevisiae to make it easy to manipulate, and to generate random variants using a system called SCRaMbLE. SCRaMbLE relies on artificial evolution in vitro to identify useful variants, an approach that is time-consuming and expensive. We developed an in silico simulator for the SCRaMbLE system, using an evolutionary computing approach, which can be used to investigate and optimize the fitness landscape of the system. We applied the simulator to investigate the fitness landscape of one of the S. cerevisiae chromosomes and found that our results agreed well with those previously published. We then simulated directed evolution with and without manipulation of SCRaMbLE, and showed that controlling the SCRaMbLE process could effectively influence directed evolution. Our simulator can be applied to the analysis of the fitness landscapes of any organism for which SCRaMbLE has been implemented.


Subject(s)
Genome, Fungal; Saccharomyces cerevisiae; Chromosomes; Genetic Fitness/genetics; Genome, Fungal/genetics; Genomics; Saccharomyces cerevisiae/genetics
9.
Biosystems ; 221: 104757, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36007675

ABSTRACT

The reconstruction of Gene Regulatory Networks (GRNs) from time series gene expression data is highly relevant for the discovery of complex biological interactions and dynamics. Various computational strategies have been developed for this task, but most approaches have low computational efficiency and cannot cope with high-dimensional, low-sample-size gene expression data. In this paper, we introduce a novel combined filter feature selection approach for efficient and accurate inference of GRNs. A Boolean framework for network modelling is used to demonstrate the efficacy of the proposed approach. Using discretized microarray expression data, the genes most relevant to each target gene are first filtered using ReliefF, an instance-based feature ranking method that is applied here for the first time to GRN inference. Irrelevant genes are then removed from the filtered list using a mutual information-based min-redundancy max-relevance criterion. This combined method is executed on resampled datasets to finalize the optimal set of regulatory genes. Building upon our previous research, a Pearson correlation coefficient-based Boolean modelling approach is used to identify efficiently the optimal regulatory rules associated with the selected regulatory genes. The proposed approach was evaluated using gene expression datasets from small-scale and medium-scale real gene networks. It was more effective than Linear Discriminant Analysis, performed better than the individual feature selection methods, and achieved improved Structural Accuracy, with a higher number of true positives, compared with other state-of-the-art methods, while also outperforming these methods with respect to Dynamic Accuracy and efficiency.
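As an illustration of the second filtering step described above, the Python sketch below implements the standard greedy min-redundancy max-relevance (mRMR) criterion in its difference form (relevance minus mean redundancy), using a plug-in mutual information estimate on already-discretized expression values. It is not the authors' code: the ReliefF pre-filter, the resampling loop and the Boolean rule search are omitted, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in estimate of I(X;Y) in bits for two discrete series of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi


def mrmr_select(candidates: dict[str, Sequence[int]], target: Sequence[int], k: int) -> list[str]:
    """Greedy mRMR (difference form): maximize I(g; target) - mean over selected s of I(g; s)."""
    relevance = {g: mutual_information(v, target) for g, v in candidates.items()}
    selected: list[str] = []
    remaining = set(candidates)
    while remaining and len(selected) < k:
        def score(g: str) -> float:
            if not selected:
                return relevance[g]
            redundancy = sum(mutual_information(candidates[g], candidates[s]) for s in selected)
            return relevance[g] - redundancy / len(selected)
        best = max(sorted(remaining), key=score)  # sorted() makes tie-breaking deterministic
        selected.append(best)
        remaining.remove(best)
    return selected
```

The redundancy term is what distinguishes mRMR from ranking by relevance alone: an exact copy of an already selected regulator scores zero even though it is as informative about the target as the original.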


Subject(s)
Algorithms; Gene Regulatory Networks; Computational Biology/methods; Gene Regulatory Networks/genetics; Time Factors
10.
Mol Syst Biol ; 6: 347, 2010.
Article in English | MEDLINE | ID: mdl-20160708

ABSTRACT

Cellular senescence, the permanent arrest of cycling in normally proliferating cells such as fibroblasts, both contributes to the age-related loss of mammalian tissue homeostasis and acts as a tumour suppressor mechanism. The pathways leading to the establishment of senescence are proving to be more complex than previously envisaged. Combining in silico interactome analysis, functional target gene inhibition, stochastic modelling and live cell microscopy, we show here that there exists a dynamic feedback loop that is triggered by a DNA damage response (DDR) and which, after a delay of several days, locks the cell into an actively maintained state of 'deep' cellular senescence. The essential feature of the loop is that long-term activation of the checkpoint gene CDKN1A (p21) induces mitochondrial dysfunction and production of reactive oxygen species (ROS) through serial signalling via GADD45-MAPK14(p38MAPK)-GRB2-TGFBR2-TGFbeta. These ROS in turn replenish short-lived DNA damage foci and maintain an ongoing DDR. We show that this loop is both necessary and sufficient for the stability of growth arrest during the establishment of the senescent phenotype.


Subject(s)
Cellular Senescence/physiology; Cyclin-Dependent Kinase Inhibitor p21/biosynthesis; Reactive Oxygen Species/metabolism; Cell Cycle; Computer Simulation; Cyclin-Dependent Kinase Inhibitor p21/genetics; DNA Damage; Feedback, Physiological/physiology; Histocytochemistry; Humans; Mitochondria/metabolism; Models, Biological; Signal Transduction/physiology; Stochastic Processes; Systems Biology/methods
12.
ACS Synth Biol ; 5(6): 487-97, 2016 Jun 17.
Article in English | MEDLINE | ID: mdl-27268205

ABSTRACT

Recently, synthetic biologists have developed the Synthetic Biology Open Language (SBOL), a data exchange standard for descriptions of genetic parts, devices, modules, and systems. The goals of this standard are to allow scientists to exchange designs of biological parts and systems, to facilitate the storage of genetic designs in repositories, and to facilitate the description of genetic designs in publications. To achieve these goals, an infrastructure to store, retrieve, and exchange SBOL data is necessary. To address this problem, we have developed the SBOL Stack, a Resource Description Framework (RDF) database specifically designed for the storage, integration, and publication of SBOL data. This database allows users to define a library of synthetic parts and designs as a service, to share SBOL data with collaborators, and to store designs of biological systems locally. The database also allows external data sources to be integrated by mapping them to the SBOL data model. The SBOL Stack includes two Web interfaces: the SBOL Stack API and SynBioHub. While the former is designed for developers, the latter allows users to upload new SBOL biological designs, download SBOL documents, search by keyword, and visualize SBOL data. Since the SBOL Stack is based on semantic Web technology, the inherent distributed querying functionality of RDF databases can be used to query different SBOL Stack databases simultaneously, and therefore data can be shared between different institutes, centers, or other users.


Subject(s)
Database Management Systems; Programming Languages; Synthetic Biology; Databases, Factual; Publishing
13.
ACS Synth Biol ; 5(10): 1086-1097, 2016 Oct 21.
Article in English | MEDLINE | ID: mdl-27110921

ABSTRACT

One aim of synthetic biologists is to create novel and predictable biological systems from simpler modular parts. This approach is currently hampered by a lack of well-defined and characterized parts and devices. However, there is a wealth of existing biological information in the literature and in numerous biological databases that can be used to identify and characterize biological parts and their design constraints. This information is, however, spread among these databases in many different formats. New computational approaches are required to make it available in an integrated format that is more amenable to data mining. A tried and tested approach to this problem is to map disparate data sources into a single data set, with common syntax and semantics, to produce a data warehouse or knowledge base. Ontologies have been used extensively in the life sciences to provide this common syntax and semantics as a model for a given biological domain, in a form that is amenable to computational analysis and reasoning. Here, we present SyBiOnt, an ontology for applications in synthetic biology design, which facilitates the modeling of information about biological parts and their relationships. SyBiOnt was used to create the SyBiOntKB knowledge base, incorporating and building upon existing life sciences ontologies and standards. The reasoning capabilities of ontologies were then applied to automate the mining of biological parts from this knowledge base. We propose that this approach will be useful to speed up synthetic biology design and ultimately help facilitate the automation of the biological engineering life cycle.


Subject(s)
Data Mining; Databases, Genetic; Synthetic Biology; Bacillus subtilis/genetics; Bacillus subtilis/metabolism; Computational Biology; DNA, Bacterial/genetics; Knowledge Bases; Promoter Regions, Genetic; Sequence Analysis, DNA; Software
14.
PLoS One ; 10(12): e0143970, 2015.
Article in English | MEDLINE | ID: mdl-26694930

ABSTRACT

BACKGROUND: Fatigue is a debilitating condition with a significant impact on patients' quality of life. Fatigue is frequently reported by patients with primary Sjögren's Syndrome (pSS), a chronic autoimmune condition characterised by dryness of the eyes and mouth. However, although fatigue is common in pSS, it does not occur in all patients, providing an excellent model with which to explore the potential underlying biological mechanisms. METHODS: Whole blood samples from 133 fully phenotyped pSS patients stratified for the presence of fatigue, collected by the UK Primary Sjögren's Syndrome Registry, were used for whole-genome microarray analysis. The resulting data were analysed both on a gene-by-gene basis and using pre-defined groups of genes. Finally, gene set enrichment analysis (GSEA) was used as a feature selection technique for input into a support vector machine (SVM) classifier. Classification was assessed using the area under the receiver operating characteristic curve (AUC) and the standard error of the Wilcoxon statistic, SE(W). RESULTS: Although no genes were individually found to be associated with fatigue, 19 metabolic pathways were enriched in the high-fatigue patient group using GSEA. Analysis revealed that these enrichments arose from the presence of a subset of 55 genes. A radial kernel SVM classifier with this subset of genes as input displayed significantly improved performance over classifiers using all pathway genes as input. The classifiers had AUCs of 0.866 (SE(W) 0.002) and 0.525 (SE(W) 0.006), respectively. CONCLUSIONS: Systematic analysis of gene expression data from pSS patients discordant for fatigue identified 55 genes that are predictive of fatigue level using SVM classification. This list represents the first step in understanding the underlying pathophysiological mechanisms of fatigue in patients with pSS.
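The AUC and SE(W) values quoted above can be computed from classifier scores with the Mann-Whitney (Wilcoxon) form of the AUC and, assuming the authors followed Hanley and McNeil's standard-error approximation, the formula used in `se_w` below. This is an illustrative sketch only: no SVM is trained here, and `auc_wilcoxon` and `se_w` are hypothetical names.

```python
import math
from typing import Sequence


def auc_wilcoxon(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC as the Mann-Whitney/Wilcoxon statistic: P(score_pos > score_neg), ties count 1/2."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def se_w(auc: float, n_pos: int, n_neg: int) -> float:
    """Hanley-McNeil approximation to the standard error of the Wilcoxon AUC estimate."""
    q1 = auc / (2.0 - auc)               # P(two random positives both outrank one negative)
    q2 = 2.0 * auc * auc / (1.0 + auc)   # P(one positive outranks two random negatives)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc * auc)
           + (n_neg - 1) * (q2 - auc * auc)) / (n_pos * n_neg)
    return math.sqrt(var)
```

For example, `auc_wilcoxon([0.9, 0.8, 0.4], [0.3, 0.5, 0.1])` gives 8/9 (about 0.889), because 8 of the 9 positive-negative pairs are ranked correctly. An AUC near 0.5, as for the all-pathway-genes classifier, means the scores separate the groups no better than chance.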


Subject(s)
Fatigue/genetics; Oligonucleotide Array Sequence Analysis/methods; Sjögren's Syndrome/complications; Transcriptome; Adult; Aged; Area Under Curve; Fatigue/blood; Fatigue/etiology; Female; Genetic Predisposition to Disease; Humans; Male; Middle Aged; Severity of Illness Index; Sjögren's Syndrome/blood
15.
Biosystems ; 74(1-3): 51-62, 2004.
Article in English | MEDLINE | ID: mdl-15125992

ABSTRACT

Networks of interactions evolve in many different domains. They tend to have topological characteristics in common, possibly due to common factors in the way the networks grow and develop. It has been recently suggested that one such common characteristic is the presence of a hierarchically modular organization. In this paper, we describe a new algorithm for the detection and quantification of hierarchical modularity, and demonstrate that the yeast protein-protein interaction network does have a hierarchically modular organization. We further show that such organization is evident in artificial networks produced by computational evolution using a gene duplication operator, but not in those developing via preferential attachment of new nodes to highly connected existing nodes.
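The two growth mechanisms contrasted above can be sketched in Python as follows. The first function is a generic duplication-divergence model and the second a Barabási-Albert-style preferential attachment model. Neither is the authors' computational-evolution operator, and the paper's hierarchical-modularity measure is not reproduced. The edge-retention probability, the extra link from each copy to its parent, the seed graphs and all function names are assumptions.

```python
import random

Graph = dict[int, set[int]]


def grow_by_duplication(n_nodes: int, keep_prob: float, rng: random.Random) -> Graph:
    """Duplication-divergence growth: copy a random node, keep each inherited edge with keep_prob."""
    g: Graph = {0: {1}, 1: {0}}  # seed: a single edge
    while len(g) < n_nodes:
        new = len(g)
        parent = rng.randrange(new)
        g[new] = set()
        for nbr in g[parent]:
            if rng.random() < keep_prob:  # divergence: some inherited edges are lost
                g[new].add(nbr)
                g[nbr].add(new)
        # Hypothetical choice: always link the copy to its parent so it is never isolated
        g[new].add(parent)
        g[parent].add(new)
    return g


def grow_by_preferential_attachment(n_nodes: int, m: int, rng: random.Random) -> Graph:
    """Each new node links to m distinct existing nodes chosen with probability proportional to degree."""
    g: Graph = {i: {j for j in range(m + 1) if j != i} for i in range(m + 1)}  # seed clique
    endpoints = [v for v, nbrs in g.items() for _ in nbrs]  # node listed once per incident edge
    while len(g) < n_nodes:
        new = len(g)
        targets: set[int] = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))  # degree-proportional sampling
        g[new] = set(targets)
        for t in targets:
            g[t].add(new)
            endpoints.extend([t, new])
    return g
```

Sampling from a list that repeats each node once per incident edge is a simple way to draw nodes with probability proportional to degree, whereas the duplication model concentrates new edges around the neighbourhood of the copied node.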


Subject(s)
Gene Duplication; Gene Expression Regulation, Fungal/physiology; Models, Biological; Protein Interaction Mapping/methods; Saccharomyces cerevisiae Proteins/metabolism; Saccharomyces cerevisiae/physiology; Signal Transduction/physiology; Algorithms; Computer Simulation; Intracellular Space/physiology; Saccharomyces cerevisiae Proteins/genetics
16.
J Integr Bioinform ; 11(2): 243, 2014 Jun 30.
Article in English | MEDLINE | ID: mdl-24980693

ABSTRACT

The spread of drug resistance amongst clinically important bacteria is a serious and growing problem [1]. However, the analysis of entire genomes requires considerable computational effort, usually including the assembly of the genome and the subsequent identification of genes known to be important in pathology. An alternative approach is to use computational algorithms to identify genomic differences between pathogenic and non-pathogenic bacteria, even without knowing the biological meaning of those differences. To cope with the high dimensionality of such data, a range of dimensionality reduction techniques has been developed. One such approach is latent-variable models [2]. In latent-variable models, dimensionality reduction is achieved by representing high-dimensional data with a few hidden, or latent, variables, which are not directly observed but are inferred from the observed variables in the model. Probabilistic Latent Semantic Analysis (PLSA), also known as Probabilistic Latent Semantic Indexing, is an extension of LSA [3]. PLSA is based on a mixture decomposition derived from a latent class model. The main objective of the algorithm, as in LSA, is to represent high-dimensional co-occurrence information in a lower-dimensional form in order to discover the hidden semantic structure of the data within a probabilistic framework. In this work, we applied PLSA to analyse the common genomic features of methicillin-resistant Staphylococcus aureus, using tokens derived from amino acid sequences rather than DNA. We characterised genome-scale amino acid sequences in terms of their components, and then investigated the relationships between genomes, tokens and the phenotypes they generated. As a control, we used the non-pathogenic model Gram-positive bacterium Bacillus subtilis.
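PLSA, as described above, is normally fitted with the expectation-maximization (EM) algorithm. The Python sketch below shows that procedure on a generic document-by-token count matrix n(d, w). In the setting of this abstract, a "document" would be a genome and a "token" an amino-acid-derived sequence token. How the authors tokenized the sequences is not stated here, and the function name, random initialization and iteration count are assumptions.

```python
import math
import random
from typing import Sequence


def plsa(counts: Sequence[Sequence[int]], n_topics: int, n_iter: int = 50, seed: int = 0):
    """Fit PLSA by EM on a document-by-token count matrix n(d, w).

    Assumes every document contains at least one token. Returns (p_z_d, p_w_z, history),
    where p_z_d[d][z] = P(z|d), p_w_z[z][w] = P(w|z), and history holds
    sum_{d,w} n(d,w) * log P(w|d) at the start of each iteration.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        s = sum(row)
        return [v / s for v in row]

    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)]) for _ in range(n_topics)]
    history = []
    for _ in range(n_iter):
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        loglik = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                c = counts[d][w]
                if c == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d)
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                total = sum(joint)  # equals P(w|d) under the current parameters
                loglik += c * math.log(total)
                for z in range(n_topics):
                    r = c * joint[z] / total  # expected count of (d, w) assigned to topic z
                    new_w_z[z][w] += r
                    new_z_d[d][z] += r
        # M-step: renormalize the expected counts into conditional probabilities
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
        history.append(loglik)
    return p_z_d, p_w_z, history
```

Because each EM iteration cannot decrease the likelihood, the values in `history` should be non-decreasing (up to floating-point rounding), which is a convenient sanity check when experimenting with the number of latent classes.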


Subject(s)
Bacillus subtilis/genetics; Computational Biology/methods; Genome, Bacterial; Genomics; Methicillin-Resistant Staphylococcus aureus/genetics; Algorithms; Pattern Recognition, Automated; Phenotype; Probability; Semantics; Software
17.
J Integr Bioinform ; 11(2): 244, 2014 Jul 08.
Article in English | MEDLINE | ID: mdl-25001169

ABSTRACT

As high-throughput technologies become cheaper and easier to use, raw sequence data and corresponding annotations for many organisms are becoming available. However, sequence data alone is not sufficient to explain the biological behaviour of organisms, which arises largely from complex molecular interactions. There is a need to develop new platform technologies that can be applied to the investigation of whole-genome datasets in an efficient and cost-effective manner. One such approach is the transfer of existing knowledge from well-studied organisms to closely-related organisms. In this paper, we describe a system, BacillusRegNet, for the use of a model organism, Bacillus subtilis, to infer genome-wide regulatory networks in less well-studied close relatives. The putative transcription factors, their binding sequences and predicted promoter sequences along with annotations are available from the associated BacillusRegNet website (http://bacillus.ncl.ac.uk).


Subject(s)
Bacillus subtilis/genetics; Computational Biology/methods; Gene Regulatory Networks; High-Throughput Nucleotide Sequencing/methods; Algorithms; Bacillus subtilis/metabolism; Binding Sites; Computer Simulation; Computer Systems; Databases, Genetic; Databases, Protein; Gene Expression Regulation, Bacterial; Genome, Bacterial; Promoter Regions, Genetic; Software; Systems Biology; Transcription Factors/metabolism; Transcription, Genetic
18.
J Integr Bioinform ; 11(2): 242, 2014 Jun 30.
Article in English | MEDLINE | ID: mdl-24980620

ABSTRACT

The rapid and cost-effective identification of bacterial species is crucial, especially for clinical diagnosis and treatment. Peptide aptamers have been shown to be valuable for use as a component of novel, direct detection methods. These small peptides have a number of advantages over antibodies, including greater specificity and longer shelf life. These properties facilitate their use as the detector components of biosensor devices. However, the identification of suitable aptamer targets for particular groups of organisms is challenging. We present a semi-automated processing pipeline for the identification of candidate aptamer targets from whole bacterial genome sequences. The pipeline can be configured to search for protein sequence fragments that uniquely identify a set of strains of interest. The system is also capable of identifying additional organisms that may be of interest due to their possession of protein fragments in common with the initial set. Through the use of Cloud computing technology and distributed databases, our system is capable of scaling with the rapidly growing genome repositories, and consequently of keeping the resulting data sets up-to-date. The system described is also more generically applicable to the discovery of specific targets for other diagnostic approaches such as DNA probes, PCR primers and antibodies.
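The core search described above can be sketched with exact k-mer sets. The search finds protein fragments shared by all strains of interest but absent from the others, then flags further strains that carry any of those fragments. This is a simplified, single-machine illustration and not the authors' Cloud-based pipeline. The fixed-length k-mer definition of a fragment, exact matching and all function names are assumptions.

```python
from typing import Iterable


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings (candidate fragments) of one protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def proteome_kmers(proteins: Iterable[str], k: int) -> set[str]:
    """Union of fragments over all proteins of one strain."""
    out: set[str] = set()
    for p in proteins:
        out |= kmers(p, k)
    return out


def unique_fragments(targets: dict[str, list[str]], background: dict[str, list[str]], k: int) -> set[str]:
    """Fragments present in every target strain and absent from every background strain."""
    target_sets = [proteome_kmers(p, k) for p in targets.values()]
    shared = set.intersection(*target_sets) if target_sets else set()
    for proteins in background.values():
        shared -= proteome_kmers(proteins, k)
    return shared


def strains_sharing(fragments: set[str], others: dict[str, list[str]], k: int) -> list[str]:
    """Additional strains that carry at least one of the candidate fragments."""
    return sorted(name for name, prots in others.items() if fragments & proteome_kmers(prots, k))
```

For example, with target proteomes ["MKTAYIA"] and ["GGMKTAYQ"], a background proteome ["KTAYLL"] and k = 4, the only surviving candidate is "MKTA": "KTAY" is shared by both targets but also occurs in the background strain.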


Subject(s)
Computational Biology/methods; Methicillin-Resistant Staphylococcus aureus/genetics; Staphylococcal Infections/microbiology; Algorithms; Automation; Bacterial Proteins/genetics; Computer Communication Networks; Computer Systems; DNA/chemistry; Epitopes/chemistry; Genome, Bacterial; Ligands; Methicillin-Resistant Staphylococcus aureus/drug effects; Peptides/chemistry; RNA/chemistry; Staphylococcal Infections/diagnosis
19.
Nat Biotechnol ; 32(6): 545-50, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24911500

ABSTRACT

The re-use of previously validated designs is critical to the evolution of synthetic biology from a research discipline to an engineering practice. Here we describe the Synthetic Biology Open Language (SBOL), a proposed data standard for exchanging designs within the synthetic biology community. SBOL represents synthetic biology designs in a community-driven, formalized format for exchange between software tools, research groups and commercial service providers. The SBOL Developers Group has implemented SBOL as an XML/RDF serialization and provides software libraries and specification documentation to help developers implement SBOL in their own software. We describe early successes, including a demonstration of the utility of SBOL for information exchange between several different software tools and repositories from both academic and industrial partners. As a community-driven standard, SBOL will be updated as synthetic biology evolves to provide specific capabilities for different aspects of the synthetic biology workflow.


Subject(s)
Information Dissemination/methods; Research Design/standards; Software/standards; Synthetic Biology/standards; Terminology as Topic; Vocabulary, Controlled; Internationality; Reference Standards
20.
J Integr Bioinform ; 10(2): 224, 2013 Apr 10.
Article in English | MEDLINE | ID: mdl-23571273

ABSTRACT

BacillOndex is an extension of the Ondex data integration system, providing a semantically annotated, integrated knowledge base for the model Gram-positive bacterium Bacillus subtilis. This application allows a user to mine a variety of B. subtilis data sources, and analyse the resulting integrated dataset, which contains data about genes, gene products and their interactions. The data can be analysed either manually, by browsing using Ondex, or computationally via a Web services interface. We describe the process of creating a BacillOndex instance, and describe the use of the system for the analysis of single nucleotide polymorphisms in B. subtilis Marburg. The Marburg strain is the progenitor of the widely-used laboratory strain B. subtilis 168. We identified 27 SNPs with predictable phenotypic effects, including genetic traits for known phenotypes. We conclude that BacillOndex is a valuable tool for the systems-level investigation of, and hypothesis generation about, this important biotechnology workhorse. Such understanding contributes to our ability to construct synthetic genetic circuits in this organism.


Subject(s)
Bacillus subtilis/genetics; Databases, Genetic; Synthetic Biology; Systems Biology; Phenotype; Polymorphism, Single Nucleotide/genetics; Software