Search | BVS Bolivia

1.

Large language model based framework for automated extraction of genetic interactions from unstructured data.

Gill, Jaskaran Kaur; Chetty, Madhu; Lim, Suryani; Hallinan, Jennifer.

PLoS One ; 19(5): e0303231, 2024.

Article in English | MEDLINE | ID: mdl-38771886

ABSTRACT

Extracting biological interactions from published literature helps us understand complex biological systems, accelerate research, and support decision-making in drug or treatment development. Despite efforts to automate the extraction of biological relations using text mining tools and machine learning pipelines, manual curation continues to serve as the gold standard. However, the rapidly increasing volume of literature pertaining to biological relations poses challenges in its manual curation and refinement. These challenges are further compounded because only a small fraction of the published literature is relevant to biological relation extraction, and the embedded sentences of relevant sections have complex structures, which can lead to incorrect inference of relationships. To overcome these challenges, we propose GIX, an automated and robust Gene Interaction Extraction framework, based on pre-trained Large Language models fine-tuned through extensive evaluations on various gene/protein interaction corpora including LLL and RegulonDB. GIX identifies relevant publications with minimal keywords, optimises sentence selection to reduce computational overhead, simplifies sentence structure while preserving meaning, and provides a confidence factor indicating the reliability of extracted relations. GIX's Stage-2 relation extraction method performed well on benchmark protein/gene interaction datasets, assessed using 10-fold cross-validation, surpassing state-of-the-art approaches. We demonstrated that the proposed method, although fully automated, performs as well as manual relation extraction, with enhanced robustness. We also observed GIX's capability to augment existing datasets with new sentences, incorporating newly discovered biological terms and processes. Further, we demonstrated GIX's real-world applicability in inferring E. coli gene circuits.

Subject(s)

Data Mining; Data Mining/methods; Natural Language Processing; Machine Learning; Computational Biology/methods; Humans; Algorithms

2.

MICFuzzy: A maximal information content based fuzzy approach for reconstructing genetic networks.

Nakulugamuwa Gamage, Hasini; Chetty, Madhu; Lim, Suryani; Hallinan, Jennifer.

PLoS One ; 18(7): e0288174, 2023.

Article in English | MEDLINE | ID: mdl-37418430

ABSTRACT

In systems biology, the accurate reconstruction of Gene Regulatory Networks (GRNs) is crucial since these networks can facilitate the solving of complex biological problems. Amongst the plethora of methods available for GRN reconstruction, information theory and fuzzy concepts-based methods have abiding popularity. However, most of these methods are not only complex, incurring a high computational burden, but they may also produce a high number of false positives, leading to inaccurate inferred networks. In this paper, we propose a novel hybrid fuzzy GRN inference model called MICFuzzy which involves the aggregation of the effects of Maximal Information Coefficient (MIC). This model has an information theory-based pre-processing stage, the output of which is applied as an input to the novel fuzzy model. In this preprocessing stage, the MIC component filters relevant genes for each target gene to significantly reduce the computational burden of the fuzzy model when selecting the regulatory genes from these filtered gene lists. The novel fuzzy model uses the regulatory effect of the identified activator-repressor gene pairs to determine target gene expression levels. This approach facilitates accurate network inference by generating a high number of true regulatory interactions while significantly reducing false regulatory predictions. The performance of MICFuzzy was evaluated using DREAM3 and DREAM4 challenge data, and the SOS real gene expression dataset. MICFuzzy outperformed the other state-of-the-art methods in terms of F-score, Matthews Correlation Coefficient, Structural Accuracy, and SS_mean, and outperformed most of them in terms of efficiency. MICFuzzy also had improved efficiency compared with the classical fuzzy model since the design of MICFuzzy leads to a reduction in combinatorial computation.

Subject(s)

Algorithms; Computational Biology; Computational Biology/methods; Gene Regulatory Networks; Systems Biology; Genes, Regulator; Models, Genetic

3.

Computational intelligence and machine learning in bioinformatics and computational biology.

Chetty, Madhu; Hallinan, Jennifer; Ruz, Gonzalo A; Wipat, Anil.

Biosystems ; 222: 104792, 2022 Dec.

Article in English | MEDLINE | ID: mdl-36209915

Subject(s)

Artificial Intelligence; Computational Biology; Machine Learning

4.

Filter feature selection based Boolean Modelling for Genetic Network Inference.

Gamage, Hasini Nakulugamuwa; Chetty, Madhu; Shatte, Adrian; Hallinan, Jennifer.

Biosystems ; 221: 104757, 2022 Nov.

Article in English | MEDLINE | ID: mdl-36007675

ABSTRACT

The reconstruction of Gene Regulatory Networks (GRNs) from time series gene expression data is highly relevant for the discovery of complex biological interactions and dynamics. Various computational strategies have been developed for this task, but most approaches have low computational efficiency and are not able to cope with high-dimensional, low sample-number, gene expression data. In this paper, we introduce a novel combined filter feature selection approach for efficient and accurate inference of GRNs. A Boolean framework for network modelling is used to demonstrate the efficacy of the proposed approach. Using discretized microarray expression data, the genes most relevant to each target gene are first filtered using ReliefF, an instance-based feature ranking method that is here applied for the first time to GRN inference. Then, further gene selection from the filtered-gene list is done using a mutual information-based min-redundancy max-relevance criterion by eliminating irrelevant genes. This combined method is executed on resampled datasets to finalize the optimal set of regulatory genes. Building upon our previous research, a Pearson correlation coefficient-based Boolean modelling approach is utilized for the efficient identification of the optimal regulatory rules associated with selected regulatory genes. The proposed approach was evaluated using gene expression datasets from small-scale and medium-scale real gene networks, and was observed to be more effective than Linear Discriminant Analysis, performed better than the individual feature selection methods, and obtained improved Structural Accuracy with a higher number of true positives than other state-of-the-art methods, while outperforming these methods with respect to Dynamic Accuracy and efficiency.

Subject(s)

Algorithms; Gene Regulatory Networks; Computational Biology/methods; Gene Regulatory Networks/genetics; Time Factors

5.

Combining kinetic orders for efficient S-System modelling of gene regulatory network.

Gill, Jaskaran; Chetty, Madhu; Shatte, Adrian; Hallinan, Jennifer.

Biosystems ; 220: 104736, 2022 Oct.

Article in English | MEDLINE | ID: mdl-35863700

ABSTRACT

S-System models, non-linear differential equation models, are widely used for reconstructing gene regulatory networks from temporal gene expression data. An S-System model involves two states, generation and degeneration, and uses the kinetic parameters gij and hij, to represent the direction, nature, and intensity of the genetic interactions. The need for learning a large number of model parameters results in increased computational expense. Previously, we improved the performance of the algorithm using dynamic allocation of the maximum in-degree for each gene. While the method was effective for smaller networks, a large amount of computation was still needed for larger networks. This problem arose mainly due to the increased occurrence of invalid networks during optimization, primarily because the two kinetic parameters (gij and hij) of the S-System model converge independently during optimization. Being independent, these two parameters can converge to values that can indicate contradictory gene interactions, specifically inhibition or activation. In this study, to address this major challenge in S-System modelling, we developed a novel method that includes two features: a penalty term that penalizes those networks with invalid kinetic orders, and a parameter, wij, derived by combining the kinetic parameters gij and hij. The novel penalty term was used for candidate selection during the process of optimizing the DRNI (Dynamically Regulated Network Initialization) algorithm. Rather than remaining constant, it is dynamic, with its magnitude dependent on the number of invalid interactions in the given network. This approach encourages the generation of valid candidate solutions, and eliminates invalid networks in a systematic manner. 
The previous DRNI method, a two-stage approach which uses dynamic allocation of the maximum in-degree for each gene, was further improved by adding a third stage which applies the proposed wij to handle the invalid regulations that may still exist in the candidate solutions. The method was tested on different gene expression datasets, and was able to reduce the number of iterations and produce improved network accuracies. For a 20-gene network, the number of generations required for convergence was reduced by 300, and the F-score improved by 0.05 compared to our previously reported DRNI approach. For the well-known 10-gene networks of the DREAM4 challenge, our method produced an improvement in the average area under the ROC curve.
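
For orientation, the S-System form that the abstract refers to is usually written as below. This is the standard textbook notation, not taken from the abstract: the symbols X_i (expression level of gene i), N (number of genes), and the rate constants alpha_i and beta_i are assumptions added here; only the kinetic orders g_ij and h_ij are named in the abstract. The abstract does not define the combined parameter wij or the penalty term, so neither is reproduced.

```latex
\frac{dX_i}{dt}
  = \alpha_i \prod_{j=1}^{N} X_j^{\,g_{ij}}
  - \beta_i  \prod_{j=1}^{N} X_j^{\,h_{ij}},
  \qquad i = 1, \dots, N
```

The first product is the generation term and the second the degeneration term. Because g_ij and h_ij are fitted separately, their signs for the same gene pair can disagree about whether gene j activates or inhibits gene i, which is the kind of contradiction the abstract describes.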

Subject(s)

Computational Biology; Gene Regulatory Networks; Algorithms; Computational Biology/methods; Gene Regulatory Networks/genetics; Kinetics

6.

Modelling the fitness landscapes of a SCRaMbLEd yeast genome.

Yang, Bill; Misirli, Goksel; Wipat, Anil; Hallinan, Jennifer.

Biosystems ; 219: 104730, 2022 Sep.

Article in English | MEDLINE | ID: mdl-35772570

ABSTRACT

The use of microorganisms for the production of industrially important compounds and enzymes is becoming increasingly important. Eukaryotes have been less widely used than prokaryotes in biotechnology, because of the complexity of their genomic structure and biology. The Yeast2.0 project is an international effort to engineer the yeast Saccharomyces cerevisiae to make it easy to manipulate, and to generate random variants using a system called SCRaMbLE. SCRaMbLE relies on artificial evolution in vitro to identify useful variants, an approach which is time-consuming and expensive. We developed an in silico simulator for the SCRaMbLE system, using an evolutionary computing approach, which can be used to investigate and optimize the fitness landscape of the system. We applied the system to the investigation of the fitness landscape of one of the S. cerevisiae chromosomes, and found that our results fitted well with those previously published. We then simulated directed evolution with or without manipulation of SCRaMbLE, and revealed that controlling the SCRaMbLE process could effectively impact directed evolution. Our simulator can be applied to the analysis of the fitness landscapes of any organism for which SCRaMbLE has been implemented.

Subject(s)

Genome, Fungal; Saccharomyces cerevisiae; Chromosomes; Genetic Fitness/genetics; Genome, Fungal/genetics; Genomics; Saccharomyces cerevisiae/genetics

7.

B-cell activity markers are associated with different disease activity domains in primary Sjögren's syndrome.

James, Katherine; Chipeta, Chimwemwe; Parker, Antony; Harding, Stephen; Cockell, Simon J; Gillespie, Colin S; Hallinan, Jennifer; Barone, Francesca; Bowman, Simon J; Ng, Wan-Fai; Fisher, Benjamin A.

Rheumatology (Oxford) ; 57(7): 1222-1227, 2018 Jul 01.

Article in English | MEDLINE | ID: mdl-29608774

ABSTRACT

OBJECTIVES: B-cell activating factor (BAFF), β-2 microglobulin (β2M) and serum free light chains (FLCs) are elevated in primary SS (pSS) and associated with disease activity. We aimed to investigate their association with the individual disease activity domains of the EULAR Sjögren's Syndrome Disease Activity Index (ESSDAI) in a large well-characterized pSS cohort. METHODS: Sera from pSS patients enrolled in the UK Primary Sjögren's Syndrome Registry (UKPSSR) (n = 553) and healthy controls (n = 286) were analysed for FLC (κ and λ), BAFF and β2M. Pearson correlation coefficients were calculated for patient clinical characteristics, including salivary flow, Schirmer's test, EULAR Sjögren's Syndrome Patient Reported Index and serum IgG levels. Poisson regression was performed to identify independent predictors of total ESSDAI and ClinESSDAI (validated ESSDAI minus the biological domain) scores and their domains. RESULTS: Levels of BAFF, β2M and FLCs were higher in pSS patients compared to controls. All three biomarkers associated significantly with the ESSDAI and the ClinESSDAI. BAFF associated with the peripheral nervous system domain of the ESSDAI, whereas β2M and FLCs associated with the cutaneous, biological and renal domains. Multivariate analysis showed BAFF, β2M and their interaction to be independent predictors of ESSDAI/ClinESSDAI. FLCs were also shown to associate with the ESSDAI/ClinESSDAI but not independent of serum IgG. CONCLUSION: All biomarkers were associated with total ESSDAI scores but with differing domain associations. These findings should encourage further investigation of these biomarkers in longitudinal studies and against other disease activity measures.

8.

The SBOL Stack: A Platform for Storing, Publishing, and Sharing Synthetic Biology Designs.

Madsen, Curtis; McLaughlin, James Alastair; Misirli, Göksel; Pocock, Matthew; Flanagan, Keith; Hallinan, Jennifer; Wipat, Anil.

ACS Synth Biol ; 5(6): 487-97, 2016 Jun 17.

Article in English | MEDLINE | ID: mdl-27268205

ABSTRACT

Recently, synthetic biologists have developed the Synthetic Biology Open Language (SBOL), a data exchange standard for descriptions of genetic parts, devices, modules, and systems. The goals of this standard are to allow scientists to exchange designs of biological parts and systems, to facilitate the storage of genetic designs in repositories, and to facilitate the description of genetic designs in publications. In order to achieve these goals, the development of an infrastructure to store, retrieve, and exchange SBOL data is necessary. To address this problem, we have developed the SBOL Stack, a Resource Description Framework (RDF) database specifically designed for the storage, integration, and publication of SBOL data. This database allows users to define a library of synthetic parts and designs as a service, to share SBOL data with collaborators, and to store designs of biological systems locally. The database also allows external data sources to be integrated by mapping them to the SBOL data model. The SBOL Stack includes two Web interfaces: the SBOL Stack API and SynBioHub. While the former is designed for developers, the latter allows users to upload new SBOL biological designs, download SBOL documents, search by keyword, and visualize SBOL data. Since the SBOL Stack is based on semantic Web technology, the inherent distributed querying functionality of RDF databases can be used to allow different SBOL stack databases to be queried simultaneously, and therefore, data can be shared between different institutes, centers, or other users.

Subject(s)

Database Management Systems; Programming Languages; Synthetic Biology; Databases, Factual; Publishing

9.

Data Integration and Mining for Synthetic Biology Design.

Misirli, Göksel; Hallinan, Jennifer; Pocock, Matthew; Lord, Phillip; McLaughlin, James Alastair; Sauro, Herbert; Wipat, Anil.

ACS Synth Biol ; 5(10): 1086-1097, 2016 Oct 21.

Article in English | MEDLINE | ID: mdl-27110921

ABSTRACT

One aim of synthetic biologists is to create novel and predictable biological systems from simpler modular parts. This approach is currently hampered by a lack of well-defined and characterized parts and devices. However, there is a wealth of existing biological information, which can be used to identify and characterize biological parts, and their design constraints in the literature and numerous biological databases. However, this information is spread among these databases in many different formats. New computational approaches are required to make this information available in an integrated format that is more amenable to data mining. A tried and tested approach to this problem is to map disparate data sources into a single data set, with common syntax and semantics, to produce a data warehouse or knowledge base. Ontologies have been used extensively in the life sciences, providing this common syntax and semantics as a model for a given biological domain, in a fashion that is amenable to computational analysis and reasoning. Here, we present an ontology for applications in synthetic biology design, SyBiOnt, which facilitates the modeling of information about biological parts and their relationships. SyBiOnt was used to create the SyBiOntKB knowledge base, incorporating and building upon existing life sciences ontologies and standards. The reasoning capabilities of ontologies were then applied to automate the mining of biological parts from this knowledge base. We propose that this approach will be useful to speed up synthetic biology design and ultimately help facilitate the automation of the biological engineering life cycle.

Subject(s)

Data Mining; Databases, Genetic; Synthetic Biology; Bacillus subtilis/genetics; Bacillus subtilis/metabolism; Computational Biology; DNA, Bacterial/genetics; Knowledge Bases; Promoter Regions, Genetic; Sequence Analysis, DNA; Software

10.

A Transcriptional Signature of Fatigue Derived from Patients with Primary Sjögren's Syndrome.

James, Katherine; Al-Ali, Shereen; Tarn, Jessica; Cockell, Simon J; Gillespie, Colin S; Hindmarsh, Victoria; Locke, James; Mitchell, Sheryl; Lendrem, Dennis; Bowman, Simon; Price, Elizabeth; Pease, Colin T; Emery, Paul; Lanyon, Peter; Hunter, John A; Gupta, Monica; Bombardieri, Michele; Sutcliffe, Nurhan; Pitzalis, Costantino; McLaren, John; Cooper, Annie; Regan, Marian; Giles, Ian; Isenberg, David; Saravanan, Vadivelu; Coady, David; Dasgupta, Bhaskar; McHugh, Neil; Young-Min, Steven; Moots, Robert; Gendi, Nagui; Akil, Mohammed; Griffiths, Bridget; Wipat, Anil; Newton, Julia; Jones, David E; Isaacs, John; Hallinan, Jennifer; Ng, Wan-Fai.

PLoS One ; 10(12): e0143970, 2015.

Article in English | MEDLINE | ID: mdl-26694930

ABSTRACT

BACKGROUND: Fatigue is a debilitating condition with a significant impact on patients' quality of life. Fatigue is frequently reported by patients suffering from primary Sjögren's Syndrome (pSS), a chronic autoimmune condition characterised by dryness of the eyes and the mouth. However, although fatigue is common in pSS, it does not manifest in all sufferers, providing an excellent model with which to explore the potential underpinning biological mechanisms. METHODS: Whole blood samples from 133 fully-phenotyped pSS patients stratified for the presence of fatigue, collected by the UK primary Sjögren's Syndrome Registry, were used for whole genome microarray. The resulting data were analysed both on a gene by gene basis and using pre-defined groups of genes. Finally, gene set enrichment analysis (GSEA) was used as a feature selection technique for input into a support vector machine (SVM) classifier. Classification was assessed using area under curve (AUC) of receiver operator characteristic and standard error of Wilcoxon statistic, SE(W). RESULTS: Although no genes were individually found to be associated with fatigue, 19 metabolic pathways were enriched in the high fatigue patient group using GSEA. Analysis revealed that these enrichments arose from the presence of a subset of 55 genes. A radial kernel SVM classifier with this subset of genes as input displayed significantly improved performance over classifiers using all pathway genes as input. The classifiers had AUCs of 0.866 (SE(W) 0.002) and 0.525 (SE(W) 0.006), respectively. CONCLUSIONS: Systematic analysis of gene expression data from pSS patients discordant for fatigue identified 55 genes which are predictive of fatigue level using SVM classification. This list represents the first step in understanding the underlying pathophysiological mechanisms of fatigue in patients with pSS.
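
The abstract reports classifier performance as AUC together with the standard error of the Wilcoxon statistic, SE(W). The sketch below shows one common way such figures are computed: AUC as the normalised Wilcoxon/Mann-Whitney statistic, and SE(W) via the Hanley and McNeil (1982) approximation. The abstract does not say which estimator the authors used, so the formula choice is an assumption, and the function names and toy scores are hypothetical.

```python
import math


def auc_wilcoxon(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking, which makes this equal to the
    normalised Wilcoxon/Mann-Whitney statistic.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def se_hanley_mcneil(auc, n_pos, n_neg):
    """Standard error of the AUC (assumed Hanley-McNeil approximation)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


# Toy classifier scores (hypothetical, not data from the study).
high_fatigue = [0.9, 0.8, 0.4]
low_fatigue = [0.7, 0.3, 0.2]
auc = auc_wilcoxon(high_fatigue, low_fatigue)
se = se_hanley_mcneil(auc, len(high_fatigue), len(low_fatigue))
print(f"AUC = {auc:.3f}, SE(W) = {se:.3f}")
```

An AUC near 0.5, like the 0.525 reported for the all-pathway-genes classifier, corresponds to ranking patient pairs no better than chance.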

Subject(s)

Fatigue/genetics; Oligonucleotide Array Sequence Analysis/methods; Sjogren's Syndrome/complications; Transcriptome; Adult; Aged; Area Under Curve; Fatigue/blood; Fatigue/etiology; Female; Genetic Predisposition to Disease; Humans; Male; Middle Aged; Severity of Illness Index; Sjogren's Syndrome/blood

11.

A distributed computational search strategy for the identification of diagnostics targets: application to finding aptamer targets for methicillin-resistant staphylococci.

Flanagan, Keith; Cockell, Simon; Harwood, Colin; Hallinan, Jennifer; Nakjang, Sirintra; Lawry, Beth; Wipat, Anil.

J Integr Bioinform ; 11(2): 242, 2014 Jun 30.

Article in English | MEDLINE | ID: mdl-24980620

ABSTRACT

The rapid and cost-effective identification of bacterial species is crucial, especially for clinical diagnosis and treatment. Peptide aptamers have been shown to be valuable for use as a component of novel, direct detection methods. These small peptides have a number of advantages over antibodies, including greater specificity and longer shelf life. These properties facilitate their use as the detector components of biosensor devices. However, the identification of suitable aptamer targets for particular groups of organisms is challenging. We present a semi-automated processing pipeline for the identification of candidate aptamer targets from whole bacterial genome sequences. The pipeline can be configured to search for protein sequence fragments that uniquely identify a set of strains of interest. The system is also capable of identifying additional organisms that may be of interest due to their possession of protein fragments in common with the initial set. Through the use of Cloud computing technology and distributed databases, our system is capable of scaling with the rapidly growing genome repositories, and consequently of keeping the resulting data sets up-to-date. The system described is also more generically applicable to the discovery of specific targets for other diagnostic approaches such as DNA probes, PCR primers and antibodies.

Subject(s)

Computational Biology/methods; Methicillin-Resistant Staphylococcus aureus/genetics; Staphylococcal Infections/microbiology; Algorithms; Automation; Bacterial Proteins/genetics; Computer Communication Networks; Computer Systems; DNA/chemistry; Epitopes/chemistry; Genome, Bacterial; Ligands; Methicillin-Resistant Staphylococcus aureus/drug effects; Peptides/chemistry; RNA/chemistry; Staphylococcal Infections/diagnosis

12.

Probabilistic latent semantic analysis applied to whole bacterial genomes identifies common genomic features.

Rusakovica, Julija; Hallinan, Jennifer; Wipat, Anil; Zuliani, Paolo.

J Integr Bioinform ; 11(2): 243, 2014 Jun 30.

Article in English | MEDLINE | ID: mdl-24980693

ABSTRACT

The spread of drug resistance amongst clinically-important bacteria is a serious, and growing, problem [1]. However, the analysis of entire genomes requires considerable computational effort, usually including the assembly of the genome and subsequent identification of genes known to be important in pathology. An alternative approach is to use computational algorithms to identify genomic differences between pathogenic and non-pathogenic bacteria, even without knowing the biological meaning of those differences. To overcome this problem, a range of techniques for dimensionality reduction have been developed. One such approach is known as latent-variable models [2]. In latent-variable models, dimensionality reduction is achieved by representing high-dimensional data by a few hidden or latent variables, which are not directly observed but inferred from the observed variables present in the model. Probabilistic Latent Semantic Analysis (PLSA) is an extension of LSA [3]. PLSA is based on a mixture decomposition derived from a latent class model. The main objective of the algorithm, as in LSA, is to represent high-dimensional co-occurrence information in a lower-dimensional way in order to discover the hidden semantic structure of the data using a probabilistic framework. In this work we applied the PLSA approach to analyse the common genomic features in methicillin-resistant Staphylococcus aureus, using tokens derived from amino acid sequences rather than DNA. We characterised genome-scale amino acid sequences in terms of their components, and then investigated the relationships between genomes and tokens and the phenotypes they generated. As a control we used the non-pathogenic model Gram-positive bacterium Bacillus subtilis.

Subject(s)

Bacillus subtilis/genetics; Computational Biology/methods; Genome, Bacterial; Genomics; Methicillin-Resistant Staphylococcus aureus/genetics; Algorithms; Pattern Recognition, Automated; Phenotype; Probability; Semantics; Software

13.

BacillusRegNet: a transcriptional regulation database and analysis platform for Bacillus species.

Misirli, Goksel; Hallinan, Jennifer; Röttger, Richard; Baumbach, Jan; Wipat, Anil.

J Integr Bioinform ; 11(2): 244, 2014 Jul 08.

Article in English | MEDLINE | ID: mdl-25001169

ABSTRACT

As high-throughput technologies become cheaper and easier to use, raw sequence data and corresponding annotations for many organisms are becoming available. However, sequence data alone is not sufficient to explain the biological behaviour of organisms, which arises largely from complex molecular interactions. There is a need to develop new platform technologies that can be applied to the investigation of whole-genome datasets in an efficient and cost-effective manner. One such approach is the transfer of existing knowledge from well-studied organisms to closely-related organisms. In this paper, we describe a system, BacillusRegNet, for the use of a model organism, Bacillus subtilis, to infer genome-wide regulatory networks in less well-studied close relatives. The putative transcription factors, their binding sequences and predicted promoter sequences along with annotations are available from the associated BacillusRegNet website (http://bacillus.ncl.ac.uk).

Subject(s)

Bacillus subtilis/genetics; Computational Biology/methods; Gene Regulatory Networks; High-Throughput Nucleotide Sequencing/methods; Algorithms; Bacillus subtilis/metabolism; Binding Sites; Computer Simulation; Computer Systems; Databases, Genetic; Databases, Protein; Gene Expression Regulation, Bacterial; Genome, Bacterial; Promoter Regions, Genetic; Software; Systems Biology; Transcription Factors/metabolism; Transcription, Genetic

14.

The Synthetic Biology Open Language (SBOL) provides a community standard for communicating designs in synthetic biology.

Galdzicki, Michal; Clancy, Kevin P; Oberortner, Ernst; Pocock, Matthew; Quinn, Jacqueline Y; Rodriguez, Cesar A; Roehner, Nicholas; Wilson, Mandy L; Adam, Laura; Anderson, J Christopher; Bartley, Bryan A; Beal, Jacob; Chandran, Deepak; Chen, Joanna; Densmore, Douglas; Endy, Drew; Grünberg, Raik; Hallinan, Jennifer; Hillson, Nathan J; Johnson, Jeffrey D; Kuchinsky, Allan; Lux, Matthew; Misirli, Goksel; Peccoud, Jean; Plahar, Hector A; Sirin, Evren; Stan, Guy-Bart; Villalobos, Alan; Wipat, Anil; Gennari, John H; Myers, Chris J; Sauro, Herbert M.

Nat Biotechnol ; 32(6): 545-50, 2014 Jun.

Article in English | MEDLINE | ID: mdl-24911500

ABSTRACT

The re-use of previously validated designs is critical to the evolution of synthetic biology from a research discipline to an engineering practice. Here we describe the Synthetic Biology Open Language (SBOL), a proposed data standard for exchanging designs within the synthetic biology community. SBOL represents synthetic biology designs in a community-driven, formalized format for exchange between software tools, research groups and commercial service providers. The SBOL Developers Group has implemented SBOL as an XML/RDF serialization and provides software libraries and specification documentation to help developers implement SBOL in their own software. We describe early successes, including a demonstration of the utility of SBOL for information exchange between several different software tools and repositories from both academic and industrial partners. As a community-driven standard, SBOL will be updated as synthetic biology evolves to provide specific capabilities for different aspects of the synthetic biology workflow.

Subject(s)

Information Dissemination/methods; Research Design/standards; Software/standards; Synthetic Biology/standards; Terminology as Topic; Vocabulary, Controlled; Internationality; Reference Standards

15.

BacillOndex: an integrated data resource for systems and synthetic biology.

Misirli, Goksel; Wipat, Anil; Mullen, Joseph; James, Katherine; Pocock, Matthew; Smith, Wendy; Allenby, Nick; Hallinan, Jennifer S.

J Integr Bioinform ; 10(2): 224, 2013 Apr 10.

Article in English | MEDLINE | ID: mdl-23571273

ABSTRACT

BacillOndex is an extension of the Ondex data integration system, providing a semantically annotated, integrated knowledge base for the model Gram-positive bacterium Bacillus subtilis. This application allows a user to mine a variety of B. subtilis data sources, and analyse the resulting integrated dataset, which contains data about genes, gene products and their interactions. The data can be analysed either manually, by browsing using Ondex, or computationally via a Web services interface. We describe the process of creating a BacillOndex instance, and describe the use of the system for the analysis of single nucleotide polymorphisms in B. subtilis Marburg. The Marburg strain is the progenitor of the widely-used laboratory strain B. subtilis 168. We identified 27 SNPs with predictable phenotypic effects, including genetic traits for known phenotypes. We conclude that BacillOndex is a valuable tool for the systems-level investigation of, and hypothesis generation about, this important biotechnology workhorse. Such understanding contributes to our ability to construct synthetic genetic circuits in this organism.

Subject(s)

Bacillus subtilis/genetics; Databases, Genetic; Synthetic Biology; Systems Biology; Phenotype; Polymorphism, Single Nucleotide/genetics; Software

16.

Microbase2.0: a generic framework for computationally intensive bioinformatics workflows in the cloud.

Flanagan, Keith; Nakjang, Sirintra; Hallinan, Jennifer; Harwood, Colin; Hirt, Robert P; Pocock, Matthew R; Wipat, Anil.

J Integr Bioinform ; 9(2): 212, 2012 Sep 24.

Article in English | MEDLINE | ID: mdl-23001322

ABSTRACT

As bioinformatics datasets grow ever larger, and analyses become increasingly complex, there is a need for data handling infrastructures to keep pace with developing technology. One solution is to apply Grid and Cloud technologies to address the computational requirements of analysing high throughput datasets. We present an approach for writing new, or wrapping existing applications, and a reference implementation of a framework, Microbase2.0, for executing those applications using Grid and Cloud technologies. We used Microbase2.0 to develop an automated Cloud-based bioinformatics workflow executing simultaneously on two different Amazon EC2 data centres and the Newcastle University Condor Grid. Several CPU years' worth of computational work was performed by this system in less than two months. The workflow produced a detailed dataset characterising the cellular localisation of 3,021,490 proteins from 867 taxa, including bacteria, archaea and unicellular eukaryotes. Microbase2.0 is freely available from http://www.microbase.org.uk/.

Subject(s)

Computational Biology/methods; Software; Databases, Factual; User-Computer Interface; Workflow

17.

Is newer better?--evaluating the effects of data curation on integrated analyses in Saccharomyces cerevisiae.

James, Katherine; Wipat, Anil; Hallinan, Jennifer.

Integr Biol (Camb) ; 4(7): 715-27, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22526920

ABSTRACT

Recent high-throughput experiments have produced a wealth of heterogeneous datasets, each of which provides information about different aspects of the cell. Consequently, integration of diverse data types is essential in order to address many biological questions. The quality of any integrated analysis system is dependent upon the quality of its component data, and upon the Gold Standard data used to evaluate it. It is commonly assumed that the quality of data improves as databases grow and change, particularly for manually curated databases. However, the validity of this assumption can be questioned, given the constant changes in the data coupled with the high level of noise associated with high-throughput experimental techniques. One of the most powerful approaches to data integration is the use of Probabilistic Functional Integrated Networks (PFINs). Here, we systematically analyse the changes in four highly curated and widely used online databases and evaluate the extent to which these changes affect the protein function prediction performance of PFINs in the yeast Saccharomyces cerevisiae. We find that, globally, network performance improves over time. Where individual areas of biology are concerned, however, the most recent files do not always produce the best results. Individual datasets have unique biases towards different biological processes, and by selecting and integrating relevant datasets, performance can be improved. When using any type of integrated system to answer a specific biological question, careful selection of raw data and Gold Standard is vital, since the most recent data may not be the most appropriate.

Subject(s)

Computational Biology/methods; Saccharomyces cerevisiae Proteins/metabolism; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/physiology; Algorithms; Area Under Curve; DNA Repair; Data Interpretation, Statistical; Databases, Protein; False Positive Reactions; Models, Statistical; Protein Interaction Mapping/methods; Reproducibility of Results; Sensitivity and Specificity

18.

Bayesian integration of networks without gold standards.

Weile, Jochen; James, Katherine; Hallinan, Jennifer; Cockell, Simon J; Lord, Phillip; Wipat, Anil; Wilkinson, Darren J.

Bioinformatics ; 28(11): 1495-500, 2012 Jun 01.

Article in English | MEDLINE | ID: mdl-22492647

ABSTRACT

MOTIVATION: Biological experiments give insight into networks of processes inside a cell, but are subject to error and uncertainty. However, due to the overlap between the large number of experiments reported in public databases, it is possible to assess the chances of individual observations being correct. In order to do so, existing methods rely on high-quality 'gold standard' reference networks, but such reference networks are not always available. RESULTS: We present a novel algorithm for computing the probability of network interactions that operates without gold standard reference data. We show that our algorithm outperforms existing gold standard-based methods. Finally, we apply the new algorithm to a large collection of genetic interaction and protein-protein interaction experiments. AVAILABILITY: The integrated dataset and a reference implementation of the algorithm as a plug-in for the Ondex data integration framework are available for download at http://bio-nexus.ncl.ac.uk/projects/nogold/

Subject(s)

Algorithms; Bayes Theorem; Epistasis, Genetic; Protein Interaction Mapping/standards; Likelihood Functions; Protein Interaction Mapping/methods; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism

19.

Keratinocyte apoptosis in epidermal remodeling and clearance of psoriasis induced by UV radiation.

Weatherhead, Sophie C; Farr, Peter M; Jamieson, David; Hallinan, Jennifer S; Lloyd, James J; Wipat, Anil; Reynolds, Nick J.

J Invest Dermatol ; 131(9): 1916-26, 2011 Sep.

Article in English | MEDLINE | ID: mdl-21614017

ABSTRACT

Psoriasis is a common chronic skin disorder, but the mechanisms involved in the resolution and clearance of plaques remain poorly defined. We investigated the mechanism of action of UVB, which is highly effective in clearing psoriasis and inducing remission, and tested the hypothesis that apoptosis is a key mechanism. To distinguish bystander effects, equal erythemal doses of two UVB wavelengths were compared following in vivo irradiation of psoriatic plaques; one is clinically effective (311 nm) and one has no therapeutic effect on psoriasis (290 nm). Only 311 nm UVB induced significant apoptosis in lesional epidermis, and most apoptotic cells were keratinocytes. To determine clinical relevance, we created a computational model of psoriatic epidermis. Modeling predicted apoptosis would occur in both stem and transit-amplifying cells to account for plaque clearance; this was confirmed and quantified experimentally. The median duration of keratinocyte apoptosis from onset to cell death was 20 minutes. These data were fed back into the model and demonstrated that the observed level of keratinocyte apoptosis was sufficient to explain UVB-induced plaque resolution. Our human studies combined with a systems biology approach demonstrate that keratinocyte apoptosis is a key mechanism in psoriatic plaque clearance, providing the basis for future molecular investigation and therapeutic development.

Subject(s)

Apoptosis/radiation effects; Keratinocytes/radiation effects; Psoriasis/pathology; Psoriasis/radiotherapy; Ultraviolet Therapy/methods; Adult; Biopsy; Cell Division/radiation effects; Cells, Cultured; Dermis/pathology; Dermis/radiation effects; Dose-Response Relationship, Radiation; Epidermis/pathology; Epidermis/radiation effects; Female; Humans; Keratinocytes/cytology; Keratinocytes/pathology; Male; Models, Biological; Treatment Outcome

20.

Customizable views on semantically integrated networks for systems biology.

Weile, Jochen; Pocock, Matthew; Cockell, Simon J; Lord, Phillip; Dewar, James M; Holstein, Eva-Maria; Wilkinson, Darren; Lydall, David; Hallinan, Jennifer; Wipat, Anil.

Bioinformatics ; 27(9): 1299-306, 2011 May 01.

Article in English | MEDLINE | ID: mdl-21414991

ABSTRACT

MOTIVATION: The rise of high-throughput technologies in the post-genomic era has led to the production of large amounts of biological data. Many of these datasets are freely available on the Internet. Making optimal use of these data is a significant challenge for bioinformaticians. Various strategies for integrating data have been proposed to address this challenge. One of the most promising approaches is the development of semantically rich integrated datasets. Although well suited to computational manipulation, such integrated datasets are typically too large and complex for easy visualization and interactive exploration. RESULTS: We have created an integrated dataset for Saccharomyces cerevisiae using the semantic data integration tool Ondex, and have developed a view-based visualization technique that allows for concise graphical representations of the integrated data. The technique was implemented in a plug-in for Cytoscape, called OndexView. We used OndexView to investigate telomere maintenance in S. cerevisiae. AVAILABILITY: The Ondex yeast dataset and the OndexView plug-in for Cytoscape are accessible at http://bsu.ncl.ac.uk/ondexview.

Subject(s)

Computational Biology/methods; Databases, Genetic; Information Storage and Retrieval/methods; Systems Biology/methods; Internet; Saccharomyces cerevisiae/genetics; Telomere/genetics
