1.
Nucleic Acids Res ; 52(D1): D92-D97, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37956313

ABSTRACT

The European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena) is maintained by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI). The ENA is one of the three members of the International Nucleotide Sequence Database Collaboration (INSDC). It serves the bioinformatics community worldwide via the submission, processing, archiving and dissemination of sequence data. The ENA supports data types ranging from raw reads, through alignments and assemblies to functional annotation. The data is enriched with contextual information relating to samples and experimental configurations. In this article, we describe recent progress and improvements to ENA services. In particular, we focus upon three areas of work in 2023: FAIRness of ENA data, pandemic preparedness and foundational technology. For FAIRness, we have introduced minimal requirements for spatiotemporal annotation, created a metadata-based classification system, incorporated third party metadata curations with archived records, and developed a new rapid visualisation platform, the ENA Notebooks. For foundational enhancements, we have improved the INSDC data exchange and synchronisation pipelines, and invested in site reliability engineering for ENA infrastructure. In order to support genomic surveillance efforts, we have continued to provide ENA services in support of SARS-CoV-2 data mobilisation and have adapted these for broader pathogen surveillance efforts.


Subjects
Genomics , Nucleotides , Computational Biology , Nucleic Acid Databases , Internet , Reproducibility of Results , Europe
2.
Nucleic Acids Res ; 51(D1): D121-D125, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36399492

ABSTRACT

The European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena), maintained by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), offers those producing data an open and supported platform for the management, archiving, publication, and dissemination of data; and to the scientific community as a whole, it offers a globally comprehensive data set through a host of data discovery and retrieval tools. Here, we describe recent updates to the ENA's submission and retrieval services as well as focused efforts to improve connectivity, reusability, and interoperability of ENA data and metadata.


Subjects
Nucleic Acid Databases , Academies and Institutes , Computational Biology , Internet , Software , Datasets as Topic
3.
Toxicol Appl Pharmacol ; 270(2): 149-57, 2013 Jul 15.
Article in English | MEDLINE | ID: mdl-23602889

ABSTRACT

Reducing drug attrition remains a challenge in pharmaceutical discovery and development. A major cause of early attrition is the demonstration of safety signals which can negate any therapeutic index previously established. Safety attrition needs to be put in context of clinical translation (i.e. human relevance) and is negatively impacted by differences between animal models and humans. In order to minimize such an impact, an earlier assessment of pharmacological target homology across animal model species will enhance understanding of the context of animal safety signals and aid species selection during later regulatory toxicology studies. Here we sequenced the genomes of the Sus scrofa Göttingen minipig and the Canis familiaris beagle, two widely used animal species in regulatory safety studies. Comparative analyses of these new genomes with other key model organisms, namely mouse, rat, cynomolgus macaque, rhesus macaque, two related breeds (S. scrofa Duroc and C. familiaris boxer) and human reveal considerable variation in gene content. Key genes in toxicology and metabolism studies, such as the UGT2 family, CYP2D6, and SLCO1A2, displayed unique duplication patterns. Comparisons of 317 known human drug targets revealed surprising variation such as species-specific positive selection, duplication and higher occurrences of pseudogenized targets in beagle (41 genes) relative to minipig (19 genes). These data will facilitate the more effective use of animals in biomedical research.


Subjects
Dogs/genetics , Drug Discovery/methods , Genome , Animal Models , Miniature Swine/genetics , Animals , Base Sequence , Female , Molecular Sequence Data , Sequence Alignment , DNA Sequence Analysis , Swine
4.
Mol Cell Proteomics ; 9(1): 1-10, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19674966

ABSTRACT

Protein affinity reagents (PARs), most commonly antibodies, are essential reagents for protein characterization in basic research, biotechnology, and diagnostics as well as the fastest growing class of therapeutics. Large numbers of PARs are available commercially; however, their quality is often uncertain. In addition, currently available PARs cover only a fraction of the human proteome, and their cost is prohibitive for proteome scale applications. This situation has triggered several initiatives involving large scale generation and validation of antibodies, for example the Swedish Human Protein Atlas and the German Antibody Factory. Antibodies targeting specific subproteomes are being pursued by members of Human Proteome Organisation (plasma and liver proteome projects) and the United States National Cancer Institute (cancer-associated antigens). ProteomeBinders, a European consortium, aims to set up a resource of consistently quality-controlled protein-binding reagents for the whole human proteome. An ultimate PAR database resource would allow consumers to visit one on-line warehouse and find all available affinity reagents from different providers together with documentation that facilitates easy comparison of their cost and quality. However, in contrast to, for example, nucleotide databases among which data are synchronized between the major data providers, current PAR producers, quality control centers, and commercial companies all use incompatible formats, hindering data exchange. Here we propose Proteomics Standards Initiative (PSI)-PAR as a global community standard format for the representation and exchange of protein affinity reagent data. The PSI-PAR format is maintained by the Human Proteome Organisation PSI and was developed within the context of ProteomeBinders by building on a mature proteomics standard format, PSI-molecular interaction, which is a widely accepted and established community standard for molecular interaction data. Further information and documentation are available on the PSI-PAR web site.


Subjects
Protein Databases/standards , Proteome/analysis , Database Management Systems/standards , Humans , International Cooperation , Proteomics/methods , Terminology as Topic
5.
Nat Biotechnol ; 25(8): 894-8, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17687370

ABSTRACT

A wealth of molecular interaction data is available in the literature, ranging from large-scale datasets to a single interaction confirmed by several different techniques. These data are all too often reported either as free text or in tables of variable format, and are often missing key pieces of information essential for a full understanding of the experiment. Here we propose MIMIx, the minimum information required for reporting a molecular interaction experiment. Adherence to these reporting guidelines will result in publications of increased clarity and usefulness to the scientific community and will support the rapid, systematic capture of molecular interaction data in public databases, thereby improving access to valuable interaction data.


Subjects
Protein Databases/standards , Guidelines as Topic , Information Storage and Retrieval/standards , Protein Interaction Mapping/standards , Proteomics/standards , Research/standards , Humans , Internationality
6.
Drug Discov Today ; 24(10): 2068-2075, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31158512

ABSTRACT

In this review, we provide a summary of recent progress in ontology mapping (OM) at a crucial time when biomedical research is under a deluge of an increasing amount and variety of data. This is particularly important for realising the full potential of semantically enabled or enriched applications and for meaningful insights, such as drug discovery, using machine-learning technologies. We discuss challenges and solutions for better ontology mappings, as well as how to select ontologies before their application. In addition, we describe tools and algorithms for ontology mapping, including evaluation of tool capability and quality of mappings. Finally, we outline the requirements for an ontology mapping service (OMS) and the progress being made towards implementation of such sustainable services.


Subjects
Biological Ontologies , Drug Discovery/methods , Machine Learning , Semantics , Algorithms , Humans
7.
BMC Biol ; 5: 44, 2007 Oct 09.
Article in English | MEDLINE | ID: mdl-17925023

ABSTRACT

BACKGROUND: Molecular interaction information is a key resource in modern biomedical research. Publicly available data have previously been provided in a broad array of diverse formats, making access to this very difficult. The publication and wide implementation of the Human Proteome Organisation Proteomics Standards Initiative Molecular Interactions (HUPO PSI-MI) format in 2004 was a major step towards the establishment of a single, unified format by which molecular interactions should be presented, but focused purely on protein-protein interactions. RESULTS: The HUPO-PSI has further developed the PSI-MI XML schema to enable the description of interactions between a wider range of molecular types, for example nucleic acids, chemical entities, and molecular complexes. Extensive details about each supported molecular interaction can now be captured, including the biological role of each molecule within that interaction, detailed description of interacting domains, and the kinetic parameters of the interaction. The format is supported by data management and analysis tools and has been adopted by major interaction data providers. Additionally, a simpler, tab-delimited format MITAB2.5 has been developed for the benefit of users who require only minimal information in an easy to access configuration. CONCLUSION: The PSI-MI XML2.5 and MITAB2.5 formats have been jointly developed by interaction data producers and providers from both the academic and commercial sector, and are already widely implemented and well supported by an active development community. PSI-MI XML2.5 enables the description of highly detailed molecular interaction data and facilitates data exchange between databases and users without loss of information. MITAB2.5 is a simpler format appropriate for fast Perl parsing or loading into Microsoft Excel.
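
To illustrate why a tab-delimited format such as MITAB2.5 lends itself to simple scripted parsing, the following is a minimal sketch in Python (not the Perl mentioned in the abstract). It assumes the commonly documented 15-column MITAB 2.5 layout, with "-" marking an empty field and "|" separating multiple values in one column. The column key names, the function name parse_mitab25_line and the sample record are illustrative assumptions and do not come from the article; the identifiers in the sample are made up.

```python
from typing import Dict, List

# Assumed MITAB 2.5 column order (15 tab-separated columns).
# The short key names are hypothetical labels chosen for this sketch.
MITAB25_COLUMNS = [
    "id_a", "id_b",
    "alt_ids_a", "alt_ids_b",
    "aliases_a", "aliases_b",
    "detection_methods", "first_authors", "publication_ids",
    "taxid_a", "taxid_b",
    "interaction_types", "source_databases",
    "interaction_ids", "confidence_scores",
]


def parse_mitab25_line(line: str) -> Dict[str, List[str]]:
    """Split one MITAB 2.5 line into a dict of column name -> list of values.

    Simplification: values are split on every '|', so quoted values that
    themselves contain a pipe character are not handled.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(MITAB25_COLUMNS):
        raise ValueError(
            f"expected {len(MITAB25_COLUMNS)} columns, got {len(fields)}"
        )
    record: Dict[str, List[str]] = {}
    for name, value in zip(MITAB25_COLUMNS, fields):
        # '-' denotes an empty column; represent it as an empty list.
        record[name] = [] if value == "-" else value.split("|")
    return record


# Made-up example record (identifiers are placeholders, not real entries).
SAMPLE_LINE = "\t".join([
    "uniprotkb:P12345", "uniprotkb:Q67890",
    "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)',
    "Doe et al. (2007)",
    "pubmed:00000000",
    "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)',
    'psi-mi:"MI:0469"(intact)',
    "intact:EBI-0000000",
    "-",
])

if __name__ == "__main__":
    parsed = parse_mitab25_line(SAMPLE_LINE)
    print(parsed["id_a"], parsed["id_b"], parsed["interaction_types"])
```

Because every record is a single line with a fixed number of columns, the same logic carries over to a spreadsheet import or a one-line split in most scripting languages, which is the convenience the abstract attributes to the format.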


Subjects
Protein Databases/standards , Natural Language Processing , Protein Interaction Mapping/methods , Proteomics/methods , Computational Biology , Computer Graphics , Database Management Systems , Proteomics/standards , User-Computer Interface
8.
F1000Res ; 7: 75, 2018.
Article in English | MEDLINE | ID: mdl-30416713

ABSTRACT

Open PHACTS is a pre-competitive project to answer scientific questions developed recently by the pharmaceutical industry. Having high quality biological interaction information in the Open PHACTS Discovery Platform is needed to answer multiple pathway related questions. To address this, updated WikiPathways data has been added to the platform. This data includes information about biological interactions, such as stimulation and inhibition. The platform's Application Programming Interface (API) was extended with appropriate calls to reference these interactions. These new methods of the Open PHACTS API are available now.


Subjects
Antineoplastic Agents/pharmacology , Biomedical Research , Computational Biology/methods , Drug Discovery , Information Storage and Retrieval/methods , Signal Transduction , Software , Drug Industry , Humans , Hypertrophy/drug therapy , Hypertrophy/pathology , Cardiac Myocytes/cytology , Cardiac Myocytes/drug effects , Neoplasms/drug therapy , Neoplasms/pathology
9.
J Biomed Semantics ; 8(1): 55, 2017 Dec 02.
Article in English | MEDLINE | ID: mdl-29197409

ABSTRACT

BACKGROUND: The disease and phenotype track was designed to evaluate the relative performance of ontology matching systems that generate mappings between source ontologies. Disease and phenotype ontologies are important for applications such as data mining, data integration and knowledge management to support translational science in drug discovery and understanding the genetics of disease. RESULTS: Eleven systems (out of 21 OAEI participating systems) were able to cope with at least one of the tasks in the Disease and Phenotype track. AML, FCA-Map, LogMap(Bio) and PhenoMF systems produced the top results for ontology matching in comparison to consensus alignments. The results against manually curated mappings proved to be more difficult most likely because these mapping sets comprised mostly subsumption relationships rather than equivalence. Manual assessment of unique equivalence mappings showed that AML, LogMap(Bio) and PhenoMF systems have the highest precision results. CONCLUSIONS: Four systems gave the highest performance for matching disease and phenotype ontologies. These systems coped well with the detection of equivalence matches, but struggled to detect semantic similarity. This deserves more attention in the future development of ontology matching systems. The findings of this evaluation show that such systems could help to automate equivalence matching in the workflow of curators, who maintain ontology mapping services in numerous domains such as disease and phenotype.


Subjects
Biological Ontologies , Disease , Phenotype , Consensus , Humans
10.
PLoS One ; 11(5): e0155811, 2016.
Article in English | MEDLINE | ID: mdl-27196054

ABSTRACT

Drug development is increasing in cost whilst decreasing in productivity. There is a general acceptance that the current paradigm of R&D needs to change. One alternative approach is drug repositioning. With target-based approaches utilised heavily in the field of drug discovery, it becomes increasingly necessary to have a systematic method to rank gene-disease associations. Although methods already exist to collect, integrate and score these associations, they are often not a reliable reflection of expert knowledge. Furthermore, the amount of data available in all areas covered by bioinformatics is increasing dramatically year on year. It thus makes sense to move away from more generalised hypothesis-driven approaches to research towards ones that allow data to generate their own hypotheses. We introduce an integrated, data driven approach to drug repositioning. We first apply a Bayesian statistics approach to rank 309,885 gene-disease associations using existing knowledge. Ranked associations are then integrated with other biological data to produce a semantically-rich drug discovery network. Using this network, we show how our approach identifies diseases of the central nervous system (CNS) to be an area of interest. CNS disorders are identified due to the low numbers of such disorders that currently have marketed treatments, in comparison to other therapeutic areas. We then systematically mine our network for semantic subgraphs that allow us to infer drug-disease relations that are not captured in the network. We identify and rank 275,934 drug-disease has_indication associations after filtering those that are more likely to be side effects, whilst commenting on the top ranked associations in more detail. The dataset has been created in Neo4j and is available for download at https://bitbucket.org/ncl-intbio/genediseaserepositioning along with a Java implementation of the searching algorithm.


Subjects
Data Mining , Drug Discovery/methods , Drug Repositioning/methods , Algorithms , Area Under Curve , Bayes Theorem , Central Nervous System/drug effects , Computational Biology , Computer Graphics , Drug-Related Side Effects and Adverse Reactions , Humans , Medical Subject Headings , ROC Curve , Semantics , Software
11.
PeerJ ; 4: e1558, 2016.
Article in English | MEDLINE | ID: mdl-26844016

ABSTRACT

Current research and development approaches to drug discovery have become less fruitful and more costly. One alternative paradigm is that of drug repositioning. Many marketed examples of repositioned drugs have been identified through serendipitous or rational observations, highlighting the need for more systematic methodologies to tackle the problem. Systems level approaches have the potential to enable the development of novel methods to understand the action of therapeutic compounds, but require an integrative approach to biological data. Integrated networks can facilitate systems level analyses by combining multiple sources of evidence to provide a rich description of drugs, their targets and their interactions. Classically, such networks can be mined manually where a skilled person is able to identify portions of the graph (semantic subgraphs) that are indicative of relationships between drugs and highlight possible repositioning opportunities. However, this approach is not scalable. Automated approaches are required to systematically mine integrated networks for these subgraphs and bring them to the attention of the user. We introduce a formal framework for the definition of integrated networks and their associated semantic subgraphs for drug interaction analysis and describe DReSMin, an algorithm for mining semantically-rich networks for occurrences of a given semantic subgraph. This algorithm allows instances of complex semantic subgraphs that contain data about putative drug repositioning opportunities to be identified in a computationally tractable fashion, scaling close to linearly with network data. We demonstrate the utility of our approach by mining an integrated drug interaction network built from 11 sources. This work identified and ranked 9,643,061 putative drug-target interactions, showing a strong correlation between highly scored associations and those supported by literature. We discuss the 20 top ranked associations in more detail, of which 14 are novel and 6 are supported by the literature. We also show that our approach better prioritizes known drug-target interactions than other state-of-the-art approaches for predicting such interactions.

12.
Drug Discov Today ; 19(7): 882-9, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24201223

ABSTRACT

In the Semantic Enrichment of the Scientific Literature (SESL) project, researchers from academia and from life science and publishing companies collaborated in a pre-competitive way to integrate and share information for type 2 diabetes mellitus (T2DM) in adults. This case study exposes benefits from semantic interoperability after integrating the scientific literature with biomedical data resources, such as UniProt Knowledgebase (UniProtKB) and the Gene Expression Atlas (GXA). We annotated scientific documents in a standardized way, by applying public terminological resources for diseases and proteins, and other text-mining approaches. Eventually, we compared the genetic causes of T2DM across the data resources to demonstrate the benefits from the SESL triple store. Our solution enables publishers to distribute their content with little overhead into remote data infrastructures, such as into any Virtual Knowledge Broker.


Subjects
Biomedical Research/methods , Data Mining/methods , Type 2 Diabetes Mellitus/genetics , Semantics , Systems Integration , Animals , Type 2 Diabetes Mellitus/diagnosis , Humans , Knowledge Bases
13.
Drug Discov Today ; 18(9-10): 428-34, 2013 May.
Article in English | MEDLINE | ID: mdl-23247259

ABSTRACT

Research in the life sciences requires ready access to primary data, derived information and relevant knowledge from a multitude of sources. Integration and interoperability of such resources are crucial for sharing content across research domains relevant to the life sciences. In this article we present a perspective review of data integration with emphasis on a semantics driven approach to data integration that pushes content into a shared infrastructure, reduces data redundancy and clarifies any inconsistencies. This enables much improved access to life science data from numerous primary sources. The Semantic Enrichment of the Scientific Literature (SESL) pilot project demonstrates feasibility for using already available open semantic web standards and technologies to integrate public and proprietary data resources, which span structured and unstructured content. This has been accomplished through a precompetitive consortium, which provides a cost effective approach for numerous stakeholders to work together to solve common problems.


Subjects
Data Collection , Information Dissemination , Information Storage and Retrieval , Systems Integration , Biological Science Disciplines , Humans , Internet
14.
Diabetes ; 61(5): 1297-301, 2012 May.
Article in English | MEDLINE | ID: mdl-22403302

ABSTRACT

Increased adiponectin levels have been shown to be associated with a lower risk of type 2 diabetes. To understand the relations between genetic variation at the adiponectin-encoding gene, ADIPOQ, and adiponectin levels, and subsequently its role in disease, we conducted a deep resequencing experiment of ADIPOQ in 14,002 subjects, including 12,514 Europeans, 594 African Americans, and 567 Indian Asians. We identified 296 single nucleotide polymorphisms (SNPs), including 30 amino acid changes, and carried out association analyses in a subset of 3,665 subjects from two independent studies. We confirmed multiple genome-wide association study findings and identified a novel association between a low-frequency SNP (rs17366653) and adiponectin levels (P = 2.2E-17). We show that seven SNPs exert independent effects on adiponectin levels. Together, they explained 6% of adiponectin variation in our samples. We subsequently assessed association between these SNPs and type 2 diabetes in the Genetics of Diabetes Audit and Research in Tayside Scotland (GO-DARTS) study, comprised of 5,145 case and 6,374 control subjects. No evidence of association with type 2 diabetes was found, but we were also unable to exclude the possibility of substantial effects (e.g., odds ratio 95% CI for rs17366653 [0.91-1.58]). Further investigation by large-scale and well-powered Mendelian randomization studies is warranted.


Subjects
Adiponectin/genetics , Adiponectin/metabolism , Type 2 Diabetes Mellitus/genetics , Adiponectin/blood , Base Sequence , Computational Biology , Genetic Predisposition to Disease , Humans , Single Nucleotide Polymorphism , Racial Groups
15.
Science ; 337(6090): 100-4, 2012 Jul 06.
Article in English | MEDLINE | ID: mdl-22604722

ABSTRACT

Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (1 every 17 bases) and geographically localized, so that even with large sample sizes, rare variant catalogs will be largely incomplete. We used the observed patterns of variation to estimate population growth parameters, the proportion of variants in a given frequency class that are putatively deleterious, and mutation rates for each gene. We conclude that because of rapid population growth and weak purifying selection, human populations harbor an abundance of rare variants, many of which are deleterious and have relevance to understanding disease risk.


Subjects
Disease/genetics , Genetic Variation , Human Genome , Black or African American/genetics , Asian People , Gene Frequency , Genetic Association Studies , Genetic Predisposition to Disease , Geography , High-Throughput Nucleotide Sequencing , Humans , Molecular Targeted Therapy , Multifactorial Inheritance , Mutation Rate , Pharmacogenetics , Phenotype , Single Nucleotide Polymorphism , Population Growth , Sample Size , Genetic Selection , White People/genetics
16.
Drug Discov Today ; 16(11-12): 512-9, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21440664

ABSTRACT

Next-generation sequencing (NGS) technologies represent a paradigm shift in sequencing capability. The technology has already been extensively applied to biological research, resulting in significant and remarkable insights into the molecular biology of cells. In this review, we focus on current and potential applications of the technology as applied to the drug discovery and development process. Early applications have focused on the oncology and infectious disease therapeutic areas, with emerging use in biopharmaceutical development and vaccine production in evidence. Although this technology has great potential, significant challenges remain, particularly around the storage, transfer and analysis of the substantial data sets generated.


Subjects
Biopharmaceutics/methods , Drug Discovery/methods , High-Throughput Screening Assays/methods , Pharmacogenetics/methods , DNA Sequence Analysis/methods , Animals , Humans , Genetic Polymorphism , Precision Medicine/methods , RNA Sequence Analysis/methods , Software
17.
Methods Mol Biol ; 628: 39-52, 2010.
Article in English | MEDLINE | ID: mdl-20238075

ABSTRACT

Increasingly, vast amounts of genomics and genetic data are available. Although much of the data is largely accessible to relatively simple web queries, in some cases, more complex queries are required. This paper reviews the hierarchy of tools for querying genetic and genomic data. For querying multiple genes, variants or regions ENSEMBL BioMart and the UCSC Table Browser offer flexible interfaces. For more complex queries, GALAXY is a sophisticated tool for building workflows over existing internet resources. For the most challenging genome scale queries, programmatic access may be required through a defined application programming interface (API) - such as the one provided by Ensembl. All these tools allow one to rapidly ask many questions that were difficult to answer a few years ago, but choosing the appropriate tool for the job is critical.


Subjects
Genetic Databases , Genome , Animals , Genomics , Humans , Software
18.
Microbiology (Reading) ; 148(Pt 10): 2975-2986, 2002 Oct.
Article in English | MEDLINE | ID: mdl-12368431

ABSTRACT

A library of Mycobacterium tuberculosis insertional mutants was generated with the transposon Tn5370. The junction sequence between the transposon and the mycobacterial chromosome was determined, revealing the positions of 1329 unique insertions, 1189 of which were located in 351 different ORFs. Transposition was not completely random and examination of the most susceptible genome regions revealed a lower-than-average G+C content ranging from 54 to 62 mol%. Mutants were obtained in all of the recognized M. tuberculosis functional protein-coding gene classes. About 30% of the disrupted ORFs had matches elsewhere in the genome that suggested redundancy of function. The effect of gene disruption on the virulence of a selected set of defined mutants was investigated in a severe combined immune deficiency (SCID) mouse model. A range of phenotypes was observed in these mutants, the most notable being the severe attenuation in virulence of a strain disrupted in the Rv1290c gene, which encodes a protein of unknown function. The library described in this study provides a resource of defined mutant strains for use in functional analyses aimed at investigating the role of particular M. tuberculosis genes in virulence and defining their potential as targets for new anti-mycobacterial drugs or as candidates for deletion in a rationally attenuated live vaccine.


Subjects
Transposable DNA Elements/genetics , Gene Library , Insertional Mutagenesis , Mycobacterium tuberculosis/pathogenicity , Pulmonary Tuberculosis/microbiology , Animals , Animal Disease Models , Humans , Mice , SCID Mice , Mutation , Mycobacterium tuberculosis/genetics , Open Reading Frames/genetics , Virulence