Results 1 - 20 of 39
1.
J Cheminform ; 7: 43, 2015.
Article in English | MEDLINE | ID: mdl-26322133

ABSTRACT

BACKGROUND: The desirable curation of 158,122 molecular geometries derived from the NCI set of reference molecules together with associated properties computed using the MOPAC semi-empirical quantum mechanical method and originally deposited in 2005 into the Cambridge DSpace repository as a data collection is reported. RESULTS: The procedures involved in the curation included annotation of the original data using new MOPAC methods, updating the syntax of the CML documents used to express the data to ensure schema conformance and adding new metadata describing the entries together with an XML schema transformation to map the metadata schema to that used by the DataCite organisation. We have adopted a granularity model in which a DataCite persistent identifier (DOI) is created for each individual molecule to enable data discovery and data metrics at this level using DataCite tools. CONCLUSIONS: We recommend that the future research data management (RDM) of the scientific and chemical data components associated with journal articles (the "supporting information") should be conducted in a manner that facilitates automatic periodic curation. Graphical abstract: Standards and metadata-based curation of a decade-old digital repository dataset of molecular information.

2.
J Cheminform ; 4(1): 15, 2012 Aug 07.
Article in English | MEDLINE | ID: mdl-22870956

ABSTRACT

This paper introduces a subdomain chemistry format for storing computational chemistry data called CompChem. It has been developed based on the design, concepts and methodologies of Chemical Markup Language (CML) by adding computational chemistry semantics on top of the CML Schema. The format allows a wide range of ab initio quantum chemistry calculations of individual molecules to be stored. These calculations include, for example, single point energy calculation, molecular geometry optimization, and vibrational frequency analysis. The paper also describes the supporting infrastructure, such as processing software, dictionaries, validation tools and database repositories. In addition, some of the challenges and difficulties in developing common computational chemistry dictionaries are discussed. The uses of CompChem are illustrated by two practical applications.

3.
J Cheminform ; 4(1): 14, 2012 Aug 03.
Article in English | MEDLINE | ID: mdl-22856527

ABSTRACT

The articles in this special issue arise from a workshop and symposium held in January 2012 ('Semantic Physical Science'). We invited people who shared our vision for the potential of the web to support chemical and related subjects. Other than the initial invitations, we have not exercised any control over the content of the contributed articles.

5.
J Cheminform ; 3(1): 37, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-21999342

ABSTRACT

BACKGROUND: The Blue Obelisk movement was established in 2005 as a response to the lack of Open Data, Open Standards and Open Source (ODOSOS) in chemistry. It aims to make it easier to carry out chemistry research by promoting interoperability between chemistry software, encouraging cooperation between Open Source developers, and developing community resources and Open Standards. RESULTS: This contribution looks back on the work carried out by the Blue Obelisk in the past 5 years and surveys progress and remaining challenges in the areas of Open Data, Open Standards, and Open Source in chemistry. CONCLUSIONS: We show that the Blue Obelisk has been very successful in bringing together researchers and developers with common interests in ODOSOS, leading to development of many useful resources freely available to the chemistry community.

6.
J Cheminform ; 3: 38, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-21999363

ABSTRACT

Computational Quantum Chemistry has developed into a powerful, efficient, reliable and increasingly routine tool for exploring the structure and properties of small to medium sized molecules. Many thousands of calculations are performed every day, some offering results which approach experimental accuracy. However, in contrast to other disciplines, such as crystallography or bioinformatics, where standard formats and well-known, unified databases exist, this QC data is generally destined to remain locally held in files which are not designed to be machine-readable. Only a very small subset of these results will become accessible to the wider community through publication. In this paper we describe how the Quixote Project is developing the infrastructure required to convert output from a number of different molecular quantum chemistry packages to a common semantically rich, machine-readable format and to build repositories of QC results. Such an infrastructure offers benefits at many levels. The standardised representation of the results will facilitate software interoperability, for example making it easier for analysis tools to take data from different QC packages, and will also help with archival and deposition of results. The repository infrastructure, which is lightweight and built using Open software components, can be implemented at individual researcher, project, organisation or community level, offering the exciting possibility that in future many of these QC results can be made publicly available, to be searched and interpreted just as crystallography and bioinformatics results are today. Although we believe that quantum chemists will appreciate the contribution the Quixote infrastructure can make to the organisation and exchange of their results, we anticipate that greater rewards will come from enabling their results to be consumed by a wider community. As the repositories grow they will become a valuable source of chemical data for use by other disciplines in both research and education. The Quixote project is unconventional in that the infrastructure is being implemented in advance of a full definition of the data model which will eventually underpin it. We believe that a working system which offers real value to researchers based on tools and shared, searchable repositories will encourage early participation from a broader community, including both producers and consumers of data. In the early stages, searching and indexing can be performed on the chemical subject of the calculations, and well defined calculation meta-data. The process of defining more specific quantum chemical definitions, adding them to dictionaries and extracting them consistently from the results of the various software packages can then proceed in an incremental manner, adding additional value at each stage. Not only will these results help to change the data management model in the field of Quantum Chemistry, but the methodology can be applied to other pressing problems related to data in computational and experimental science.

7.
J Cheminform ; 3(1): 39, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-21999395

ABSTRACT

CMLLite is a collection of definitions and processes which provide strong and flexible validation for a document in Chemical Markup Language (CML). It consists of an updated CML schema (schema3), conventions specifying rules in both human and machine-understandable forms and a validator available both online and offline to check conformance. This article explores the rationale behind the changes which have been made to the schema, explains how conventions interact and how they are designed, formulated, implemented and tested, and gives an overview of the validation service.

8.
J Cheminform ; 3(1): 40, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-21999425

ABSTRACT

Linked Open Data presents an opportunity to vastly improve the quality of science in all fields by increasing the availability and usability of the data upon which it is based. In the chemical field, there is a huge amount of information available in the published literature, the vast majority of which is not available in machine-understandable formats. PatentEye, a prototype system for the extraction and semantification of chemical reactions from the patent literature, has been implemented and is discussed. A total of 4444 reactions were extracted from 667 patent documents that comprised 10 weeks' worth of publications from the European Patent Office (EPO), with a precision of 78% and recall of 64% with regard to determining the identity and amount of reactants employed and an accuracy of 92% with regard to product identification. NMR spectra reported as product characterisation data are additionally captured.
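
For readers who want a single figure combining the precision and recall quoted in the abstract above, the balanced F-score (harmonic mean) can be worked out as below. This is only an illustrative calculation: the abstract does not report an F-score, and the function name `f1_score` is an assumption made for this sketch.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract (reactant identity and amount).
# The resulting F-score is derived here, not reported by the authors.
patenteye_f1 = f1_score(0.78, 0.64)
print(f"F1 = {patenteye_f1:.3f}")
```

With these inputs the value comes out at roughly 0.70, sitting closer to the lower of the two figures, as a harmonic mean always does.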

9.
J Cheminform ; 3(1): 41, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-21999457

ABSTRACT

The Open-Source Chemistry Analysis Routines (OSCAR) software, a toolkit for the recognition of named entities and data in chemistry publications, has been developed since 2002. Recent work has resulted in the separation of the core OSCAR functionality and its release as the OSCAR4 library. This library features a modular API (based on reduction of surface coupling) that permits client programmers to easily incorporate it into external applications. OSCAR4 offers a domain-independent architecture upon which chemistry-specific text-mining tools can be built, and its development and usage are discussed.

10.
J Cheminform ; 3(1): 42, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-21999475

ABSTRACT

The World-Wide Molecular Matrix (WWMM) is a ten-year project to create a peer-to-peer (P2P) system for the publication and collection of chemical objects, including over 250,000 molecules. It has now been instantiated in a number of repositories which include data encoded in Chemical Markup Language (CML) and linked by URIs and RDF. The technical specification and implementation are now complete. We discuss the types of architecture required to implement nodes in the WWMM and consider the social issues involved in adoption.

11.
J Cheminform ; 3: 43, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-21999509

ABSTRACT

The semantic architecture of CML consists of conventions, dictionaries and units. The conventions conform to a top-level specification and each convention can constrain compliant documents through machine-processing (validation). Dictionaries conform to a dictionary specification which also imposes machine validation on the dictionaries. Each dictionary can also be used to validate data in a CML document, and provide human-readable descriptions. An additional set of conventions and dictionaries is used to support scientific units. All conventions, dictionaries and dictionary elements are identifiable and addressable through unique URIs.
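
The idea in the abstract above, that a dictionary can be used to validate data in a CML document, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the CML validator: the `ex:` prefix, the dictionary entries, the sample document and the function `unresolved_dict_refs` are all hypothetical, and a real dictionary would be a CML document addressed by URI rather than a Python mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory dictionary: term reference -> human-readable description.
TOY_DICTIONARY = {
    "ex:mpt": "Melting point",
    "ex:bpt": "Boiling point",
}

# Hypothetical CML fragment; the values are placeholders, not real data.
DOC = """<cml xmlns="http://www.xml-cml.org/schema"
     xmlns:ex="http://example.org/dict/">
  <molecule id="m1">
    <property dictRef="ex:mpt"><scalar>1</scalar></property>
    <property dictRef="ex:unknown"><scalar>2</scalar></property>
  </molecule>
</cml>"""


def unresolved_dict_refs(xml_text, dictionary):
    """Return every dictRef value in the document that the dictionary lacks."""
    root = ET.fromstring(xml_text)
    refs = (el.get("dictRef") for el in root.iter())
    return [ref for ref in refs if ref is not None and ref not in dictionary]


print(unresolved_dict_refs(DOC, TOY_DICTIONARY))
```

An empty result would mean every `dictRef` in the document resolves to a dictionary entry; anything returned is a term the document uses without a definition.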

12.
J Cheminform ; 3(1): 44, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-21999549

ABSTRACT

A retrospective view of the design and evolution of Chemical Markup Language (CML) is presented by its original authors.

13.
J Cheminform ; 3: 45, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-21999587

ABSTRACT

The Ami project was a six-month Rapid Innovation project sponsored by JISC to explore the Virtual Research Environment space. The project brainstormed with chemists and decided to investigate ways to facilitate monitoring and collection of experimental data. A frequently encountered use-case was identified of how the chemist reaches the end of an experiment, but finds an unexpected result. The ability to replay events can significantly help make sense of how things progressed. The project therefore concentrated on collecting a variety of dimensions of ancillary data - data that would not normally be collected due to practicality constraints. There were three main areas of investigation: 1) Development of a monitoring tool using infrared and ultrasonic sensors; 2) Time-lapse motion video capture (for example, videoing 5 seconds in every 60); and 3) Activity-driven video monitoring of the fume cupboard environs. The Ami client application was developed to control these separate logging functions. The application builds up a timeline of the events in the experiment and around the fume cupboard. The videos and data logs can then be reviewed after the experiment in order to help the chemist determine the exact timings and conditions used. The project experimented with ways in which a Microsoft Kinect could be used in a laboratory setting. Investigations suggest that it would not be an ideal device for controlling a mouse, but it shows promise for usages such as manipulating virtual molecules.

14.
J Cheminform ; 3: 47, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-21999661

ABSTRACT

The concept of Open Bibliography in science, technology and medicine (STM) is introduced as a combination of Open Source tools, Open specifications and Open bibliographic data. An Openly searchable and navigable network of bibliographic information and associated knowledge representations, a Bibliographic Knowledge Network, across all branches of Science, Technology and Medicine, has been designed and initiated. For this large scale endeavour, the engagement and cooperation of the multiple stakeholders in STM publishing - authors, librarians, publishers and administrators - is sought.

15.
J Cheminform ; 3: 48, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-21999715

ABSTRACT

The articles in this special issue represent the culmination of about 15 years working with the potential of the web to support chemical and related subjects. The selection of papers arises from a symposium held in January 2011 ('Visions of a Semantic Molecular Future') which gave me an opportunity to invite many people who shared the same vision. I have asked them to contribute their papers and most have been able to do so. They cover a wide range of content, approaches and styles and apart from the selection of the speakers (and hence the authors) I have not exercised any control over the content.

16.
Nat Rev Drug Discov ; 10(9): 661-9, 2011 Aug 31.
Article in English | MEDLINE | ID: mdl-21878981

ABSTRACT

Bioactive molecules such as drugs, pesticides and food additives are produced in large numbers by many commercial and academic groups around the world. Enormous quantities of data are generated on the biological properties and quality of these molecules. Access to such data - both on licensed and commercially available compounds, and also on those that fail during development - is crucial for understanding how improved molecules could be developed. For example, computational analysis of aggregated data on molecules that are investigated in drug discovery programmes has led to a greater understanding of the properties of successful drugs. However, the information required to perform these analyses is rarely published, and when it is made available it is often missing crucial data or is in a format that is inappropriate for efficient data-mining. Here, we propose a solution: the definition of reporting guidelines for bioactive entities - the Minimum Information About a Bioactive Entity (MIABE) - which has been developed by representatives of pharmaceutical companies, data resource providers and academic groups.


Subjects
Chemical Industry/standards, Drug Industry/standards, Information Dissemination, Animals, Biomarkers, Chemistry, Physical, Communication, Data Collection, Drug Design, Guidelines as Topic, Humans, Pesticides, Pharmaceutical Preparations, Pharmacokinetics, Terminology as Topic, Toxicology
17.
PLoS One ; 6(5): e20181, 2011.
Article in English | MEDLINE | ID: mdl-21633495

ABSTRACT

Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored an established named entity recogniser (in the chemistry domain), OSCAR, and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework, U-Compare. These workflows can be altered using the drag-&-drop mechanism of the graphical user interface of U-Compare. These workflows also provide a platform to study the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern recognition based classifiers). Results indicate that, for chemistry in particular, eliminating noise generated by tokenisation techniques leads to slightly better performance than others, in terms of named entity recognition (NER) accuracy. Poor tokenisation translates into poorer input to the classifier components, which in turn leads to an increase in Type I or Type II errors, thus lowering the overall performance. On the Sciborg corpus, the workflow-based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84% as against 84.23% by OSCAR.


Subjects
Algorithms, Chemistry/methods, Natural Language Processing, Terminology as Topic, Computational Biology/methods, Markov Chains, PubMed, Reproducibility of Results, Workflow
18.
J Cheminform ; 3(1): 17, 2011 May 16.
Article in English | MEDLINE | ID: mdl-21575201

ABSTRACT

BACKGROUND: The primary method for scientific communication is in the form of published scientific articles and theses which use natural language combined with domain-specific terminology. As such, they contain free-flowing unstructured text. Given the usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt makes their contributions well suited to high-throughput Natural Language Processing (NLP) approaches. RESULTS: We have developed the ChemicalTagger parser as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments. Tagging is based on a modular architecture and uses a combination of OSCAR, domain-specific regex and English taggers to identify parts-of-speech. The ANTLR grammar is used to structure this into tree-based phrases. Using a metric that allows for overlapping annotations, we achieved machine-annotator agreements of 88.9% for phrase recognition and 91.9% for phrase-type identification (Action names). CONCLUSIONS: It is possible to parse chemical experimental text using rule-based techniques in conjunction with a formal grammar parser. ChemicalTagger has been deployed for over 10,000 patents and has identified solvents from their linguistic context with >99.5% precision.

19.
J Chem Inf Model ; 51(3): 739-53, 2011 Mar 28.
Article in English | MEDLINE | ID: mdl-21384929

ABSTRACT

We have produced an open source, freely available algorithm (Open Parser for Systematic IUPAC Nomenclature, OPSIN) that interprets the majority of organic chemical nomenclature in a fast and precise manner. This has been achieved using an approach based on a regular grammar. This grammar is used to guide tokenization, a potentially difficult problem in chemical names. From the parsed chemical name, an XML parse tree is constructed that is operated on in a stepwise manner until the structure has been reconstructed from the name. Results from OPSIN on various computer generated name/structure pair sets are presented. These show exceptionally high precision (99.8%+) and, when using general organic chemical nomenclature, high recall (98.7-99.2%). This software can serve as the basis for future open source developments of chemical name interpretation.


Subjects
Terminology as Topic, Models, Molecular
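
To give a feel for why tokenization of chemical names is awkward (the problem the OPSIN abstract above says its grammar is used to guide), here is a deliberately tiny sketch. It is not OPSIN's algorithm: the six-entry vocabulary `VOCAB` and the function `tokenize` are hypothetical, and the sketch simply takes the longest known fragment at each position, which is a much cruder strategy than a grammar-guided tokenizer.

```python
import re

# Hypothetical toy vocabulary of name fragments; a real nomenclature
# grammar is far larger and constrains which fragments may follow which.
VOCAB = ["chloro", "meth", "eth", "an", "ol", "e"]

# Longer fragments come first so that "eth" is tried before "e".
# Locants (digit runs) and hyphens are accepted as tokens too.
TOKEN = re.compile(
    "|".join(re.escape(t) for t in sorted(VOCAB, key=len, reverse=True))
    + r"|\d+|-"
)


def tokenize(name):
    """Split a name into known fragments, scanning left to right."""
    tokens, pos = [], 0
    while pos < len(name):
        match = TOKEN.match(name, pos)
        if match is None:
            raise ValueError(f"cannot tokenize {name!r} at position {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


print(tokenize("2-chloroethanol"))
```

Even in this toy, the ordering matters: with the short fragment "e" tried first, "ethanol" would be split after its first letter and the scan would then fail, which hints at why an explicit grammar is useful for real names.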
20.
J Phys Chem A ; 114(43): 11825-32, 2010 Nov 04.
Article in English | MEDLINE | ID: mdl-20923209

ABSTRACT

This work presents thermochemical data for possible gas phase intermediate species in an industrial rutile chlorinator. An algorithm developed for previous work is employed to ensure that all possible species are considered, reducing the number of important species neglected. Thermochemical data and enthalpies of formation are calculated for 22 new species using density functional theory, post Hartree-Fock coupled cluster calculations, and statistical mechanics. Equilibrium calculations are performed to identify whether any Ti/C intermediates are likely to be important to the high temperature industrial process. These new species are not present at high concentration in the exit stream. It is therefore likely that the two chemical processes do not interact. Rather, the Cl2 rapidly reacts with the solid TiO2 to form TiCl4 and O2. The latter then reacts with the solid C to form CO and CO2 and provide the heat. Data for all the new species is provided as Supporting Information. Finally, a new methodology for data collaboration is investigated in which the data is made openly accessible using the resource description framework. Example scripts are provided to demonstrate how to query and retrieve the data automatically.


Subjects
Chlorides/chemistry, Quantum Theory, Thermodynamics, Algorithms, Gases/chemistry, Oxygen/chemistry, Titanium/chemistry