Results 1-20 of 40
1.
R Soc Open Sci ; 10(7): 230207, 2023 Jul.
Article in English | MEDLINE | ID: mdl-38033719

ABSTRACT

Twitter is in turmoil and the scholarly community on the platform is once again starting to migrate. As with the early internet, scholarly organizations are at the forefront of developing and implementing a decentralized alternative to Twitter: Mastodon. Both historically and conceptually, this is not a new situation for the scholarly community. Historically, scholars were forced to leave the social media platform FriendFeed after it was bought by Facebook in 2009. Conceptually, the problems associated with public scholarly discourse subjected to the whims of corporate owners are not unlike those of scholarly journals owned by monopolistic corporations: in both cases the perils associated with a public good in private hands are palpable. For both short-form (Twitter/Mastodon) and longer-form (journals) scholarly discourse, decentralized solutions exist, some of which already enjoy some institutional support. Here we argue that scholarly organizations, in particular learned societies, now have a golden opportunity to rethink their hesitations towards such alternatives and to support the migration of the scholarly community from Twitter to Mastodon by hosting Mastodon instances. Demonstrating that the scholarly community is capable of creating a truly public square for scholarly discourse, impervious to private takeover, might renew confidence and inspire the community to focus on analogous solutions for the remaining scholarly record (encompassing text, data and code) to safeguard all publicly owned scholarly knowledge.

2.
J Chem Inf Model ; 51(3): 739-53, 2011 Mar 28.
Article in English | MEDLINE | ID: mdl-21384929

ABSTRACT

We have produced an open-source, freely available algorithm (Open Parser for Systematic IUPAC Nomenclature, OPSIN) that interprets the majority of organic chemical nomenclature quickly and precisely. This has been achieved using an approach based on a regular grammar, which is used to guide tokenization, a potentially difficult problem for chemical names. From the parsed chemical name, an XML parse tree is constructed and operated on in a stepwise manner until the structure has been reconstructed from the name. Results from OPSIN on various computer-generated name/structure pair sets are presented. These show exceptionally high precision (99.8%+) and, when using general organic chemical nomenclature, high recall (98.7-99.2%). This software can serve as the basis for future open-source developments of chemical name interpretation.


Subjects
Terminology as Topic , Models, Molecular
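The abstract says a regular grammar guides tokenization of chemical names. A minimal Python sketch of that idea follows, assuming a toy vocabulary and transition table invented purely for illustration (OPSIN's real grammar and code are far larger and are not reproduced here): the grammar is encoded as a finite-state machine, only token types allowed by the current state are tried at each position, and the search backtracks when a choice leads to a dead end.

```python
import re
from typing import Optional

# Hypothetical toy vocabulary; not OPSIN's actual token set.
VOCABULARY = {
    "substituent": ("methyl", "ethyl", "chloro", "bromo", "hydroxy"),
    "stem": ("meth", "eth", "prop", "but", "pent", "hex"),
    "saturation": ("an", "en", "yn"),
    "suffix": ("ol", "one", "al", "e"),
}
# Locants such as "2-" before a substituent or "-1-" before a suffix.
LOCANT_PATTERNS = {
    "locant": re.compile(r"\d+(?:,\d+)*-"),
    "suffix_locant": re.compile(r"-\d+(?:,\d+)*-"),
}
# The regular grammar as a finite-state machine: token types allowed after each state.
TRANSITIONS = {
    "start": ("locant", "substituent", "stem"),
    "locant": ("substituent",),
    "substituent": ("locant", "substituent", "stem"),
    "stem": ("saturation",),
    "saturation": ("suffix_locant", "suffix"),
    "suffix_locant": ("suffix",),
    "suffix": (),
}
ACCEPTING_STATES = {"suffix"}

Token = tuple[str, str]


def candidates(token_type: str, name: str, pos: int):
    """Yield every string of the given token type that matches at pos."""
    if token_type in LOCANT_PATTERNS:
        match = LOCANT_PATTERNS[token_type].match(name, pos)
        if match:
            yield match.group()
    else:
        for morpheme in VOCABULARY[token_type]:
            if name.startswith(morpheme, pos):
                yield morpheme


def tokenize(name: str) -> Optional[list[Token]]:
    """Return one complete tokenization permitted by the grammar, or None."""
    def search(state: str, pos: int) -> Optional[list[Token]]:
        if pos == len(name):
            return [] if state in ACCEPTING_STATES else None
        for token_type in TRANSITIONS[state]:
            for text in candidates(token_type, name, pos):
                rest = search(token_type, pos + len(text))
                if rest is not None:  # this choice leads to a full parse
                    return [(token_type, text)] + rest
        return None  # dead end: caller backtracks
    return search("start", 0)
```

For instance, `tokenize("2-methylpropan-1-ol")` yields the locant `2-`, substituent `methyl`, stem `prop`, saturation `an`, suffix locant `-1-` and suffix `ol`, whereas a string outside the toy grammar returns `None`. Letting the grammar state restrict the candidate tokens is what keeps the tokenization unambiguous in this sketch.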
3.
J Chem Inf Model ; 50(2): 251-61, 2010 Feb 22.
Article in English | MEDLINE | ID: mdl-20088574

ABSTRACT

The SPECTRa-T project has developed text-mining tools to extract named chemical entities (NCEs), such as chemical names and terms, and chemical objects (COs), e.g., experimental spectral assignments and physical chemistry properties, from electronic theses (e-theses). Although NCEs were readily identified within the two major document formats studied, only the use of structured documents enabled identification of chemical objects and their association with the relevant chemical entity (e.g., systematic chemical name). A corpus of theses was analyzed, and it is shown that a high degree of semantic information can be extracted from structured documents. This integrated information has been deposited in a persistent Resource Description Framework (RDF) triple-store that allows users to conduct semantic searches. The strengths and weaknesses of several document formats are reviewed.


Subjects
Academic Dissertations as Topic , Chemistry/education , Data Mining/methods , Software , Databases, Factual , Electronic Data Processing , False Positive Reactions
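A semantic search over a triple-store amounts to chaining pattern matches over (subject, predicate, object) triples. The sketch below illustrates this with a dependency-free in-memory store, assuming hypothetical `ex:` identifiers and predicates (`ex:hasSpectrum`, `ex:spectrumType`, `ex:name`); it is not the SPECTRa-T store, and a production system would instead use an RDF library and SPARQL.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]


class TripleStore:
    """Toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        for triple in self.triples:
            if all(p is None or p == v for p, v in zip(pattern, triple)):
                yield triple


def names_with_spectrum(store: TripleStore, spectrum_type: str) -> list[str]:
    """Names of chemical entities linked to a chemical object (spectrum) of a given type."""
    names = []
    for spectrum, _, _ in store.match(predicate="ex:spectrumType", obj=spectrum_type):
        for compound, _, _ in store.match(predicate="ex:hasSpectrum", obj=spectrum):
            for _, _, name in store.match(subject=compound, predicate="ex:name"):
                names.append(name)
    return sorted(names)


# Hypothetical data: one thesis mentioning two compounds with different spectra.
store = TripleStore()
store.add("ex:thesis1", "ex:mentions", "ex:compound1")
store.add("ex:compound1", "ex:name", "2-methylpropan-1-ol")
store.add("ex:compound1", "ex:hasSpectrum", "ex:spectrum1")
store.add("ex:spectrum1", "ex:spectrumType", "13C NMR")
store.add("ex:compound2", "ex:name", "ethanol")
store.add("ex:compound2", "ex:hasSpectrum", "ex:spectrum2")
store.add("ex:spectrum2", "ex:spectrumType", "IR")
```

Here `names_with_spectrum(store, "13C NMR")` follows the chain spectrum type, then spectrum, then compound, then name, which is the kind of association between chemical objects and chemical entities the abstract describes.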
4.
J Phys Chem A ; 114(43): 11825-32, 2010 Nov 04.
Article in English | MEDLINE | ID: mdl-20923209

ABSTRACT

This work presents thermochemical data for possible gas-phase intermediate species in an industrial rutile chlorinator. An algorithm developed in previous work is employed to ensure that all possible species are considered, reducing the number of important species neglected. Thermochemical data and enthalpies of formation are calculated for 22 new species using density functional theory, post-Hartree-Fock coupled cluster calculations, and statistical mechanics. Equilibrium calculations are performed to identify whether any Ti/C intermediates are likely to be important to the high-temperature industrial process. These new species are not present at high concentration in the exit stream. It is therefore likely that the two chemical processes do not interact: rather, Cl2 rapidly reacts with solid TiO2 to form TiCl4 and O2, and the O2 then reacts with solid C to form CO and CO2, providing the heat. Data for all the new species are provided as Supporting Information. Finally, a new methodology for data collaboration is investigated, in which the data are made openly accessible using the Resource Description Framework (RDF). Example scripts are provided to demonstrate how to query and retrieve the data automatically.


Subjects
Chlorides/chemistry , Quantum Theory , Thermodynamics , Algorithms , Gases/chemistry , Oxygen/chemistry , Titanium/chemistry
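Computed thermochemistry feeds equilibrium calculations through the standard relations dG = dH - T dS and K = exp(-dG/(RT)) for a single reaction. A minimal sketch of those two relations follows; it is only an illustration of the textbook link between the quantities, not the paper's method, and a full multi-species equilibrium calculation for a chlorinator is considerably more involved. No values from the paper are used.

```python
import math

GAS_CONSTANT = 8.314462618  # molar gas constant R, J mol^-1 K^-1


def reaction_gibbs_energy(delta_h: float, delta_s: float, temperature: float) -> float:
    """Standard reaction Gibbs energy dG = dH - T*dS (dH in J/mol, dS in J/(mol K), T in K)."""
    return delta_h - temperature * delta_s


def equilibrium_constant(delta_g: float, temperature: float) -> float:
    """K = exp(-dG/(R*T)); K > 1 when dG < 0, i.e. products are favoured at equilibrium."""
    return math.exp(-delta_g / (GAS_CONSTANT * temperature))
```

Because K depends exponentially on dG/T, modest errors in computed enthalpies of formation can shift predicted equilibrium concentrations substantially, which is one reason high-level methods such as coupled cluster calculations are worth their cost.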
6.
Nucleic Acids Res ; 35(Database issue): D515-20, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17082206

ABSTRACT

MACiE (Mechanism, Annotation and Classification in Enzymes) is a database of enzyme reaction mechanisms and is publicly available as a web-based data resource. This paper presents the first release of a web-based search tool to explore enzyme reaction mechanisms in MACiE. We also present Version 2 of MACiE, which doubles the size of the dataset available in Version 1. MACiE can be accessed from http://www.ebi.ac.uk/thornton-srv/databases/MACiE/


Subjects
Databases, Protein , Enzymes/chemistry , Catalysis , Enzymes/classification , Enzymes/metabolism , Internet , Protein Conformation , Sequence Homology, Amino Acid , Software , User-Computer Interface
7.
BMC Bioinformatics ; 8: 59, 2007 Feb 22.
Article in English | MEDLINE | ID: mdl-17316423

ABSTRACT

BACKGROUND: There is a need for software applications that provide users with a complete and extensible toolkit for chemo- and bioinformatics accessible from a single workbench. Commercial packages are expensive and closed source, so they do not allow end users to modify algorithms and add custom functionality. Existing open source projects are more focused on providing a framework for integrating existing, separately installed bioinformatics packages, rather than providing user-friendly interfaces. No open source chemoinformatics workbench has previously been published, and no successful attempts have been made to integrate chemo- and bioinformatics into a single framework. RESULTS: Bioclipse is an advanced workbench for resources in chemo- and bioinformatics, such as molecules, proteins, sequences, spectra, and scripts. It provides 2D editing, 3D visualization, file format conversion, calculation of chemical properties, and much more, all fully integrated into a user-friendly desktop application. Editing supports standard functions such as cut and paste, drag and drop, and undo/redo. Bioclipse is written in Java and based on the Eclipse Rich Client Platform with a state-of-the-art plugin architecture. This gives Bioclipse an advantage over other systems, as it can easily be extended with functionality in any desired direction. CONCLUSION: Bioclipse is a powerful workbench for bio- and chemoinformatics as well as an advanced integration platform. The rich functionality, intuitive user interface, and powerful plugin architecture make Bioclipse the most advanced and user-friendly open source workbench for chemo- and bioinformatics. Bioclipse is released under the Eclipse Public License (EPL), an open source license which sets no constraints on external plugin licensing; it is open to both open source and commercial plugins. Bioclipse is freely available at http://www.bioclipse.net.


Subjects
Biochemistry/methods , Computational Biology/methods , Genomics/methods , Programming Languages , Software Design , Software , User-Computer Interface , Computer Graphics
8.
BMC Bioinformatics ; 6: 180, 2005 Jul 18.
Article in English | MEDLINE | ID: mdl-16026614

ABSTRACT

The current methods of publishing chemical information in bioscience articles are analysed. Using three papers as use cases, it is shown that conventional methods relying on human procedures, including cut-and-paste, are time-consuming and introduce errors. The meaning of chemical terms and the identity of compounds are often ambiguous. Valuable experimental data such as spectra and computational results are almost always omitted. We describe a proof-of-concept Open XML architecture which addresses these concerns. Compounds are identified through explicit connection tables or links to persistent Open resources such as PubChem. It is argued that if publishers adopt these tools and protocols, the quality and quantity of chemical information available to bioscientists will increase and authors, publishers and readers will find the process cost-effective.


Subjects
Chemistry , Information Storage and Retrieval/methods , Journalism , Terminology as Topic , Archives , Chemical Phenomena , Data Display , Databases, Factual , Models, Chemical , Molecular Structure
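A connection table identifies a compound explicitly by listing its atoms and the bonds between them, so derived properties can be computed rather than retyped. A minimal sketch follows, assuming hypothetical `Atom` and `ConnectionTable` classes with per-atom implicit hydrogen counts; it is not the paper's Open XML architecture. It derives a molecular formula in Hill order (C, then H, then other elements alphabetically; purely alphabetical when there is no carbon).

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Atom:
    element: str
    hydrogen_count: int = 0  # implicit hydrogens attached to this atom


@dataclass
class ConnectionTable:
    atoms: list[Atom]
    # Each bond is (atom index, atom index, bond order).
    bonds: list[tuple[int, int, int]] = field(default_factory=list)

    def neighbours(self, index: int) -> list[int]:
        """Indices of atoms bonded to the atom at `index`."""
        return [b if a == index else a for a, b, _ in self.bonds if index in (a, b)]

    def hill_formula(self) -> str:
        counts = Counter()
        for atom in self.atoms:
            counts[atom.element] += 1
            counts["H"] += atom.hydrogen_count
        counts = +counts  # unary plus drops elements with a zero count
        if "C" in counts:
            # Hill order: C, then H, then the remaining elements alphabetically.
            rest = sorted(e for e in counts if e not in ("C", "H"))
            order = ["C"] + (["H"] if "H" in counts else []) + rest
        else:
            order = sorted(counts)
        return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


# Ethanol, CH3-CH2-OH, as heavy atoms with implicit hydrogens.
ethanol = ConnectionTable(
    atoms=[Atom("C", 3), Atom("C", 2), Atom("O", 1)],
    bonds=[(0, 1, 1), (1, 2, 1)],
)
```

With this table, `ethanol.hill_formula()` gives `C2H6O` and `ethanol.neighbours(1)` gives `[0, 2]`; the point is that the identity of the compound is carried by the table itself rather than by an ambiguous name.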
9.
BMC Bioinformatics ; 6: 141, 2005 Jun 07.
Article in English | MEDLINE | ID: mdl-15941476

ABSTRACT

Chemical information is now seen as critical for most areas of the life sciences. But unlike bioinformatics, where data are openly available and freely reusable, most chemical information is closed and cannot be redistributed without permission. This has led to a failure to adopt modern informatics and software techniques and therefore to a paucity of chemistry in bioinformatics. New technology, however, offers the hope of making chemical data (compounds and properties) free during the authoring process. We argue that the technology is already available; what is required is a collective agreement to enhance publication protocols.


Subjects
Chemistry/methods , Chemistry/trends , Computational Biology/methods , Computational Biology/trends , Publishing , Access to Information , Databases as Topic , Databases, Factual , Information Storage and Retrieval , Internet , Software , Systems Integration , User-Computer Interface
10.
J Cheminform ; 7: 43, 2015.
Article in English | MEDLINE | ID: mdl-26322133

ABSTRACT

BACKGROUND: The desirable curation of 158,122 molecular geometries derived from the NCI set of reference molecules, together with associated properties computed using the MOPAC semi-empirical quantum mechanical method and originally deposited in 2005 into the Cambridge DSpace repository as a data collection, is reported. RESULTS: The procedures involved in the curation included annotation of the original data using new MOPAC methods, updating the syntax of the CML documents used to express the data to ensure schema conformance, and adding new metadata describing the entries, together with an XML schema transformation to map the metadata schema to that used by the DataCite organisation. We have adopted a granularity model in which a DataCite persistent identifier (DOI) is created for each individual molecule to enable data discovery and data metrics at this level using DataCite tools. CONCLUSIONS: We recommend that future research data management (RDM) of the scientific and chemical data components associated with journal articles (the "supporting information") be conducted in a manner that facilitates automatic periodic curation. Graphical abstract: Standards and metadata-based curation of a decade-old digital repository dataset of molecular information.
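Under a per-molecule granularity model, each molecule gets its own persistent identifier and its own metadata record. The sketch below emits one such record with Python's standard ElementTree. It is loosely modelled on the mandatory properties of the DataCite Metadata Schema (kernel-4 namespace); element names should be checked against the official schema, the helper `molecule_record` is hypothetical, and the DOI, creator and publisher values in the usage are placeholders, not values from the paper.

```python
import xml.etree.ElementTree as ET

DATACITE_NS = "http://datacite.org/schema/kernel-4"
ET.register_namespace("", DATACITE_NS)  # serialise with a default namespace


def _child(parent, tag, text=None, **attributes):
    """Append a namespaced child element, optionally with text and attributes."""
    element = ET.SubElement(parent, f"{{{DATACITE_NS}}}{tag}", attributes)
    if text is not None:
        element.text = text
    return element


def molecule_record(doi: str, title: str, creator: str, publisher: str, year: int) -> str:
    """Build one minimal metadata record for a single molecule (one DOI per molecule)."""
    resource = ET.Element(f"{{{DATACITE_NS}}}resource")
    _child(resource, "identifier", doi, identifierType="DOI")
    _child(_child(_child(resource, "creators"), "creator"), "creatorName", creator)
    _child(_child(resource, "titles"), "title", title)
    _child(resource, "publisher", publisher)
    _child(resource, "publicationYear", str(year))
    _child(resource, "resourceType", "Molecule", resourceTypeGeneral="Dataset")
    return ET.tostring(resource, encoding="unicode")
```

Generating records programmatically like this, one call per molecule, is what makes periodic re-curation of a large collection tractable, in line with the abstract's recommendation.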

11.
J Cheminform ; 4(1): 14, 2012 Aug 03.
Article in English | MEDLINE | ID: mdl-22856527

ABSTRACT

The articles in this special issue arise from a workshop and symposium held in January 2012 ('Semantic Physical Science'). We invited people who shared our vision of the potential of the web to support chemical and related subjects. Other than the initial invitations, we have not exercised any control over the content of the contributed articles.

12.
J Cheminform ; 4(1): 15, 2012 Aug 07.
Article in English | MEDLINE | ID: mdl-22870956

ABSTRACT

This paper introduces a subdomain chemistry format for storing computational chemistry data called CompChem. It has been developed based on the design, concepts and methodologies of Chemical Markup Language (CML) by adding computational chemistry semantics on top of the CML Schema. The format allows a wide range of ab initio quantum chemistry calculations on individual molecules to be stored. These calculations include, for example, single-point energy calculations, molecular geometry optimization, and vibrational frequency analysis. The paper also describes the supporting infrastructure, such as processing software, dictionaries, validation tools and database repositories. In addition, some of the challenges and difficulties in developing common computational chemistry dictionaries are discussed. The uses of CompChem are illustrated by two practical applications.

13.
J Cheminform ; 3: 48, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-21999715

ABSTRACT

The articles in this special issue represent the culmination of about 15 years working with the potential of the web to support chemical and related subjects. The selection of papers arises from a symposium held in January 2011 ('Visions of a Semantic Molecular Future') which gave me an opportunity to invite many people who shared the same vision. I have asked them to contribute their papers and most have been able to do so. They cover a wide range of content, approaches and styles and apart from the selection of the speakers (and hence the authors) I have not exercised any control over the content.

14.
J Cheminform ; 3(1): 39, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-21999395

ABSTRACT

CMLLite is a collection of definitions and processes which provide strong and flexible validation for a document in Chemical Markup Language (CML). It consists of an updated CML schema (schema3), conventions specifying rules in both human and machine-understandable forms and a validator available both online and offline to check conformance. This article explores the rationale behind the changes which have been made to the schema, explains how conventions interact and how they are designed, formulated, implemented and tested, and gives an overview of the validation service.
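Conventions that are machine-understandable can be checked automatically. As a hedged illustration, the sketch below applies three hypothetical rules to a CML document with Python's standard ElementTree: every atom has a unique `id`, every atom has an `elementType`, and every bond's `atomRefs2` names two existing atoms. The element and attribute names follow common CML usage (`molecule`, `atomArray`, `atom`, `bondArray`, `bond`), but the rules and the function `check_conventions` are assumptions, not the actual CMLLite conventions or validator.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
NS = {"cml": CML_NS}


def check_conventions(cml_text: str) -> list[str]:
    """Return human-readable violations of a few illustrative (hypothetical) rules."""
    violations = []
    root = ET.fromstring(cml_text)
    # Element.iter() includes the root itself, so a bare <molecule> document is covered.
    for molecule in root.iter(f"{{{CML_NS}}}molecule"):
        mol_id = molecule.get("id")
        if mol_id is None:
            violations.append("molecule without an id")
            mol_id = "?"
        atom_ids = set()
        for atom in molecule.findall("cml:atomArray/cml:atom", NS):
            atom_id = atom.get("id")
            if atom_id is None:
                violations.append(f"{mol_id}: atom without an id")
            elif atom_id in atom_ids:
                violations.append(f"{mol_id}: duplicate atom id {atom_id}")
            else:
                atom_ids.add(atom_id)
            if atom.get("elementType") is None:
                violations.append(f"{mol_id}: atom {atom_id} has no elementType")
        for bond in molecule.findall("cml:bondArray/cml:bond", NS):
            refs = bond.get("atomRefs2", "").split()
            if len(refs) != 2:
                violations.append(f"{mol_id}: bond must reference exactly two atoms")
            for ref in refs:
                if ref not in atom_ids:
                    violations.append(f"{mol_id}: bond references unknown atom {ref}")
    return violations
```

Returning a list of readable messages rather than a single pass/fail flag mirrors the dual human- and machine-readable intent of conventions: the same rule both checks a document and explains what is wrong with it.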

15.
J Cheminform ; 3(1): 44, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-21999549

ABSTRACT

A retrospective view of the design and evolution of Chemical Markup Language (CML) is presented by its original authors.

16.
J Cheminform ; 3(1): 40, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-21999425

ABSTRACT

Linked Open Data presents an opportunity to vastly improve the quality of science in all fields by increasing the availability and usability of the data upon which it is based. In the chemical field, there is a huge amount of information available in the published literature, the vast majority of which is not available in machine-understandable formats. PatentEye, a prototype system for the extraction and semantification of chemical reactions from the patent literature, has been implemented and is discussed. A total of 4444 reactions were extracted from 667 patent documents that comprised 10 weeks' worth of publications from the European Patent Office (EPO), with a precision of 78% and recall of 64% with regard to determining the identity and amount of reactants employed, and an accuracy of 92% with regard to product identification. NMR spectra reported as product characterisation data are additionally captured.
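Precision is the fraction of extracted items that are correct, recall the fraction of gold-standard items that were extracted, and the F-score their (weighted) harmonic mean. A minimal sketch follows, assuming items are compared as exact matches in sets (for example hypothetical (reaction, reactant) pairs); real evaluations may use looser matching.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta = 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def evaluate(extracted: set, gold: set) -> tuple[float, float, float]:
    """Precision, recall and F1 of extracted items against a gold-standard set."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)
```

Applied to the precision (78%) and recall (64%) reported above for reactant identification, `f_score(0.78, 0.64)` is roughly 0.70.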

17.
PLoS One ; 6(5): e20181, 2011.
Article in English | MEDLINE | ID: mdl-21633495

ABSTRACT

Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored an established named entity recogniser in the chemistry domain, OSCAR, and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework, U-Compare. These workflows can be altered using the drag-and-drop mechanism of U-Compare's graphical user interface. They also provide a platform to study the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern-recognition-based classifiers). Results indicate that, for chemistry in particular, eliminating noise generated by tokenisation techniques leads to slightly better performance in terms of named entity recognition (NER) accuracy. Poor tokenisation translates into poorer input to the classifier components, which in turn increases Type I or Type II errors, thus lowering the overall performance. On the Sciborg corpus, the workflow-based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84%, compared with 84.23% for OSCAR.


Subjects
Algorithms , Chemistry/methods , Natural Language Processing , Terminology as Topic , Computational Biology/methods , Markov Chains , PubMed , Reproducibility of Results , Workflow
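Why tokenisation matters for chemical NER can be seen by comparing a generic tokeniser, which splits on every punctuation mark, with one that keeps chemical names intact. The sketch below is purely illustrative: neither function is OSCAR's tokeniser or the new tokeniser from the paper, and the heuristics (split on whitespace, then peel off sentence punctuation and unbalanced brackets) are assumptions.

```python
import re

SENTENCE_PUNCTUATION = ".,;:"


def naive_tokenise(text: str) -> list[str]:
    """Split on every non-word character, as a generic English tokeniser might."""
    return re.findall(r"\w+|[^\w\s]", text)


def chemistry_aware_tokenise(text: str) -> list[str]:
    """Split on whitespace, then peel off sentence punctuation and unbalanced brackets."""
    tokens = []
    for chunk in text.split():
        trailing = []
        # Peel trailing punctuation, and ")" only when it has no partner inside the chunk.
        while chunk and (chunk[-1] in SENTENCE_PUNCTUATION
                         or (chunk[-1] == ")" and chunk.count(")") > chunk.count("("))):
            trailing.append(chunk[-1])
            chunk = chunk[:-1]
        leading = []
        # Peel a leading "(" only when it is not closed inside the chunk.
        while chunk and chunk[0] == "(" and chunk.count("(") > chunk.count(")"):
            leading.append(chunk[0])
            chunk = chunk[1:]
        tokens.extend(leading)
        if chunk:
            tokens.append(chunk)
        tokens.extend(reversed(trailing))
    return tokens
```

On a sentence such as "The product, 2,4-dinitrophenylhydrazine (DNPH), was dried.", the naive tokeniser breaks the name into `2`, `,`, `4`, `-`, `dinitrophenylhydrazine`, handing a downstream classifier fragments it must reassemble, whereas the whitespace-based tokeniser keeps `2,4-dinitrophenylhydrazine` as one token.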
18.
J Cheminform ; 3(1): 42, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-21999475

ABSTRACT

The World-Wide Molecular Matrix (WWMM) is a ten-year project to create a peer-to-peer (P2P) system for the publication and collection of chemical objects, including over 250,000 molecules. It has now been instantiated in a number of repositories which include data encoded in Chemical Markup Language (CML) and linked by URIs and RDF. The technical specification and implementation are now complete. We discuss the types of architecture required to implement nodes in the WWMM and consider the social issues involved in adoption.

19.
J Cheminform ; 3: 43, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-21999509

ABSTRACT

The semantic architecture of CML consists of conventions, dictionaries and units. The conventions conform to a top-level specification, and each convention can constrain compliant documents through machine processing (validation). Dictionaries conform to a dictionary specification which also imposes machine validation on the dictionaries. Each dictionary can also be used to validate data in a CML document and to provide human-readable descriptions. An additional set of conventions and dictionaries is used to support scientific units. All conventions, dictionaries and dictionary elements are identifiable and addressable through unique URIs.

20.
J Cheminform ; 3(1): 17, 2011 May 16.
Article in English | MEDLINE | ID: mdl-21575201

ABSTRACT

BACKGROUND: The primary method of scientific communication is the publication of scientific articles and theses, which use natural language combined with domain-specific terminology. As such, they contain free-flowing unstructured text. Given the usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt makes their contributions well suited to high-throughput Natural Language Processing (NLP) approaches. RESULTS: We have developed the ChemicalTagger parser as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments. Tagging is based on a modular architecture and uses a combination of OSCAR, domain-specific regex and English taggers to identify parts of speech. The ANTLR grammar is used to structure this into tree-based phrases. Using a metric that allows for overlapping annotations, we achieved machine-annotator agreements of 88.9% for phrase recognition and 91.9% for phrase-type identification (Action names). CONCLUSIONS: It is possible to parse chemical experimental text using rule-based techniques in conjunction with a formal grammar parser. ChemicalTagger has been deployed on over 10,000 patents and has identified solvents from their linguistic context with >99.5% precision.
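The regex layer of a tagger like the one described can be sketched as a single compiled alternation of named groups, each group being a tag. The sketch below shows only that layer, assuming a deliberately tiny, hypothetical lexicon and tag names (`ACTION`, `SOLVENT`, `TEMPERATURE`, `TIME`); it is not ChemicalTagger's tag set or API, and it omits the OSCAR, English part-of-speech and ANTLR phrase stages.

```python
import re

# Hypothetical tag lexicon; the real system's rules are far richer.
TAG_PATTERNS = [
    ("ACTION", r"\b(?:dissolved|stirred|heated|cooled|filtered|washed|dried)\b"),
    ("SOLVENT", r"\b(?:water|ethanol|methanol|dichloromethane|THF|toluene)\b"),
    ("TEMPERATURE", r"-?\d+(?:\.\d+)?\s?°C"),
    ("TIME", r"\d+(?:\.\d+)?\s?(?:h|min)\b"),
]
# One alternation of named groups; inner groups are non-capturing, so
# match.lastgroup is exactly the tag whose pattern matched.
MASTER = re.compile("|".join(f"(?P<{tag}>{pattern})" for tag, pattern in TAG_PATTERNS))


def tag(sentence: str) -> list[tuple[str, str]]:
    """Return (tag, text) pairs in the order they occur in the sentence."""
    return [(match.lastgroup, match.group()) for match in MASTER.finditer(sentence)]
```

For "The solid was dissolved in ethanol and stirred at 25 °C for 2 h.", `tag` returns `ACTION` for "dissolved" and "stirred", `SOLVENT` for "ethanol", `TEMPERATURE` for "25 °C" and `TIME` for "2 h". A phrase grammar would then group these tags into action phrases, which is where identifying a solvent from its linguistic context becomes possible.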
