Results 1 - 16 of 16
1.
J Chem Inf Model ; 51(3): 739-53, 2011 Mar 28.
Article in English | MEDLINE | ID: mdl-21384929

ABSTRACT

We have produced an open source, freely available, algorithm (Open Parser for Systematic IUPAC Nomenclature, OPSIN) that interprets the majority of organic chemical nomenclature in a fast and precise manner. This has been achieved using an approach based on a regular grammar. This grammar is used to guide tokenization, a potentially difficult problem in chemical names. From the parsed chemical name, an XML parse tree is constructed that is operated on in a stepwise manner until the structure has been reconstructed from the name. Results from OPSIN on various computer generated name/structure pair sets are presented. These show exceptionally high precision (99.8%+) and, when using general organic chemical nomenclature, high recall (98.7-99.2%). This software can serve as the basis for future open source developments of chemical name interpretation.
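The grammar-guided tokenization described above can be illustrated with a toy tokenizer. This is only a sketch: the token classes and the tiny lexicon are hypothetical and chosen for this example, and they are not OPSIN's actual grammar or API. The sketch also covers only the splitting step; a real regular grammar would additionally constrain the order in which token classes may appear.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration, not OPSIN's grammar).
# Order matters: earlier classes are tried first, so "methyl" wins over "meth".
TOKEN_CLASSES = [
    ("locant", r"\d+(?:,\d+)*"),
    ("hyphen", r"-"),
    ("multiplier", r"di|tri|tetra"),
    ("substituent", r"chloro|bromo|methyl|ethyl"),
    ("stem", r"meth|eth|prop|but"),
    ("bond", r"an|en|yn"),
    ("suffix", r"ol|al|one|e"),
]

# One alternation of named groups; the name of the matching group is the token class.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_CLASSES)
)


def tokenize(name):
    """Split a chemical name into (token_class, text) pairs, left to right."""
    tokens = []
    pos = 0
    while pos < len(name):
        match = MASTER.match(name, pos)
        if match is None:
            raise ValueError(f"cannot tokenize {name!r} at position {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("2-chloroethanol"))
```

Because every token class is a plain regular expression, the splitting step needs no backtracking parser; names outside the toy lexicon simply raise `ValueError`.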


Subject(s)
Terminology as Topic , Molecular Models
2.
Ecol Appl ; 20(1): 263-77, 2010 Jan.
Article in English | MEDLINE | ID: mdl-20349846

ABSTRACT

Hybridization and introgression between introduced and native salmonids threaten the continued persistence of many inland cutthroat trout species. Environmental models have been developed to predict the spread of introgression, but few studies have assessed the role of propagule pressure. We used an extensive set of fish stocking records and geographic information system (GIS) data to produce a spatially explicit index of potential propagule pressure exerted by introduced rainbow trout in the Upper Kootenay River, British Columbia, Canada. We then used logistic regression and the information-theoretic approach to test the ability of a set of environmental and spatial variables to predict the level of introgression between native westslope cutthroat trout and introduced rainbow trout. Introgression was assessed using between four and seven co-dominant, diagnostic nuclear markers at 45 sites in 31 different streams. The best model for predicting introgression included our GIS propagule pressure index and an environmental variable that accounted for the biogeoclimatic zone of the site (r² = 0.62). This model was 1.4 times more likely to explain introgression than the next-best model, which consisted of only the propagule pressure index variable. We created a composite model based on the model-averaged results of the seven top models that included environmental, spatial, and propagule pressure variables. The propagule pressure index had the highest importance weight (0.995) of all variables tested and was negatively related to sites with no introgression. This study used an index of propagule pressure and demonstrated that propagule pressure had the greatest influence on the level of introgression between a native and introduced trout in a human-induced hybrid zone.
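A statement such as "1.4 times more likely" is typically an evidence ratio between two models in the information-theoretic framework. The sketch below shows how such a ratio and the associated model weights follow from information-criterion scores. The abstract does not say which criterion was used, so an AIC-type score is assumed here, and the numeric values are invented purely to reproduce a ratio of 1.4.

```python
import math


def akaike_weights(scores):
    """Convert AIC-type scores (lower is better) into model weights that sum to 1."""
    best = min(scores)
    relative = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(relative)
    return [r / total for r in relative]


def evidence_ratio(score_a, score_b):
    """How many times more likely model A is than model B, given their scores."""
    return math.exp(0.5 * (score_b - score_a))


# Hypothetical scores: the second model is worse by 2*ln(1.4), about 0.67 units.
best_model = 100.0
next_best = 100.0 + 2 * math.log(1.4)

print(round(evidence_ratio(best_model, next_best), 2))
print([round(w, 3) for w in akaike_weights([best_model, next_best])])
```

An evidence ratio of 1.4 corresponds to a score difference of well under one unit, which is why model averaging across the top models is a natural next step when the leading models are this close.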


Subject(s)
Rivers , Trout/physiology , Alleles , Animals , British Columbia , Conservation of Natural Resources , Ecosystem , Biological Models , Population Dynamics , Trout/genetics
3.
BMC Bioinformatics ; 9 Suppl 11: S4, 2008 Nov 19.
Article in English | MEDLINE | ID: mdl-19025690

ABSTRACT

BACKGROUND: Chemical named entities represent an important facet of biomedical text. RESULTS: We have developed a system to use character-based n-grams, Maximum Entropy Markov Models and rescoring to recognise chemical names and other such entities, and to make confidence estimates for the extracted entities. An adjustable threshold allows the system to be tuned to high precision or high recall. At a threshold set for balanced precision and recall, we were able to extract named entities at an F score of 80.7% from chemistry papers and 83.2% from PubMed abstracts. Furthermore, we were able to achieve 57.6% and 60.3% recall at 95% precision, and 58.9% and 49.1% precision at 90% recall. CONCLUSION: These results show that chemical named entities can be extracted with good performance, and that the properties of the extraction can be tuned to suit the demands of the task.
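The adjustable threshold mentioned above trades precision against recall. A minimal sketch of that trade-off follows, assuming each candidate entity carries a confidence score and a flag saying whether it matches the gold standard. The function name and the data are hypothetical and are not taken from the described system.

```python
def precision_recall(scored, gold_count, threshold):
    """Precision and recall when keeping candidates with confidence >= threshold.

    scored: list of (confidence, is_correct) pairs for candidate entities.
    gold_count: number of entities in the gold standard.
    """
    kept = [is_correct for confidence, is_correct in scored if confidence >= threshold]
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / gold_count if gold_count else 0.0
    return precision, recall


# Invented candidates; the gold standard holds 5 entities, one of which
# the system never proposed at any confidence.
candidates = [
    (0.95, True), (0.90, True), (0.80, False),
    (0.60, True), (0.40, False), (0.30, True),
]

for threshold in (0.85, 0.5, 0.0):
    print(threshold, precision_recall(candidates, 5, threshold))
```

Raising the threshold keeps only the most confident candidates, which raises precision and lowers recall; lowering it does the opposite.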


Subject(s)
Computational Biology/methods , Information Storage and Retrieval/methods , Algorithms , Chemical Models , Statistical Models , Natural Language Processing , Software , Terminology as Topic
4.
J Am Chem Soc ; 130(33): 10834-5, 2008 Aug 20.
Article in English | MEDLINE | ID: mdl-18646752

ABSTRACT

A simple water-soluble naphthalenedithiol building block is converted quantitatively into a series of octameric [2]-catenanes, composed of two interlocked molecular squares. When this mixture is re-equilibrated in the presence of an adamantyl ammonium guest, the catenanes disassemble into their macrocyclic components that bind the guest with nanomolar affinity in water.


Subject(s)
Catenanes/chemistry , Catenanes/chemical synthesis , Combinatorial Chemistry Techniques/methods , Sulfhydryl Compounds/chemistry , Sulfhydryl Compounds/chemical synthesis , High Pressure Liquid Chromatography/methods , Cyclization , Magnetic Resonance Spectroscopy/methods , Molecular Models , Molecular Structure , Particle Size , Solubility , Water/chemistry
5.
J Cheminform ; 10(1): 59, 2018 Dec 06.
Article in English | MEDLINE | ID: mdl-30523437

ABSTRACT

Chemical named entity recognition (NER) has traditionally been dominated by conditional random fields (CRF)-based approaches, but given the success of the artificial neural network techniques known as "deep learning", we decided to examine them as an alternative to CRFs. We present here several chemical named entity recognition systems. The first system translates the traditional CRF-based idioms into a deep learning framework, using rich per-token features and neural word embeddings, and producing a sequence of tags using bidirectional long short-term memory (LSTM) networks, a type of recurrent neural net. The second system eschews the rich feature set, and even tokenisation, in favour of character labelling using neural character embeddings and multiple LSTM layers. The third system is an ensemble that combines the results of the first two systems. Our original BioCreative V.5 competition entry was placed in the top group with the highest F scores, and subsequent work using transfer learning has achieved a final F score of 90.33% on the test data (precision 91.47%, recall 89.21%).
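The reported F score can be checked against the reported precision and recall, assuming it is the balanced F1 measure (the harmonic mean of the two), which the abstract does not state explicitly.

```python
def f_score(precision, recall):
    """Balanced F measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
print(round(f_score(91.47, 89.21), 2))  # 90.33
```

The result agrees with the 90.33% quoted in the abstract, which supports the assumption that F1 is the measure reported.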

6.
Ecology ; 87(7): 1722-32, 2006 Jul.
Article in English | MEDLINE | ID: mdl-16922322

ABSTRACT

Forest fire occurrence is affected by multiple controls that operate at local to regional scales. At the spatial scale of forest stands, regional climatic controls may be obscured by local controls (e.g., stochastic ignitions, topography, and fuel loads), but the long-term role of such local controls is poorly understood. We report here stand-scale (<100 ha) fire histories of the past 5000 years based on the analysis of sediment charcoal at two lakes 11 km apart in southeastern British Columbia. The two lakes are today located in similar subalpine forests, and they likely have experienced the same late-Holocene climatic changes because of their close proximity. We evaluated two independent properties of fire history: (1) fire-interval distribution, a measure of the overall incidence of fire, and (2) fire synchroneity, a measure of the co-occurrence of fire (here, assessed at centennial to millennial time scales due to the resolution of sediment records). Fire-interval distributions differed between the sites prior to, but not after, 2500 yr before present. When the entire 5000-yr period is considered, no statistical synchrony between fire-episode dates existed between the two sites at any temporal scale, but for the last 2500 yr marginal levels of synchrony occurred at centennial scales. Each individual fire record exhibited little coherency with regional climate changes. In contrast, variations in the composite record (average of both sites) matched variations in climate evidenced by late-Holocene glacial advances. This was probably due to the increased sample size and spatial extent represented by the composite record (up to 200 ha) plus increased regional climatic variability over the last several millennia, which may have partially overridden local, non-climatic controls. 
We conclude that (1) over past millennia, neighboring stands with similar modern conditions may have experienced different fire intervals and asynchronous patterns in fire episodes, likely because local controls outweighed the synchronizing effect of climate; (2) the influence of climate on fire occurrence is more strongly expressed when climatic variability is relatively great; and (3) multiple records from a region are essential if climate-fire relations are to be reliably described.


Subject(s)
Climate , Ecosystem , Fires/history , British Columbia , Geologic Sediments , 15th Century History , 16th Century History , 17th Century History , 18th Century History , 19th Century History , 20th Century History , Ancient History , Medieval History , Time Factors , Trees/physiology
7.
Org Lett ; 6(11): 1825-7, 2004 May 27.
Article in English | MEDLINE | ID: mdl-15151424

ABSTRACT

Using simple computer simulations of model dynamic combinatorial libraries, we show that the best binders can be amplified to useful concentrations in libraries containing 10 to 10^6 compounds.

9.
J Biomed Semantics ; 2 Suppl 5: S11, 2011 Oct 06.
Article in English | MEDLINE | ID: mdl-22166494

ABSTRACT

BACKGROUND: Competitions in text mining have been used to measure the performance of automatic text processing solutions against a manually annotated gold standard corpus (GSC). The preparation of the GSC is time-consuming and costly, and the final corpus consists of at most a few thousand documents annotated with a limited set of semantic groups. To overcome these shortcomings, the CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus with four different semantic groups through the harmonisation of annotations from automatic text mining solutions, the first version of the Silver Standard Corpus (SSC-I). The four semantic groups are chemical entities and drugs (CHED), genes and proteins (PRGE), diseases and disorders (DISO) and species (SPE). This corpus has been used for the First CALBC Challenge, asking the participants to annotate the corpus with their text processing solutions. RESULTS: All four PPs from the CALBC project and, in addition, 12 challenge participants (CPs) contributed annotated data sets for an evaluation against the SSC-I. CPs could ignore the training data and deliver the annotations from their genuine annotation system, or could train a machine-learning approach on the provided pre-annotated data. In general, the performances of the annotation solutions were lower for entities from the categories CHED and PRGE in comparison to the identification of entities categorized as DISO and SPE. The best performances over all semantic groups were achieved by two annotation solutions that had been trained on the SSC-I. The data sets from participants were used to generate the harmonised Silver Standard Corpus II (SSC-II), if the participant did not make use of the annotated data set from the SSC-I for training purposes. The performances of the participants' solutions were again measured against the SSC-II. 
The performances of the annotation solutions showed again better results for DISO and SPE in comparison to CHED and PRGE. CONCLUSIONS: The SSC-I delivers a large set of annotations (1,121,705) for a large number of documents (100,000 Medline abstracts). The annotations cover four different semantic groups and are sufficiently homogeneous to be reproduced with a trained classifier leading to an average F-measure of 85%. Benchmarking the annotation solutions against the SSC-II leads to better performance for the CPs' annotation solutions in comparison to the SSC-I.

10.
J Bioinform Comput Biol ; 8(1): 163-79, 2010 Feb.
Article in English | MEDLINE | ID: mdl-20183881

ABSTRACT

The CALBC initiative aims to provide a large-scale biomedical text corpus that contains semantic annotations for named entities of different kinds. The generation of this corpus requires that the annotations from different automatic annotation systems be harmonized. In the first phase, the annotation systems from five participants (EMBL-EBI, EMC Rotterdam, NLM, JULIE Lab Jena, and Linguamatics) were gathered. All annotations were delivered in a common annotation format that included concept identifiers in the boundary assignments and that enabled comparison and alignment of the results. During the harmonization phase, the results produced from those different systems were integrated in a single harmonized corpus ("silver standard" corpus) by applying a voting scheme. We give an overview of the processed data and the principles of harmonization--formal boundary reconciliation and semantic matching of named entities. Finally, all submissions of the participants were evaluated against that silver standard corpus. We found that species and disease annotations are better standardized amongst the partners than the annotations of genes and proteins. The raw corpus is now available for additional named entity annotations. Parts of it will be made available later on for a public challenge. We expect that we can improve corpus building activities both in terms of the numbers of named entity classes being covered, as well as the size of the corpus in terms of annotated documents.
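The voting scheme used for harmonization can be sketched as a simple majority filter over the annotations delivered by several systems. This is a simplified illustration under stated assumptions: annotations are treated as exact (start, end, type) spans, the vote threshold is a free parameter, and the function name and data are hypothetical. The boundary reconciliation and semantic matching described in the abstract are more involved than exact-match voting.

```python
from collections import Counter


def harmonize(system_annotations, min_votes=2):
    """Keep annotations proposed by at least min_votes systems.

    system_annotations: one collection of (start, end, entity_type) tuples
    per annotation system. Each system contributes at most one vote per span.
    """
    votes = Counter()
    for annotations in system_annotations:
        votes.update(set(annotations))
    return sorted(span for span, count in votes.items() if count >= min_votes)


# Invented output of three annotation systems on the same document.
system_a = [(0, 7, "gene"), (12, 17, "species")]
system_b = [(0, 7, "gene"), (20, 25, "disease")]
system_c = [(0, 7, "gene"), (12, 17, "species")]

print(harmonize([system_a, system_b, system_c], min_votes=2))
```

With a threshold of two votes, the span proposed by only one system is dropped, while spans on which at least two systems agree enter the harmonized corpus.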


Subject(s)
Computational Biology/standards , Data Mining/standards , Cooperative Behavior , Data Mining/statistics & numerical data , Factual Databases/statistics & numerical data , Unified Medical Language System
11.
Chemistry ; 14(7): 2153-66, 2008.
Article in English | MEDLINE | ID: mdl-18081129

ABSTRACT

Herein we describe an extensive study of the response of a set of closely related dynamic combinatorial libraries (DCLs) of macrocyclic receptors to the introduction of a focused range of guest molecules. We have determined the amplification of two sets of diastereomeric receptors induced by a series of neutral and cationic guests, including biologically relevant compounds such as acetylcholine and morphine. The host-guest binding affinities were investigated using isothermal titration calorimetry. The resulting dataset enabled a detailed analysis of the relationship between the amplification of selected receptors and host-guest Gibbs binding energies, giving insight into the factors affecting the design, simulation and interpretation of DCL experiments. In particular, two questions were addressed: Is amplification by a given guest selective for the best receptor? And does the best guest induce the largest amplification of a given receptor? Our experimental results and computer simulations showed that the relative levels of amplification of hosts by a guest are well-correlated with their relative affinities, and simulations have confirmed previous observations that amplification can be selective for the best receptor when only modest amounts of guest are used. In contrast, the correlation between guest binding and the extent of amplification of a given receptor across a wide range of guests tends to be poorer, because every guest has its own unique set of affinities for competing receptors in the DCL. This implies that the results of screening a DCL for selective receptors by comparing the response of the mixture to two different guests should be interpreted with caution. DCLs are complex mixtures in which all compounds are connected through a set of equilibria. Obtaining quantitative information about all host-guest binding constants from such systems will require the explicit and simultaneous consideration of all of the main equilibria within a DCL.


Subject(s)
Combinatorial Chemistry Techniques , Macrocyclic Compounds/chemistry , Small Molecule Libraries/chemistry , Binding Sites , Computer Simulation , Macrocyclic Compounds/chemical synthesis , Chemical Models , Molecular Structure , Reproducibility of Results , Small Molecule Libraries/chemical synthesis , Stereoisomerism , Water/chemistry
12.
J Am Chem Soc ; 127(26): 9390-2, 2005 Jul 06.
Article in English | MEDLINE | ID: mdl-15984865

ABSTRACT

Dynamic combinatorial chemistry is a powerful tool for the discovery of strong binders (synthetic receptors or ligands) because binding causes a shift in the equilibrium of library members toward those that bind well. Ideally, the best binders are selectively amplified. However, theoretical studies predict this is not always the case. This paper describes the first quantitative experimental evidence proving that, under special circumstances, the preferential amplification of suboptimal synthetic receptors can indeed occur. Our results also demonstrate that reducing the amount of guest in the library can rectify such undesirable behavior and ensures selective amplification of the fittest receptor.

13.
J Am Chem Soc ; 127(25): 8902-3, 2005 Jun 29.
Article in English | MEDLINE | ID: mdl-15969538

ABSTRACT

A high-affinity, induced-fit receptor for NMe4I was discovered using dynamic combinatorial chemistry. The addition of the guest to a dynamic combinatorial library made using a racemic mixture of chiral building blocks caused the strong and highly diastereoselective amplification of the receptor at the expense of other library components. The receptor and its mode of binding were characterized by NMR, ITC, and re-equilibration experiments, from which it was deduced that the receptor probably forms a folded four-stave barrel shape on binding of the guest.


Subject(s)
Combinatorial Chemistry Techniques/methods , Disulfides/chemistry , Heterocyclic Compounds/chemistry , Macrocyclic Compounds/chemistry , Cyclization , Disulfides/chemical synthesis , Heterocyclic Compounds/chemical synthesis , Macrocyclic Compounds/chemical synthesis , Molecular Models , Molecular Structure , Stereoisomerism , Thermodynamics
14.
Chemistry ; 10(13): 3139-43, 2004 Jul 05.
Article in English | MEDLINE | ID: mdl-15224322

ABSTRACT

We present a versatile computer model of diverse dynamic combinatorial libraries, and examine how molecular recognition between library members and a template can be used to amplify the best binders. The correlation between host-guest binding and amplification was examined for a set of 50 libraries with >300 components each over a wide range of template and building block concentrations. Depending on these concentrations correlations vary from poor (when using a large excess of template) to good (for very dilute libraries and/or substoichiometric template concentrations), highlighting the need to choose the experimental conditions for dynamic combinatorial libraries thoughtfully.
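To make the idea of template-induced amplification concrete, the sketch below simulates the smallest possible dynamic library: two interconverting isomeric hosts that each bind one guest in a 1:1 complex. It is an illustration under strong assumptions, not the model from the paper. The function name, the equilibrium constants and the concentrations are all invented, and a two-member library of same-sized hosts is not meant to reproduce the poor correlations reported for large template excess in libraries of more than 300 components.

```python
def simulate(h_total, g_total, k_iso, k1, k2, steps=200):
    """Equilibrium of H1 <=> H2 (constant k_iso) with 1:1 guest binding.

    k1, k2: binding constants of guest G to H1 and H2 (per molar).
    Returns total concentrations (free plus bound) of each host and free guest.
    """
    def balance(g):
        # For a trial free-guest concentration g, the host mass balance
        # fixes the free host concentrations h1 and h2 = k_iso * h1.
        h1 = h_total / ((1 + k1 * g) + k_iso * (1 + k2 * g))
        h2 = k_iso * h1
        guest = g + g * (k1 * h1 + k2 * h2)  # free guest plus both complexes
        return guest, h1, h2

    # Total guest rises monotonically with free guest, so bisection works.
    lo, hi = 0.0, g_total
    for _ in range(steps):
        mid = (lo + hi) / 2
        if balance(mid)[0] > g_total:
            hi = mid
        else:
            lo = mid
    g = (lo + hi) / 2
    _, h1, h2 = balance(g)
    return {"H1": h1 * (1 + k1 * g), "H2": h2 * (1 + k2 * g), "free_guest": g}


# Invented conditions: 1 mM total host, hosts equally stable without guest,
# H1 binds the guest 100 times more strongly than H2.
without = simulate(1e-3, 0.0, k_iso=1.0, k1=1e5, k2=1e3)
with_guest = simulate(1e-3, 1e-3, k_iso=1.0, k1=1e5, k2=1e3)

# Amplification factor: concentration with guest divided by concentration without.
for host in ("H1", "H2"):
    print(host, round(with_guest[host] / without[host], 3))
```

In this toy system the stronger binder is amplified at the expense of the weaker one; exploring how the factors change with `g_total` is a simple way to see why the choice of template concentration matters.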

15.
Chronic Dis Can ; 23(3): 111-9, 2002.
Article in English | MEDLINE | ID: mdl-12443567

ABSTRACT

An age-stratified population-based random digit dial (RDD) telephone survey determined awareness and prevalence of prostate-specific antigen (PSA) testing among Alberta men aged 40-74 years, and assessed the role of indications for PSA testing in explaining patterns of PSA testing. The sample of 1984 men (participation rate 65%) with no history of prostate cancer was divided into three age strata: 40-49, 50-59, and 60-74 years. Awareness of PSA tests was low, with fewer than half of the men indicating they had ever heard of PSA tests. The percentage of men who had ever had PSA testing was 4.5%, 13.1%, and 22.2% in the three age strata, respectively. PSA testing was strongly associated with having at least one clinical indication for PSA testing (prevalence 21.8%, 26.9%, and 42.2%, respectively). PSA testing rates were very low among men who had no clinical indications for PSA testing, suggesting infrequent PSA screening prior to the survey. PSA testing patterns in this population-based sample were consistent with Alberta clinical practice guidelines.


Subject(s)
Health Knowledge, Attitudes, Practice , Mass Screening/statistics & numerical data , Prostate-Specific Antigen , Prostatic Neoplasms/prevention & control , Adult , Age Distribution , Aged , Alberta , Humans , Logistic Models , Male , Middle Aged , Socioeconomic Factors