Results 1 - 11 of 11
1.
J Biomed Inform ; 71: 178-189, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28579531

ABSTRACT

PROBLEM: Biomedical literature and databases contain important clues for the identification of potential disease biomarkers. However, searching these enormous knowledge reservoirs and integrating findings across heterogeneous sources is costly and difficult. Here we demonstrate how semantically integrated knowledge, extracted from biomedical literature and structured databases, can be used to automatically identify potential migraine biomarkers. METHOD: We used a knowledge graph containing more than 3.5 million biomedical concepts and 68.4 million relationships. Biochemical compound concepts were filtered and ranked by their potential as biomarkers based on their connections to a subgraph of migraine-related concepts. The ranked results were evaluated against the results of a systematic literature review that was performed manually by migraine researchers. Weight points were assigned to these reference compounds to indicate their relative importance. RESULTS: Ranked results automatically generated by the knowledge graph were highly consistent with results from the manual literature review. Out of 222 reference compounds, 163 (73%) ranked in the top 2000, with 547 out of the 644 (85%) weight points assigned to the reference compounds. For reference compounds that were not in the top of the list, an extensive error analysis has been performed. When evaluating the overall performance, we obtained a ROC-AUC of 0.974. DISCUSSION: Semantic knowledge graphs composed of information integrated from multiple and varying sources can assist researchers in identifying potential disease biomarkers.
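
The evaluation figures in this abstract can be checked with a few lines of Python. The sketch below is only an illustration: `recall_at_k` and `roc_auc` are hypothetical helper names, the toy ranking is invented, and the authors' actual evaluation code is not described in the abstract. The ROC-AUC here is computed as the probability that a reference compound outscores a non-reference compound.

```python
def recall_at_k(ranked_ids, reference_ids, k):
    """Fraction of reference items found among the first k ranked items."""
    top = set(ranked_ids[:k])
    return sum(1 for r in reference_ids if r in top) / len(reference_ids)


def roc_auc(scores, labels):
    """ROC-AUC as P(positive outscores negative); ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Percentages reported in the abstract
print(round(100 * 163 / 222))  # 73
print(round(100 * 547 / 644))  # 85

# Toy ranking (invented data, for illustration only)
print(recall_at_k(["a", "b", "c", "d", "e"], ["a", "c", "e"], 3))
print(roc_auc([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 0, 1]))
```

The pairwise formulation is quadratic in the number of compounds; for a graph with millions of concepts a rank-based computation would be the more practical choice.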


Subjects
Biomarkers , Data Mining , Databases, Factual , Migraine Disorders/diagnosis , Semantics , Automation , Humans , Publications
2.
Anal Biochem ; 421(2): 622-31, 2012 Feb 15.
Article in English | MEDLINE | ID: mdl-22178910

ABSTRACT

Phage display screenings are frequently employed to identify high-affinity peptides or antibodies. Although successful, phage display is a laborious technology and is notorious for identification of false positive hits. To accelerate and improve the selection process, we have employed Illumina next generation sequencing to deeply characterize the Ph.D.-7 M13 peptide phage display library before and after several rounds of biopanning on KS483 osteoblast cells. Sequencing of the naive library after one round of amplification in bacteria identifies propagation advantage as an important source of false positive hits. Most important, our data show that deep sequencing of the phage pool after a first round of biopanning is already sufficient to identify positive phages. Whereas traditional sequencing of a limited number of clones after one or two rounds of selection is uninformative, the required additional rounds of biopanning are associated with the risk of losing promising clones propagating slower than nonbinding phages. Confocal and live cell imaging confirms that our screen successfully selected a peptide with very high binding and uptake in osteoblasts. We conclude that next generation sequencing can significantly empower phage display screenings by accelerating the finding of specific binders and restraining the number of false positive hits.


Subjects
Bacteriophage M13/genetics , High-Throughput Nucleotide Sequencing/methods , Peptide Library , Animals , Cell Line , Mice
3.
Nat Struct Mol Biol ; 12(12): 1130-6, 2005 Dec.
Article in English | MEDLINE | ID: mdl-16273104

ABSTRACT

As the raw material for evolution, arbitrary RNA sequences represent the baseline for RNA structure formation and a standard to which evolved structures can be compared. Here, we set out to probe, using physical and chemical methods, the structural properties of RNAs having randomly generated oligonucleotide sequences that were of sufficient length and information content to encode complex, functional folds, yet were unbiased by either genealogical or functional constraints. Typically, these unevolved, nonfunctional RNAs had sequence-specific secondary structure configurations and compact magnesium-dependent conformational states comparable to those of evolved RNA isolates. But unlike evolved sequences, arbitrary sequences were prone to having multiple competing conformations. Thus, for RNAs the size of small ribozymes, natural selection seems necessary to achieve uniquely folding sequences, but not to account for the well-ordered secondary structures and overall compactness observed in nature.


Subjects
Oligoribonucleotides/chemistry , RNA/chemistry , Base Sequence , Electrophoresis, Polyacrylamide Gel , Lead/chemistry , Lead/pharmacology , Magnesium/chemistry , Magnesium/pharmacology , Molecular Sequence Data , Nucleic Acid Conformation , Nucleic Acid Denaturation , RNA/drug effects , Ultracentrifugation
4.
PLoS One ; 11(2): e0149621, 2016.
Article in English | MEDLINE | ID: mdl-26919047

ABSTRACT

High-throughput experimental methods such as medical sequencing and genome-wide association studies (GWAS) identify increasingly large numbers of potential relations between genetic variants and diseases. Both biological complexity (millions of potential gene-disease associations) and the accelerating rate of data production necessitate computational approaches to prioritize and rationalize potential gene-disease relations. Here, we use concept profile technology to expose from the biomedical literature both explicitly stated gene-disease relations (the explicitome) and a much larger set of implied gene-disease associations (the implicitome). Implicit relations are largely unknown to, or even unintended by, the original authors, but they vastly extend the reach of existing biomedical knowledge for identification and interpretation of gene-disease associations. The implicitome can be used in conjunction with experimental data resources to rationalize both known and novel associations. We demonstrate the usefulness of the implicitome by rationalizing known and novel gene-disease associations, including those from GWAS. To facilitate the re-use of implicit gene-disease associations, we publish our data in compliance with FAIR Data Publishing recommendations [https://www.force11.org/group/fairgroup] using nanopublications. An online tool (http://knowledge.bio) is available to explore established and potential gene-disease associations in the context of other biomedical relations.


Subjects
Computational Biology/methods , Databases, Genetic , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans
5.
J Biomed Semantics ; 6: 5, 2015.
Article in English | MEDLINE | ID: mdl-26464783

ABSTRACT

High-throughput experiments often produce far more results than can ever appear in the main text or tables of a single research article. In these cases, the majority of new associations are often archived either as supplemental information in an arbitrary format or in publisher-independent databases that can be difficult to find. These data are not only lost from scientific discourse, but are also elusive to automated search, retrieval and processing. Here, we use the nanopublication model to make scientific assertions that were concluded from a workflow analysis of Huntington's Disease data machine-readable, interoperable, and citable. We followed the nanopublication guidelines to semantically model our assertions as well as their provenance metadata and authorship. We demonstrate interoperability by linking nanopublication provenance to the Research Object model. These results indicate that nanopublications can provide an incentive for researchers to expose data that are interoperable and machine-readable for future use and preservation, and for which they can receive credit. Nanopublications can play a leading role in hypothesis generation, offering opportunities for large-scale data integration.

6.
Nat Genet ; 47(2): 115-25, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25581432

ABSTRACT

Many cancer-associated somatic copy number alterations (SCNAs) are known. Currently, one of the challenges is to identify the molecular downstream effects of these variants. Although several SCNAs are known to change gene expression levels, it is not clear whether each individual SCNA affects gene expression. We reanalyzed 77,840 expression profiles and observed a limited set of 'transcriptional components' that describe well-known biology, explain the vast majority of variation in gene expression and enable us to predict the biological function of genes. On correcting expression profiles for these components, we observed that the residual expression levels (in 'functional genomic mRNA' profiling) correlated strongly with copy number. DNA copy number correlated positively with expression levels for 99% of all abundantly expressed human genes, indicating global gene dosage sensitivity. By applying this method to 16,172 patient-derived tumor samples, we replicated many loci with aberrant copy numbers and identified recurrently disrupted genes in genomically unstable cancers.


Subjects
DNA Copy Number Variations , Gene Dosage , Gene Expression Regulation, Neoplastic/genetics , Genomics , Neoplasms/genetics , Transcriptome , Comparative Genomic Hybridization , Gene Expression Profiling , Gene Regulatory Networks , Genetic Loci , Humans , RNA, Messenger/genetics , RNA, Neoplasm/genetics
7.
Genome Biol ; 16: 22, 2015 Jan 05.
Article in English | MEDLINE | ID: mdl-25723102

ABSTRACT

The FANTOM5 project investigates transcription initiation activities in more than 1,000 human and mouse primary cells, cell lines and tissues using CAGE. Based on manual curation of sample information and development of an ontology for sample classification, we assemble the resulting data into a centralized data resource (http://fantom.gsc.riken.jp/5/). This resource contains web-based tools and data-access points for the research community to search and extract data related to samples, genes, promoter activities, transcription factors and enhancers across the FANTOM5 atlas.


Subjects
Genomics/methods , Promoter Regions, Genetic , Software , Transcription Initiation, Genetic , Animals , Computational Biology/methods , Databases, Genetic , Datasets as Topic , Gene Expression Profiling , Humans , Mice , Transcriptome , User-Computer Interface
8.
J Biomed Semantics ; 5(Suppl 1 Proceedings of the Bio-Ontologies Spec Interest G): S6, 2014.
Article in English | MEDLINE | ID: mdl-25093075

ABSTRACT

BACKGROUND: Matching and comparing sequence annotations of different reference sequences is vital to genomics research, yet many annotation formats do not specify the reference sequence types or versions used. This makes the integration of annotations from different sources difficult and error prone. RESULTS: As part of our effort to create linked data for interoperable sequence annotations, we present an RDF data model for sequence annotation using the ontological framework established by the OBO Foundry ontologies and the Basic Formal Ontology (BFO). We defined reference sequences as the common domain of integration for sequence annotations, and identified three semantic relationships between sequence annotations. In doing so, we created the Reference Sequence Annotation to compensate for gaps in the SO and in its mapping to BFO, particularly for annotations that refer to versions of consensus reference sequences. Moreover, we present three integration models for sequence annotations using different reference assemblies. CONCLUSIONS: We demonstrated a working example of a sequence annotation instance, and how this instance can be linked to other annotations on different reference sequences. Sequence annotations in this format are semantically rich and can be integrated easily with different assemblies. We also identify other challenges of modeling reference sequences with the BFO.

9.
Biotechnol J ; 8(2): 221-7, 2013 Feb.
Article in English | MEDLINE | ID: mdl-22965937

ABSTRACT

There is a growing need for sensitive and reliable nucleic acid detection methods that are convenient and inexpensive. Responsive and programmable DNA nanostructures have shown great promise as chemical detection systems. Here, we describe a DNA detection system employing the triggered self-assembly of a novel DNA dendritic nanostructure. The detection protocol is executed autonomously without external intervention. Detection begins when a specific, single-stranded target DNA strand (T) triggers a hybridization chain reaction (HCR) between two distinct DNA hairpins (α and β). Each hairpin opens and hybridizes up to two copies of the other. In the absence of T, α and β are stable and remain in their poised, closed-hairpin form. In the presence of T, α hairpins are opened by toehold-mediated strand displacement, each of which then opens and hybridizes two β hairpins. Likewise, each opened β hairpin can open and hybridize two α hairpins. Hence, each layer of the growing dendritic nanostructure can in principle accommodate an exponentially increasing number of cognate molecules, generating a high molecular weight nanostructure. This HCR system has minimal sequence constraints, allowing reconfiguration for the detection of arbitrary target sequences. Here, we demonstrate detection of unique sequence identifiers of HIV and Chlamydia pathogens.
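
The exponential layer growth described in this abstract can be illustrated with a short calculation. This is an idealized sketch, not the authors' model: it assumes one target strand opens a single first hairpin, that every opened hairpin recruits exactly two partners, and that no branch terminates early. The function name `hairpins_per_layer` is hypothetical.

```python
def hairpins_per_layer(n_layers, branching=2):
    """Hairpins added in each layer of an idealized dendrimer seeded by one target strand."""
    return [branching ** layer for layer in range(n_layers)]


layers = hairpins_per_layer(5)
print(layers)       # [1, 2, 4, 8, 16]
print(sum(layers))  # 31
```

Under these assumptions, layer i holds 2^i hairpins and the structure holds 2^n - 1 hairpins after n layers, alternating between the two hairpin species from one layer to the next.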


Subjects
DNA/chemistry , DNA/isolation & purification , Gold/chemistry , Metal Nanoparticles/chemistry , Biosensing Techniques/instrumentation , Biosensing Techniques/methods , Chlamydia/isolation & purification , Electrophoresis, Polyacrylamide Gel , HIV/isolation & purification , Nucleic Acid Conformation , Nucleic Acid Hybridization , Sequence Analysis, DNA/methods
10.
PLoS One ; 8(11): e78665, 2013.
Article in English | MEDLINE | ID: mdl-24260124

ABSTRACT

MOTIVATION: Weighted semantic networks built from text-mined literature can be used to retrieve known protein-protein or gene-disease associations, and have been shown to anticipate associations years before they are explicitly stated in the literature. Our text-mining system recognizes over 640,000 biomedical concepts: some are specific (e.g., names of genes or proteins), others generic (e.g., 'Homo sapiens'). Generic concepts may play important roles in automated information retrieval, extraction, and inference but may also result in concept overload and confound retrieval and reasoning with low-relevance or even spurious links. Here, we attempted to optimize the retrieval performance for protein-protein interactions (PPI) by filtering generic concepts (node filtering) or links to generic concepts (edge filtering) from a weighted semantic network. First, we defined metrics based on network properties that quantify the specificity of concepts. Then, using these metrics, we systematically filtered generic information from the network while monitoring retrieval performance of known protein-protein interactions. We also systematically filtered specific information from the network (inverse filtering), and assessed the retrieval performance of networks composed of generic information alone. RESULTS: Filtering generic or specific information induced a two-phase response in retrieval performance: initially the effects of filtering were minimal, but beyond a critical threshold network performance suddenly dropped. Contrary to expectations, networks composed exclusively of generic information demonstrated retrieval performance comparable to unfiltered networks that also contain specific concepts. Furthermore, an analysis using individual generic concepts demonstrated that they can effectively support the retrieval of known protein-protein interactions. For instance, the concept "binding" is indicative of PPI retrieval and the concept "mutation abnormality" is indicative of gene-disease associations. CONCLUSION: Generic concepts are important for information retrieval and cannot be removed from semantic networks without negative impact on retrieval performance.
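
Node filtering as described in this abstract can be sketched as follows. The abstract does not name the specificity metrics that were used, so this example substitutes plain node degree as a stand-in: a concept linked to many others is treated as generic. The function names, the threshold, and the toy edge list are all hypothetical.

```python
from collections import defaultdict


def node_degrees(edges):
    """Count how many edges touch each concept."""
    deg = defaultdict(int)
    for a, b, _weight in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def filter_generic_nodes(edges, max_degree):
    """Node filtering: drop every edge touching a concept whose degree exceeds max_degree."""
    deg = node_degrees(edges)
    generic = {n for n, d in deg.items() if d > max_degree}
    return [(a, b, w) for a, b, w in edges if a not in generic and b not in generic]


# Toy weighted network (invented weights, for illustration only)
edges = [
    ("BRCA1", "binding", 0.9),
    ("BARD1", "binding", 0.8),
    ("TP53", "binding", 0.7),
    ("BRCA1", "BARD1", 0.6),
]
print(filter_generic_nodes(edges, max_degree=2))  # [('BRCA1', 'BARD1', 0.6)]
```

In this toy network, removing the generic concept "binding" also removes the only path between TP53 and the other proteins, which mirrors the abstract's finding that generic concepts can carry retrieval signal.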


Subjects
Data Mining/methods , Semantics , Vocabulary, Controlled , Humans
11.
Genes (Basel) ; 2(3): 608-26, 2011 Aug 16.
Article in English | MEDLINE | ID: mdl-24710212

ABSTRACT

Biological proteins are known to fold into specific 3D conformations. However, the fundamental question has remained: Do they fold because they are biological, and evolution has selected sequences which fold? Or is folding a common trait, widespread throughout sequence space? To address this question, arbitrary, unevolved, random-sequence proteins were examined for structural features found in folded, biological proteins. Libraries of long (71 residue), random-sequence polypeptides, with ensemble amino acid composition near the mean for natural globular proteins, were expressed as cleavable fusions with ubiquitin. The structural properties of both the purified pools and individual isolates were then probed using circular dichroism, fluorescence emission, and fluorescence quenching techniques. Despite this necessarily sparse "sampling" of sequence space, structural properties that define globular biological proteins, namely collapsed conformations, secondary structure, and cooperative unfolding, were found to be prevalent among unevolved sequences. Thus, for polypeptides the size of small proteins, natural selection is not necessary to account for the compact and cooperative folded states observed in nature.
