ABSTRACT
Biology is generating more data than ever. As a result, there is an ever increasing number of publicly available databases that analyse, integrate and summarize the available data, providing an invaluable resource for the biological community. As this trend continues, there is a pressing need to organize, catalogue and rate these resources, so that the information they contain can be most effectively exploited. MetaBase (MB) (http://MetaDatabase.Org) is a community-curated database containing more than 2000 commonly used biological databases. Each entry is structured using templates and can carry various user comments and annotations. Entries can be searched, listed, browsed or queried. The database was created using the same MediaWiki technology that powers Wikipedia, allowing users to contribute on many different levels. The initial release of MB was derived from the content of the 2007 Nucleic Acids Research (NAR) Database Issue. Since then, approximately 100 databases have been manually collected from the literature, and users have added information for over 240 databases. MB is synchronized annually with the static Molecular Biology Database Collection provided by NAR. To date, there have been 19 significant contributors to the project; each one is listed as an author here to highlight the community aspect of the project.
Subjects
Biology; Databases, Factual; Internet; Systems Integration
ABSTRACT
BACKGROUND: The breadth of biological databases and their information content continues to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to automatically classify database entries and to efficiently retrieve them. RESULTS: Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature results into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best. No other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style in order to correct automated classification errors, which triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process to increase speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly. 
CONCLUSION: Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases. The system can be accessed at http://pepbank.mgh.harvard.edu.
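The simulated community voting mentioned in the abstract (adding noise to nearly perfectly labelled instances) can be illustrated with a small sketch. It assumes binary labels, independent voters who each flip a label with a fixed probability, and simple majority aggregation; the function names (`add_label_noise`, `aggregate_votes`, `simulate`, `accuracy`) are hypothetical, and the abstract does not state the aggregation rule actually used in PepBank.

```python
import random
from collections import Counter


def add_label_noise(labels, noise_rate, rng):
    """Flip each binary label (0/1) independently with probability noise_rate."""
    return [1 - y if rng.random() < noise_rate else y for y in labels]


def aggregate_votes(votes):
    """Return (majority label, fraction of votes it received) for one entry."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


def simulate(true_labels, n_voters, noise_rate, seed=0):
    """Simulate n_voters noisy voters and aggregate their votes per entry.

    Assumption: an odd n_voters avoids ties in the majority vote.
    """
    rng = random.Random(seed)
    ballots = [add_label_noise(true_labels, noise_rate, rng) for _ in range(n_voters)]
    return [
        aggregate_votes([ballot[i] for ballot in ballots])
        for i in range(len(true_labels))
    ]


def accuracy(true_labels, results):
    """Fraction of entries whose aggregated label matches the true label."""
    hits = sum(1 for y, (label, _) in zip(true_labels, results) if y == label)
    return hits / len(true_labels)


if __name__ == "__main__":
    truth = [0, 1, 1, 0, 1, 0, 0, 1]
    for rate in (0.0, 0.1, 0.3):
        results = simulate(truth, n_voters=5, noise_rate=rate)
        print(rate, accuracy(truth, results))
```

The vote fraction returned by `aggregate_votes` plays a role loosely analogous to the class probability estimates described above, in that it gives a per-entry confidence that a heat map could display; it is not the bagged-tree estimate itself.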
Subjects
Computational Biology/methods; Databases, Factual; Information Storage and Retrieval; Classification; Internet
ABSTRACT
BACKGROUND: Arterial calcification is associated with cardiovascular events; however, mechanisms of calcification in atherosclerosis remain obscure. METHODS AND RESULTS: We tested the hypothesis that inflammation promotes osteogenesis in atherosclerotic plaques using in vivo molecular imaging in apolipoprotein E-/- mice (20 to 30 weeks old, n=35). A bisphosphonate-derivatized near-infrared fluorescent imaging agent (excitation 750 nm) visualized osteogenic activity that was otherwise undetectable by x-ray computed tomography. Flow cytometry validated the target specifically in osteoblast-like cells. A spectrally distinct near-infrared fluorescent nanoparticle (excitation 680 nm) was coinjected to simultaneously image macrophages. Fluorescence reflectance mapping demonstrated an association between osteogenic activity and macrophages in aortas of apolipoprotein E-/- mice (R²=0.93). Intravital dual-channel fluorescence microscopy was used to further monitor osteogenic changes in inflamed carotid arteries at 20 and 30 weeks of age and revealed that macrophage burden and osteogenesis concomitantly increased during plaque progression (P<0.01 and P<0.001, respectively) and decreased after statin treatment (P<0.0001 and P<0.05, respectively). Fluorescence microscopy on cryosections colocalized near-infrared fluorescent osteogenic signals with alkaline phosphatase activity, bone-regulating protein expression, and hydroxyapatite nanocrystals as detected by electron microscopy, whereas von Kossa and alizarin red stains showed no evidence of calcification. Real-time reverse-transcription polymerase chain reaction revealed that macrophage-conditioned media increased alkaline phosphatase mRNA expression in vascular smooth muscle cells. CONCLUSIONS: This serial in vivo study demonstrates the real-time association of macrophage burden with osteogenic activity in early-stage atherosclerosis and offers a cellular-resolution tool to identify preclinical microcalcifications.
Subjects
Apolipoproteins E/deficiency; Atherosclerosis/physiopathology; Inflammation/etiology; Osteogenesis/physiology; Animals; Apolipoproteins E/genetics; Atherosclerosis/pathology; Calcinosis/etiology; Calcinosis/pathology; Disease Models, Animal; Humans; Image Processing, Computer-Assisted; Inflammation/prevention & control; Mice; Mice, Knockout
ABSTRACT
Nanomaterials with precise biological functions have considerable potential for use in biomedical applications. Here we investigate whether multivalent attachment of small molecules can increase specific binding affinity and reveal new biological properties of such nanomaterials. We describe the parallel synthesis of a library comprising 146 nanoparticles decorated with different synthetic small molecules. Using fluorescent magnetic nanoparticles, we rapidly screened the library against different cell lines and discovered a series of nanoparticles with high specificity for endothelial cells, activated human macrophages or pancreatic cancer cells. Hits from the last-mentioned screen were shown to target pancreatic cancer in vivo. The method and described materials could facilitate development of functional nanomaterials for applications such as differentiating cell lines, detecting distinct cellular states and targeting specific cell types.
Subjects
Biomedical Engineering/methods; Gene Expression Regulation, Neoplastic; Nanostructures; Nanotechnology/methods; Animals; Biotechnology; Cell Differentiation; Cell Line, Tumor; Cell Separation; Drug Delivery Systems; Endothelial Cells/metabolism; Endothelium, Vascular/metabolism; Flow Cytometry; Gene Library; Humans; Macrophages/metabolism; Mice; Mice, Nude; Neoplasm Transplantation; Pancreatic Neoplasms/metabolism; Phenotype
ABSTRACT
BACKGROUND: Peptides are important molecules with diverse biological functions and biomedical uses. To date, there is no single, searchable archive for peptide sequences or associated biological data. Rather, peptide sequences still have to be mined from abstracts and full-length articles, and/or obtained from fragmented public sources. DESCRIPTION: We have constructed a new database (PepBank), which at the time of writing contains a total of 19,792 individual peptide entries. The database has a web-based user interface with a simple, Google-like search function, advanced text search, and BLAST and Smith-Waterman search capabilities. The major source of peptide sequence data is text mining of MEDLINE abstracts. Another component of the database is peptide sequence data from public sources (ASPD and UniProt). An additional, smaller part of the database is manually curated from sets of full-text articles and text mining results. We show the utility of the database in different examples of affinity ligand discovery. CONCLUSION: We have created and maintain a database of peptide sequences. The database has biological and medical applications, for example, to predict the binding partners of biologically interesting peptides, to develop peptide-based therapeutic or diagnostic agents, or to predict molecular targets or binding specificities of peptides resulting from phage display selection. The database is freely available at http://pepbank.mgh.harvard.edu/, and the text mining source code (Peptide::Pubmed) is freely available at the same address as well as on CPAN (http://www.cpan.org/).
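For readers unfamiliar with the Smith-Waterman search mentioned in the abstract, the following is a minimal sketch of the local alignment score it is based on. The scoring values (`match`, `mismatch`, `gap`) and the linear gap penalty are illustrative assumptions; the abstract does not describe PepBank's scoring scheme, and peptide searches in practice normally use an amino acid substitution matrix rather than a flat match/mismatch score.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses the Smith-Waterman recurrence with a linear gap penalty and
    keeps only two rows of the dynamic programming table in memory.
    The default scores are illustrative, not taken from the source.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0: this is what makes the alignment local
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # A short query motif scored against a longer peptide that contains it
    print(smith_waterman_score("XXRGDYY", "RGD"))
```

Because each cell is floored at zero, a short query can score highly against a longer entry that merely contains it, which suits searching a collection of peptides of varying length.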