1.
BMC Bioinformatics ; 13: 211, 2012 Aug 22.
Article in English | MEDLINE | ID: mdl-22913485

ABSTRACT

BACKGROUND: A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks aiming to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element to link biological information. RESULTS: We present NetiNeti (Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing), a machine learning based approach for recognition of scientific names including the discovery of new species names from text that will also handle misspellings, OCR errors and other variations in names. The system generates candidate names using rules for scientific names and applies probabilistic machine learning methods to classify names based on structural features of candidate names and features derived from their contexts. NetiNeti can also disambiguate scientific names from other names using the contextual information. We evaluated NetiNeti on legacy biodiversity texts and biomedical literature (MEDLINE). NetiNeti performs better (precision = 98.9% and recall = 70.5%) compared to a popular dictionary based approach (precision = 97.5% and recall = 54.3%) on a 600-page biodiversity book that was manually marked by an annotator. On a small set of PubMed Central's full text articles annotated with scientific names, the precision and recall values are 98.5% and 96.2% respectively. NetiNeti found more than 190,000 unique binomial and trinomial names in more than 1,880,000 PubMed records when used on the full MEDLINE database. NetiNeti also successfully identifies almost all of the new species names mentioned within web pages. CONCLUSIONS: We present NetiNeti, a machine learning based approach for identification and discovery of scientific names. The system implementing the approach can be accessed at http://namefinding.ubio.org.
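The candidate-generation step described in the abstract can be illustrated with a minimal sketch. The regular expression below is a hypothetical rule chosen for illustration (a capitalized genus or an abbreviated genus such as "E.", followed by a lowercase epithet); it is not NetiNeti's actual rule set, and it covers binomials only. The function name `candidate_names` is likewise an assumption.

```python
import re

# Hypothetical rule (not NetiNeti's own): a capitalized word or an
# abbreviated genus ("E.") followed by a lowercase word of 3+ letters.
CANDIDATE = re.compile(r"\b(?:[A-Z][a-z]+|[A-Z]\.)\s+[a-z]{3,}\b")


def candidate_names(text):
    """Return every substring that structurally resembles a binomial name."""
    return CANDIDATE.findall(text)


sentence = "We collected Homo sapiens and E. coli samples."
print(candidate_names(sentence))
```

Such a rule deliberately over-generates: "We collected" is returned alongside "Homo sapiens" and "E. coli". Filtering out false candidates of this kind is the job the abstract assigns to the probabilistic classifier, which uses structural and contextual features; that step is not sketched here.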


Subjects
Artificial Intelligence, Classification, Data Mining, Animals, MEDLINE, PubMed
2.
J Proteome Res ; 8(10): 4732-42, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19725534

ABSTRACT

Liquid Chromatography Mass Spectrometry (LC-MS) based proteomics is an important tool in detecting changes in peptide/protein abundances in samples potentially leading to the discovery of disease biomarker candidates. We present CLUE-TIPS (Clustering Using Euclidean distance in Tanimoto Inter-Point Space), an approach that compares complex proteomic samples for similarity/dissimilarity analysis. In CLUE-TIPS, an intersample distance feature map is generated from filtered, aligned and binarized raw LC-MS data by applying the Tanimoto distance metric to obtain normalized similarity scores between all sample pairs for each m/z value. We developed clustering and visualization methods for the intersample distance map to analyze various samples for differences at the sample level as well as the individual m/z level. An approach to query for specific m/z values that are associated with similarity/dissimilarity patterns in a set of samples was also briefly described. CLUE-TIPS can also be used as a tool in assessing the quality of LC-MS runs. The presented approach does not rely on tandem mass-spectrometry (MS/MS), isotopic labels or gels and also does not rely on feature extraction methods. CLUE-TIPS suite was applied to LC-MS data obtained from plasma samples collected at various time points and treatment conditions from immunosuppressed mice implanted with MCF-7 human breast cancer cells. The generated raw LC-MS data was used for pattern analysis and similarity/dissimilarity detection. CLUE-TIPS successfully detected the differences/similarities in samples at various time points taken during the progression of tumor, and also recognized differences/similarities in samples representing various treatment conditions.
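The intersample distance map described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CLUE-TIPS implementation: each sample is assumed to hold, for every m/z value, a binary vector (for example over retention-time bins), and the Tanimoto distance is taken as 1 minus the ratio of shared to total set bits. The names `tanimoto_distance` and `distance_map` and the input layout are hypothetical.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """Return 1 - |a AND b| / |a OR b| for two equal-length binary sequences."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption: two all-zero vectors are treated as identical.
    return 0.0 if either == 0 else 1.0 - both / either


def distance_map(samples):
    """Map each shared m/z value to the distances between all sample pairs.

    samples: {sample_name: {mz_value: binary vector}} (assumed layout).
    """
    names = sorted(samples)
    shared_mz = sorted(set.intersection(*(set(samples[n]) for n in names)))
    return {
        mz: {
            (s1, s2): tanimoto_distance(samples[s1][mz], samples[s2][mz])
            for s1, s2 in combinations(names, 2)
        }
        for mz in shared_mz
    }


example = {
    "day0": {500: [1, 1, 0, 0], 501: [0, 1, 0, 0]},
    "day7": {500: [1, 0, 1, 0], 501: [0, 1, 0, 0]},
}
print(distance_map(example))
```

In this toy input the two samples differ at m/z 500 and are identical at m/z 501. The clustering and visualization steps the abstract applies to the map are not shown.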


Subjects
Liquid Chromatography/methods, Cluster Analysis, Mass Spectrometry/methods, Proteomics/methods, Algorithms, Animals, Breast Neoplasms/metabolism, Tumor Cell Line, Mice, Neoplasm Transplantation, Proteins/metabolism
3.
J Proteome Res ; 7(9): 4199-208, 2008 Sep.
Article in English | MEDLINE | ID: mdl-18686985

ABSTRACT

A novel computational approach, termed Search for Modified Peptides (SeMoP), for the unrestricted discovery and verification of peptide modifications in shotgun proteomic experiments using low resolution ion trap MS/MS spectra is presented. Various peptide modifications, including post-translational modifications, sequence polymorphisms, as well as sample handling-induced changes, can be identified using this approach. SeMoP utilizes a three-step strategy: (1) a standard database search to identify proteins in a sample; (2) an unrestricted search for modifications using a newly developed algorithm; and (3) a second standard database search targeted to specific modifications found using the unrestricted search. This targeted approach provides verification of discovered modifications and, due to increased sensitivity, a general increase in the number of peptides with the specific modification. The feasibility of the overall strategy has been first demonstrated in the analysis of 65 plasma proteins. Various sample handling induced modifications, such as beta-elimination of disulfide bridges and pyrocarbamidomethylation, as well as biologically induced modifications, such as phosphorylation and methylation, have been detected. A subsequent targeted Sequest search has been used to verify selected modifications, and a 4-fold increase in the number of modified peptides was obtained. In a second application, 1367 proteins of a cervical cancer cell line were processed, leading to detection of several novel amino acid substitutions. By conducting the search against a database of peptides derived from proteins with decoy sequences, a false discovery rate of less than 5% for the unrestricted search resulted. SeMoP is shown to be an effective and easily implemented approach for the discovery and verification of peptide modifications.
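The decoy-based false discovery rate mentioned in the abstract can be illustrated with a short sketch. The abstract does not state the exact formula used, so the estimator below (decoy hits divided by target hits at the same score threshold) is a common convention assumed here for illustration; the function name, the score scale, and the example numbers are all hypothetical.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR as (# decoy hits) / (# target hits) at a score threshold.

    Assumption: a higher score means a better match, and the decoy
    database is comparable in size to the target database.
    """
    targets = sum(1 for s in target_scores if s >= threshold)
    decoys = sum(1 for s in decoy_scores if s >= threshold)
    return decoys / targets if targets else 0.0


# Made-up scores: 20 target hits and 1 decoy hit pass the threshold.
target_scores = [0.9] * 20 + [0.1] * 5
decoy_scores = [0.95] + [0.2] * 10
print(estimate_fdr(target_scores, decoy_scores, threshold=0.5))
```

Raising the threshold until the estimate falls below a chosen bound, such as the 5% figure reported in the abstract, is one way to select accepted matches.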


Subjects
Liquid Chromatography/methods, Peptides/chemistry, Tandem Mass Spectrometry/methods, Amino Acid Sequence, Blood Proteins/chemistry, Cell Line, Humans, Methylation, Molecular Sequence Data, Phosphorylation