Abstract
Neuropeptides represent a unique class of signaling molecules that have garnered much attention but require special consideration when identifications are gleaned from mass spectra. With highly variable sequence lengths, neuropeptides must be analyzed in their endogenous state. Further, neuropeptides share great homology within families, differing by as little as a single amino acid residue, complicating even routine analyses and necessitating optimized computational strategies for confident and accurate identifications. We present EndoGenius, a database searching strategy designed specifically for elucidating neuropeptide identifications from mass spectra by leveraging optimized peptide-spectrum matching approaches, an expansive motif database, and a novel scoring algorithm to achieve broader representation of the neuropeptidome and minimize reidentification. This work describes an algorithm capable of reporting more neuropeptide identifications at 1% false-discovery rate than alternative software in five Callinectes sapidus neuronal tissue types.
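The abstract does not describe how EndoGenius computes its 1% false-discovery rate. As general background only, a 1% FDR cutoff is commonly applied with target-decoy competition. The sketch below assumes a plain list of scored peptide-spectrum matches. The `PSM` class and `filter_at_fdr` function are hypothetical names. FDR is estimated as decoys/targets at each score threshold, and those estimates are then converted to q-values.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """A scored peptide-spectrum match (hypothetical minimal record)."""
    peptide: str
    score: float
    is_decoy: bool


def filter_at_fdr(psms, fdr_threshold=0.01):
    """Return target PSMs whose q-value is at or below fdr_threshold.

    PSMs are ranked by descending score; at each rank the FDR is estimated
    as (decoys seen) / (targets seen). A q-value is the smallest estimated
    FDR at that rank or at any lower-scoring rank.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    fdr_estimates = []
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr_estimates.append(decoys / max(targets, 1))

    # Walk from the bottom of the ranking upward so q-values never decrease
    # as the score threshold is relaxed.
    q_values = [0.0] * len(ranked)
    running_min = float("inf")
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdr_estimates[i])
        q_values[i] = running_min

    return [p for p, q in zip(ranked, q_values)
            if not p.is_decoy and q <= fdr_threshold]
```

Decoys are used only for estimation and are never reported. A real tool would also handle score ties and peptide-level versus PSM-level FDR, and this sketch does not.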
Subjects
Algorithms, Protein Databases, Neuropeptides, Software, Neuropeptides/analysis, Neuropeptides/chemistry, Animals, Mass Spectrometry/methods, Amino Acid Sequence, Proteomics/methods, Tandem Mass Spectrometry/methods
Abstract
Phosphorylation is the most studied post-translational modification and has multiple biological functions. In this study, we reanalyzed publicly available mass spectrometry proteomics data sets enriched for phosphopeptides from Asian rice (Oryza sativa). In total, we identified 15,565 phosphosites on serine, threonine, and tyrosine residues of rice proteins. We identified sequence motifs for phosphosites and linked motifs to enrichment of different biological processes, indicating different downstream regulation, likely caused by different kinase groups. We cross-referenced phosphosites against the rice 3,000 genomes to identify single amino acid variations (SAAVs) within or proximal to phosphosites that could cause loss of a site in a given rice variety, and we clustered the data to identify groups of sites with similar patterns across rice family groups. The data have been loaded into UniProt Knowledge-Base, enabling researchers to visualize sites alongside other data on rice proteins (e.g., structural models from AlphaFold2), and into PeptideAtlas and the PRIDE database, enabling visualization of source evidence, including scores and supporting mass spectra.
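Motif analyses of phosphosites usually start from a fixed-width sequence window centred on each modified residue. The sketch below shows this under assumptions of our own: a 15-residue window (7 flanking residues per side), `_` padding at the protein termini, and a check for the S/T-P (proline-directed) motif as one example pattern. `site_window`, `is_proline_directed` and `motif_counts` are hypothetical names and do not come from the study.

```python
from collections import Counter


def site_window(protein_seq: str, position: int, flank: int = 7, pad: str = "_") -> str:
    """Return residues centred on a 1-based position, padded past the termini."""
    if not 1 <= position <= len(protein_seq):
        raise ValueError("position is outside the protein sequence")
    idx = position - 1
    left = protein_seq[max(0, idx - flank):idx]
    right = protein_seq[idx + 1:idx + 1 + flank]
    # Pad so every window has length 2 * flank + 1 with the site in the middle.
    return pad * (flank - len(left)) + left + protein_seq[idx] + right + pad * (flank - len(right))


def is_proline_directed(window: str, flank: int = 7) -> bool:
    """True if the central residue is S or T and is immediately followed by P."""
    return window[flank] in "ST" and window[flank + 1] == "P"


def motif_counts(windows, flank: int = 7) -> Counter:
    """Count which residue occupies the +1 position across many site windows."""
    return Counter(w[flank + 1] for w in windows)
```

For example, `site_window("MASPRK", 3, flank=2)` gives `"MASPR"`, which `is_proline_directed(..., flank=2)` flags as an S-P site. A SAAV that changes the central residue or the +1 proline would remove that motif.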
Subjects
Plant Genome, Oryza, Phosphoproteins, Plant Proteins, Proteomics, Signal Transduction, Oryza/genetics, Oryza/metabolism, Oryza/chemistry, Proteomics/methods, Phosphoproteins/metabolism, Phosphoproteins/genetics, Phosphoproteins/chemistry, Phosphoproteins/analysis, Plant Proteins/genetics, Plant Proteins/metabolism, Phosphorylation, Post-Translational Protein Processing, Phosphopeptides/metabolism, Phosphopeptides/analysis, Protein Databases, Amino Acid Motifs, Mass Spectrometry
Abstract
In this editorial, Anthea Sutton and Veronica Parisi reflect on ChatGPT and how it may contribute to systematic searching, and give an overview of some recent training they attended on ChatGPT, AI and systematic literature reviews.
Abstract
BACKGROUND: Latin American and Caribbean Health Sciences Literature (LILACS) is the main reference database in the region; however, the way this resource is used in Cochrane systematic reviews has not been studied. OBJECTIVES: To assess the search methods of Cochrane reviews that used LILACS as a source of information and to explore the Cochrane community's perceptions of this resource. METHODS: We identified all Cochrane reviews of interventions published during 2019 that included LILACS as a source of information and analysed their search methods; we also ran a survey of the Cochrane community. RESULTS: We found 133 Cochrane reviews that reported full search strategies and identified heterogeneity in search details. Survey respondents highlighted many areas for improvement in the use of LILACS, including the usability of its search platform for this purpose. DISCUSSION: The use and reporting of LILACS in Cochrane reviews are inconsistent, as shown both by the analysis of search reports from systematic reviews and by the survey of Cochrane community members. CONCLUSION: With better guidance on how the LILACS database is structured, information specialists working on Cochrane reviews should be able to make more effective use of this unique resource.
Subjects
Information Services, Medicine, Humans, Publications, Surveys and Questionnaires
Abstract
BACKGROUND: Cross-linking mass spectrometry (XL-MS) is a powerful technique for detecting protein-protein interactions (PPIs) and modeling protein structures in a high-throughput manner. In XL-MS experiments, proteins are cross-linked by a chemical reagent (a cross-linker), digested into peptides, and then analyzed by tandem mass spectrometry (MS/MS). Cross-linkers are either cleavable or non-cleavable, and each type requires distinct data analysis tools. However, both types of cross-linkers suffer from imbalanced fragmentation efficiency, resulting in a large number of unidentifiable spectra that hinder the discovery of PPIs and protein conformations. To address this challenge, researchers have sought to improve the sensitivity of XL-MS by inventing novel cross-linking reagents, optimizing sample preparation protocols, and developing data analysis algorithms. One promising approach to developing new data analysis methods is to apply a protein feedback mechanism in the analysis. This approach has significantly improved the sensitivity of analysis methods for cleavable cross-linking data. Applying the protein feedback mechanism to non-cleavable cross-linking data is expected to have an even greater impact, because the majority of XL-MS experiments currently employ non-cleavable cross-linkers. RESULTS: In this study, we applied the protein feedback mechanism to the analysis of both non-cleavable and cleavable cross-linking data and observed a substantial improvement in cross-link spectrum matches (CSMs) compared with conventional methods. Furthermore, we developed a new software program, ECL 3.0, that integrates two algorithms and includes a user-friendly graphical interface to facilitate wider application. CONCLUSIONS: ECL 3.0 source code is available at https://github.com/yuweichuan/ECL-PF.git . A quick tutorial is available at https://youtu.be/PpZgbi8V2xI .
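A large part of the difficulty with non-cleavable cross-linkers comes from the search itself. The precursor mass covers both peptides plus the linker, so candidate peptide pairs must be found together. The sketch below is illustrative only and is not ECL's algorithm or its protein feedback mechanism. It assumes DSS/BS3 as the linker (which adds 138.06808 Da), neutral monoisotopic peptide masses, and a ppm precursor tolerance. `candidate_pairs` is a hypothetical name. Binary search over sorted masses keeps the pairing step from scanning every pair.

```python
from bisect import bisect_left, bisect_right

DSS_LINKER_MASS = 138.06808  # Da added by a DSS/BS3 cross-link (C8H10O2)


def candidate_pairs(peptide_masses: dict, precursor_mass: float,
                    tol_ppm: float = 10.0, linker_mass: float = DSS_LINKER_MASS):
    """Return (alpha, beta) peptide pairs whose masses plus the linker match
    the precursor neutral mass within tol_ppm. Self-pairs are allowed."""
    items = sorted(peptide_masses.items(), key=lambda kv: kv[1])
    masses = [m for _, m in items]
    tol = precursor_mass * tol_ppm * 1e-6
    target = precursor_mass - linker_mass  # combined mass of the two peptides
    pairs = []
    for i, (pep_a, m_a) in enumerate(items):
        # Only search from index i onward so each unordered pair appears once.
        lo = bisect_left(masses, target - m_a - tol, lo=i)
        hi = bisect_right(masses, target - m_a + tol, lo=i)
        for j in range(lo, hi):
            pairs.append((pep_a, items[j][0]))
    return pairs
```

Each returned pair would still need to be scored against the MS/MS spectrum. That scoring step is where imbalanced fragmentation between the two peptides makes identification hard.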
Subjects
Peptides, Tandem Mass Spectrometry, Algorithms, Cross-Linking Reagents, Data Analysis
Abstract
Spectrum library searching is a powerful alternative to database searching for data-dependent acquisition (DDA) experiments, but it has historically been limited to identifying peptides previously observed and recorded in libraries. Here we present Scribe, a new library search engine designed to leverage deep-learning fragmentation prediction software such as Prosit. Rather than relying on highly curated DDA libraries, this approach predicts fragmentation and retention times for every peptide in a FASTA database. Scribe embeds Percolator for false discovery rate correction and an interference-tolerant, label-free quantification integrator, providing an end-to-end proteomics workflow. By leveraging expected relative fragmentation and retention time values, we find that library searching with Scribe can outperform traditional database searching tools in both sensitivity and quantitative precision. Scribe and its graphical interface are easy to use, freely accessible, and fully open source.
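Library searching compares each observed spectrum with a predicted one using relative fragment intensities. One common similarity measure is the normalized dot product (cosine similarity). The abstract does not say which score Scribe uses, so this is only an illustration of the general idea. The sketch assumes that peaks have already been matched to fragment labels such as `"y5"` and that intensities are square-root transformed. `normalized_dot_product` is a hypothetical name.

```python
import math


def normalized_dot_product(predicted: dict, observed: dict) -> float:
    """Cosine similarity between two spectra keyed by fragment label (e.g. "y5").

    Intensities are square-root transformed before comparison, a common choice
    that damps the influence of a few dominant peaks. Fragments missing from
    one spectrum contribute an intensity of zero.
    """
    labels = sorted(set(predicted) | set(observed))
    p = [math.sqrt(predicted.get(k, 0.0)) for k in labels]
    o = [math.sqrt(observed.get(k, 0.0)) for k in labels]
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in o))
    if norm == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(p, o)) / norm
```

Because the score depends only on relative intensities, an observed spectrum that is a uniformly scaled copy of the prediction still scores 1.0.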
Subjects
Peptides, Tandem Mass Spectrometry, Software, Proteomics, Search Engine, Peptide Library, Protein Databases
Abstract
Protein database search engines are an integral component of mass spectrometry-based peptidomic analyses. Given the unique computational challenges of peptidomics, many factors must be considered when selecting a search engine, as each platform uses different algorithms to score tandem mass spectra for peptide identification. In this study, four database search engines (PEAKS, MS-GF+, OMSSA, and X!Tandem) were compared using Aplysia californica and Rattus norvegicus peptidomics data sets, and various metrics, such as the number of unique peptide and neuropeptide identifications and peptide length distributions, were assessed. Under the tested conditions, PEAKS produced the most peptide and neuropeptide identifications of the four search engines in both data sets. Furthermore, principal component analysis and multivariate logistic regression were employed to determine whether specific spectral features contribute to false C-terminal amidation assignments by each search engine. This analysis showed that the primary features influencing incorrect peptide assignments were the precursor and fragment ion m/z errors. Finally, a mixed-species protein database was used to evaluate search engine precision and sensitivity when searching an enlarged search space containing human proteins.
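Precursor and fragment m/z errors are normally expressed in parts per million (ppm). The sketch below shows the standard calculation for an [M + zH]z+ ion. It also defines the C-terminal amidation mass shift (COOH to CONH2, about -0.984016 Da) for comparing candidate assignments. The function names are hypothetical. The usage note is an arithmetic illustration, not a finding of the study.

```python
PROTON_MASS = 1.007276466812  # Da
AMIDATION_DELTA = -0.984016   # Da; C-terminal -COOH replaced by -CONH2


def precursor_mz(neutral_mass: float, charge: int) -> float:
    """m/z of an [M + zH]z+ ion for a neutral monoisotopic mass M."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def amidated_mz(neutral_mass: float, charge: int) -> float:
    """m/z of the same peptide carrying a C-terminal amide."""
    return precursor_mz(neutral_mass + AMIDATION_DELTA, charge)
```

As an illustration, if the instrument picks the first 13C isotope peak (+1.003355 Da) instead of the monoisotopic peak, the free acid differs from the amidated candidate by only about 0.0193 Da. This is the kind of small residual m/z error that a wide tolerance can let through.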
Subjects
Neuropeptides, Search Engine, Humans, Animals, Rats, Peptides, Algorithms, Tandem Mass Spectrometry, Protein Databases, Software
Abstract
Stochastic, intensity-based precursor isolation can result in isotopically enriched fragment ions. This problem is exacerbated for large peptides and for stable isotope labeling experiments using deuterium or 15N. In stable isotope labeling experiments, incomplete and ubiquitous labeling strategies result in the isolation of peptide ions composed of many distinct structural isomers. Unfortunately, existing proteomics search algorithms do not account for this variability in isotopic incorporation and thus often yield poor peptide and protein identification rates. We sought to resolve this shortcoming by deriving the expected isotopic distribution of each fragment ion and incorporating it into the theoretical mass spectra used for peptide-spectrum matching. We adapted the Comet search platform to integrate a modified spectral prediction algorithm we term Conditional fragment Ion Distribution Search (CIDS). Comet-CIDS uses a traditional database searching strategy, but for each candidate peptide we compute the isotopic distribution of each fragment to better match the observed m/z distributions. Evaluating previously generated D2O and 15N labeled data sets, we found that Comet-CIDS identified more confident peptide-spectrum matches and achieved higher protein sequence coverage than traditional theoretical spectra generation, with the magnitude of improvement largely determined by the amount of labeling in the sample.
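To see why partial labeling spreads a fragment across several peaks, consider a deliberately simplified model. It is not the CIDS algorithm. Assume a fragment has `n_sites` labelable positions, each labeled independently with probability `enrichment`, and ignore natural-abundance isotopes such as 13C. The number of heavy atoms is then binomially distributed. The function names and the independence assumption are ours. The mass shifts are the standard 15N-14N and 2H-1H differences.

```python
from math import comb

N15_SHIFT = 0.997035  # Da, 15N minus 14N
D_SHIFT = 1.006277    # Da, 2H minus 1H


def label_incorporation_distribution(n_sites: int, enrichment: float) -> list:
    """P(k labeled sites) for k = 0..n_sites under an independent-site binomial model."""
    if not 0.0 <= enrichment <= 1.0:
        raise ValueError("enrichment must be between 0 and 1")
    return [comb(n_sites, k) * enrichment ** k * (1 - enrichment) ** (n_sites - k)
            for k in range(n_sites + 1)]


def labeled_fragment_peaks(mono_mz: float, charge: int, n_sites: int,
                           enrichment: float, delta_mass: float) -> list:
    """Expected (m/z, relative abundance) pairs for one fragment ion."""
    probs = label_incorporation_distribution(n_sites, enrichment)
    return [(mono_mz + k * delta_mass / charge, p) for k, p in enumerate(probs)]
```

A theoretical spectrum built this way places several weighted peaks per fragment rather than one. This is the general idea behind matching observed m/z distributions, though the real algorithm may condition on more information.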
Subjects
Peptides, Proteins, Peptides/chemistry, Proteins/metabolism, Amino Acid Sequence, Probability, Ions
Abstract
Single-cell proteomics is emerging as an important subfield in the proteomics and mass spectrometry communities, with the potential to reshape our understanding of cell development, cell differentiation, disease diagnosis, and the development of new therapies. In contrast with significant advancements in the "hardware" used in single-cell proteomics, little work has compared the effects of using different "software" packages to analyze single-cell proteomics datasets. To this end, seven popular proteomics programs were compared by using them to search three single-cell proteomics datasets generated on three different platforms. The results suggest that MS-GF+, MSFragger, and Proteome Discoverer are generally more efficient in maximizing protein identifications, that MaxQuant is better suited to identifying low-abundance proteins, that MSFragger is superior in elucidating peptide modifications, and that Mascot and X!Tandem are better for analyzing long peptides. Furthermore, an experiment with different sample loading amounts was carried out to investigate changes in identification results and to explore areas in which single-cell proteomics data analysis may be improved in the future. We propose that this comparative study may provide insight for experts and beginners alike working in the emerging subfield of single-cell proteomics.
Subjects
Proteomics, Tandem Mass Spectrometry, Proteomics/methods, Tandem Mass Spectrometry/methods, Search Engine/methods, Software, Proteome/analysis, Protein Databases
Abstract
O-GlcNAcylation, the addition of a single N-acetylglucosamine residue to serine and threonine residues of cytoplasmic, nuclear, or mitochondrial proteins, is a widespread regulatory posttranslational modification. It is involved in the response to nutritional status and stress, and its dysregulation is associated with diseases ranging from Alzheimer's to diabetes. Although the modification was first detected over 35 years ago, research into the function of O-GlcNAcylation has accelerated dramatically in the last 10 years owing to the development of new enrichment and mass spectrometry techniques that facilitate its analysis. This article summarizes methods for O-GlcNAc enrichment, key mass spectrometry instrumentation advancements, particularly those that allow modification site localization, and software tools that allow analysis of data from O-GlcNAc-modified peptides.
Subjects
Acetylglucosamine/metabolism, Acetylglucosamine/chemistry, Animals, Humans, Immunoprecipitation, Lectins/chemistry, Mass Spectrometry, Post-Translational Protein Processing, Software
Abstract
This study examines the frequency of misspellings in health sciences literature and explores how they affect citation retrieval in multiple databases. Searches for commonly misspelled medical words were conducted in PubMed, CINAHL Complete, APA PsycArticles (ProQuest), APA PsycInfo, and ProQuest Psychology databases. Citations that would be retrieved using a word's correct spelling were removed from the search results. The remaining results were citations that could only be retrieved if the word was misspelled in the search. Articles with clinical significance were targeted. The five most commonly misspelled words were occurrence, ophthalmology, pruritus, sagittal, and resistance. Ophthalmology had the highest number of citations containing at least one misspelling, with 57% of those citations "missing" when searched with the correct spelling of the word. The word with the highest percentage (82%) of missed citations was arrhythmia. The results of this study indicate that misspellings in scholarly literature are more prevalent than searchers might realize. Misspellings adversely affect the ability to retrieve citations, which has the potential to affect patient care. Many opportunities exist in the editorial process to identify and correct misspellings before publication; far fewer exist once an article is published. The implications for database searching and manuscript evaluation are discussed.
Subjects
Medicine, Humans, PubMed, Factual Databases
Abstract
BACKGROUND: Traditional and complementary medicine (T&CM) is highly utilised and draws on traditional knowledge (TK) as evidence, raising a need to explore how TK is currently used. OBJECTIVES: To examine the criteria used to select, evaluate and apply TK in contemporary health contexts. METHODS: A systematic search was conducted in academic databases (AMED, CINAHL, MEDLINE, EMBASE, SSCI, ProQuest Dissertations Theses Global), the Trip clinical database and the Google search engine. Citations and reference lists of included articles were searched. Reported use of TK in contemporary settings was mapped against a modified 'Exploration-Preparation-Implementation-Sustainment' (EPIS) implementation framework. RESULTS: Across the 54 included articles, EPIS mapping found that TK is primarily used in the Exploration phase of implementation (n = 54), with little reporting on the Preparation (n = 16), Implementation process (n = 6) or Sustainment (n = 4) phases. Criteria used in the selection, evaluation and application of TK commonly involved validation against other scientific or traditional evidence sources, or assessment of factors influencing knowledge translation. DISCUSSION: One difficulty in validating TK (as a co-opted treatment) against other evidence sources is comparing like with like, as TK often takes a holistic approach. This complicates further planning and evaluation of implementation. CONCLUSION: This review identifies important criteria for evaluating current and potential contemporary use of TK, as well as gaps in research and practice for finding, appraising and applying relevant TK studies for clinical care.
Subjects
Health Education, Knowledge, Policy, Humans
Abstract
BACKGROUND: Medication discontinuation studies explore the outcomes of stopping a medication compared with continuing it. Comprehensively identifying medication discontinuation articles in bibliographic databases remains challenging due to variability in terminology. OBJECTIVES: To develop and validate search filters to retrieve medication discontinuation articles in Medline and Embase. METHODS: We identified medication discontinuation articles in a convenience sample of systematic reviews. We used the primary articles to create two reference sets, one for Medline and one for Embase. Each reference set was randomly divided into equal development and validation sets. Terms relevant to discontinuation were identified by term frequency analysis in the development sets and combined to develop two search filters that maximized relative recall. The filters were validated against the validation sets. Relative recall was calculated with 95% confidence intervals (95% CI). RESULTS: We included 316 articles for Medline and 407 articles for Embase, from 15 systematic reviews. The optimized Medline search filter combined 7 terms, and the optimized Embase search filter combined 8 terms. Their relative recalls were 92% (95% CI: 87-96) and 91% (95% CI: 86-94), respectively. CONCLUSIONS: We developed two search filters for retrieving medication discontinuation articles in Medline and Embase. Further research is needed to estimate the precision and specificity of the filters.
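Relative recall is the proportion of reference-set articles that a filter retrieves. The abstract does not say how its 95% confidence intervals were computed. The sketch below assumes the Wilson score interval, one common choice for a binomial proportion, and uses the hypothetical function name `relative_recall_ci`.

```python
from math import sqrt


def relative_recall_ci(retrieved: int, total: int, z: float = 1.96):
    """Relative recall with a Wilson score confidence interval.

    retrieved: reference-set articles found by the filter
    total:     all articles in the reference (validation) set
    Returns (recall, lower, upper) as proportions.
    """
    if total <= 0 or not 0 <= retrieved <= total:
        raise ValueError("need 0 <= retrieved <= total and total > 0")
    p = retrieved / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)
```

For example, `relative_recall_ci(90, 100)` returns a point estimate of 0.9 together with its interval. These counts are hypothetical and are not the study's data. Unlike the simple normal approximation, the Wilson interval stays within [0, 1] even when recall is close to 100%.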
Abstract
Phosphoproteomic methods are commonly employed to identify and quantify phosphorylation sites on proteins. In recent years, various tools have been developed that provide scores or statistics indicating whether a given phosphosite has been correctly localized, or that estimate the global false localization rate (FLR) across all sites reported in a data set. These scores have generally been calibrated using synthetic data sets, and their statistical reliability on real data sets is largely unknown. As a result, studies may report incorrectly localized phosphosites owing to inadequate statistical control. In this work, we develop the concept of scoring modifications on a decoy amino acid, that is, one that cannot be modified, to allow independent estimation of the global FLR. We test a variety of amino acids on both synthetic and real data sets, demonstrating that the choice of decoy can make a substantial difference to the estimated global FLR. We conclude that while several different amino acids might be appropriate, the most reliable FLR results were achieved using alanine and leucine as decoys. We propose the use of a decoy amino acid to control false reporting in the literature and in public databases that redistribute the data. Data are available via ProteomeXchange with identifier PXD028840.
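The decoy-amino-acid idea can be illustrated with a cumulative estimate. Sites are sorted by localization score, and phosphorylation "found" on the decoy residue (here alanine) is counted. That count is scaled by the ratio of candidate target residues (S/T/Y) to candidate decoy residues, which estimates how many target sites are also wrong. This is a plausible formulation under our own assumptions, not necessarily the exact formula used in the study. `cumulative_decoy_flr` and its parameters are hypothetical.

```python
def cumulative_decoy_flr(sites, n_target_candidates: int, n_decoy_candidates: int,
                         target: str = "STY", decoy: str = "A"):
    """Estimate the global FLR at each score threshold.

    sites: iterable of (localization_score, residue) for every reported site.
    n_target_candidates / n_decoy_candidates: how many S/T/Y and decoy residues
    occur in the identified peptides (used to scale decoy hits).
    Returns a list of (score, estimated_flr) in descending score order.
    """
    ratio = n_target_candidates / n_decoy_candidates
    n_target = n_decoy = 0
    out = []
    for score, residue in sorted(sites, key=lambda s: s[0], reverse=True):
        if residue in target:
            n_target += 1
        elif residue == decoy:
            n_decoy += 1
        else:
            continue  # residues that are neither target nor decoy are ignored
        if n_target:
            est = min(1.0, n_decoy * ratio / n_target)
        else:
            est = 1.0 if n_decoy else 0.0
        out.append((score, est))
    return out
```

Choosing a score threshold where the estimate first exceeds, say, 5% would then give a data-driven cutoff. The abstract's finding that the choice of decoy residue matters corresponds to how well `ratio` captures the chance of a random mislocalization landing on that residue.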
Subjects
Amino Acids, Tandem Mass Spectrometry, Protein Databases, Reproducibility of Results, Tandem Mass Spectrometry/methods
Abstract
BACKGROUND: Small databases, such as Health Management Information Consortium (HMIC) and Social Policy and Practice (SPP), can add value to systematic searches. Search strategies designed for large databases may not be appropriate in small sources. A different approach to translating strategies could ensure that small databases are searched efficiently. OBJECTIVES: To establish the contribution HMIC and SPP made to public health guidelines (PHGs) and to recommend an efficient method of translating search strategies. METHODS: Eight PHGs were analysed to establish how many included publications were retrieved from HMIC and SPP. Six options for translating strategies from MEDLINE, using variations of free text and subject terms, were compared. RESULTS: HMIC contributed 15, and SPP eight, of the 483 publications cited in the PHGs. The free-text-only search was the only option that missed an included publication. The heading word (with truncation) option was more precise than applying subject headings. DISCUSSION: Free-text-only searches risk missing relevant publications, so it is preferable to include subject terms efficiently. CONCLUSION: The heading word (with truncation) option did not miss any of the evidence included in the PHGs and was the most efficient method for translating MEDLINE strategies to HMIC and SPP.
Subjects
Information Storage and Retrieval, Subject Headings, Dacarbazine/analogs & derivatives, Bibliographic Databases, Humans, MEDLINE, Public Policy
Abstract
BACKGROUND: Information specialists conducting searches for systematic reviews need to consider key questions about which and how many sources to search. This is particularly important for public health topics, where evidence may be found in diverse sources. OBJECTIVES: The objective of this review is to give an overview of recent studies on information retrieval guidance and methods that could be applied to public health evidence and used to guide future searches. METHODS: A literature search was performed in core databases and supplemented by browsing health information journals and citation searching. Results were sifted and reviewed. RESULTS: Seventy-two papers were found and grouped into themes covering sources and search techniques. Public health topics were poorly covered in this literature. DISCUSSION: Many researchers follow the recommendation to search multiple databases. The review topic influences decisions about sources. Additional sources covering grey literature reduce bias but are time-consuming and difficult to search systematically. Public health searching is complex, often requiring searches in multidisciplinary sources and the use of additional methods. CONCLUSIONS: Search planning is advisable to support decisions about which and how many sources to search. Planning could be improved by more work on modelling search scenarios, particularly for public health topics, to examine where publications were found and guide future research.
Subjects
Information Storage and Retrieval, Public Health, Bibliographic Databases, Factual Databases, Humans, Systematic Reviews as Topic
Abstract
BACKGROUND: Supplementary search methods, including citation searching, are essential if systematic reviews are to avoid producing biased conclusions. Little evidence exists on how to prioritise databases for citation searching or on whether using multiple sources is beneficial. OBJECTIVES: A systematic review examining urgent and emergency care reconfiguration was used to investigate the utility of citation searching on Web of Science (WOS) and/or Google Scholar (GS). METHODS: This case study investigated the numbers of studies, additional studies and unique studies retrieved from both sources. In addition, the time to search, the ease of adding references to reference management software and the ease of obtaining abstracts of studies for screening are briefly considered. RESULTS: WOS retrieved 62 references after deduplication, 52 of which were additional references not retrieved during the database searching. GS retrieved 134 unique references, including 63 additional references. WOS and GS retrieved the same three additional included studies. WOS was less time-intensive to search, given the facility to restrict results to English-language papers and the availability of abstracts. CONCLUSIONS: In a single systematic review case study, citation searching was required to identify all included studies. Citation searching on WOS is more efficient where a subscription is available. Both databases identified the same studies, but GS required additional time to remove non-English-language studies and locate abstracts.
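Counting "additional" and "unique" references from two citation sources is a set-difference exercise once records are deduplicated. The sketch below assumes matching on a normalized title, a simplification; real deduplication usually also uses DOIs, authors and years. `normalise` and `compare_sources` are hypothetical names. "Additional" here means not already found by the original database searches.

```python
import re


def normalise(title: str) -> str:
    """Lower-case a title and collapse punctuation/whitespace for matching."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def compare_sources(database_hits, wos_hits, gs_hits) -> dict:
    """Compare citation-search results from two sources against database hits."""
    db = {normalise(t) for t in database_hits}
    wos = {normalise(t) for t in wos_hits}
    gs = {normalise(t) for t in gs_hits}
    return {
        "wos_additional": wos - db,               # found by WOS, missed by databases
        "gs_additional": gs - db,                 # found by GS, missed by databases
        "both_additional": (wos & gs) - db,       # additional and found by both
        "wos_only_additional": wos - gs - db,     # additional, WOS only
        "gs_only_additional": gs - wos - db,      # additional, GS only
    }
```

Empty `wos_only_additional` and `gs_only_additional` sets for the included studies would correspond to the abstract's result that both sources found the same three additional included studies.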
Abstract
Health science libraries have been using information technology since the late 1960s, shaping both the profession and the mission of these libraries. To explore the impact of technology, a series of articles has been commissioned for the HILJ Regular Feature, International Perspectives and Initiatives. This editorial sets the scene for this series of articles, which starts in this issue. These articles, written by health science librarians from around the globe, will explore the impact of technology on the way health science libraries provide information in the digital age. Some articles will look at national trends and others will focus on a particular library. A key theme is how technology is being used to support the mission of health science libraries and whether technology has altered that mission. This editorial provides a brief overview of the technologies libraries have adopted, from the 1970s to the present day. From this, it is clear that information technology has transformed the way health information is collected, catalogued, and disseminated to users. It is also certain that in the coming decade new technologies will be incorporated into health science libraries, posing challenges for both users and librarians. However, librarians will continue to find ways to adapt and use these tools to meet the needs of their users.
Subjects
Librarians, Medical Libraries, Library Science, Humans, Technology
Abstract
A vast number of human cell lines are available for cell culture model-based studies, and as such the potential exists for discrepancies in findings due to cell line selection. To investigate this concept, the authors determine the relative protein abundance profiles of a panel of eight diverse, but commonly studied human cell lines. This panel includes HAP1, HEK293T, HeLa, HepG2, Jurkat, Panc1, SH-SY5Y, and SVGp12. A mass spectrometry-based proteomics workflow designed to enhance quantitative accuracy while maintaining analytical depth is used. To this end, this strategy leverages TMTpro16-based sample multiplexing, high-field asymmetric ion mobility spectrometry, and real-time database searching. The data show that the differences in the relative protein abundance profiles reflect cell line diversity. The authors also determine several hundred proteins to be highly enriched for a given cell line, and perform gene ontology and pathway analysis on these cell line-enriched proteins. An R Shiny application is designed to query protein abundance profiles and retrieve proteins with similar patterns. The workflows used herein can be applied to additional cell lines to aid cell line selection for addressing a given scientific inquiry or for improving an experimental design.
Subjects
Ion Mobility Spectrometry, Proteomics, Cell Line, Factual Databases, HEK293 Cells, Humans, Proteins
Abstract
In this paper, Stamatoula Pylarinou and her supervisor, Prof. Sarantos Kapidakis, report on an analysis of bibliographic data on the publications of Greek hospital personnel, conducted as part of Stamatoula's doctoral research in the Department of Archive, Library and Museum Sciences at the Ionian University in Corfu, Greece. Using freely available data, they present the questions posed and the insights gained from analysing the scientific publications of personnel of public hospitals in Greece, particularly during the years of austerity. With regard to impact on practice, they suggest that these procedures for processing MEDLINE/PubMed bibliographic data can improve communication between hospital librarians and hospital administrators or patients, adding value to their duties and enhancing the information and services they can provide. F.J.