Pesquisa | Portal de Pesquisa da BVS

1.

Machine Learning for Health: Algorithm Auditing & Quality Control.

Oala, Luis; Murchison, Andrew G; Balachandran, Pradeep; Choudhary, Shruti; Fehr, Jana; Leite, Alixandro Werneck; Goldschmidt, Peter G; Johner, Christian; Schörverth, Elora D M; Nakasi, Rose; Meyer, Martin; Cabitza, Federico; Baird, Pat; Prabhu, Carolin; Weicken, Eva; Liu, Xiaoxuan; Wenzel, Markus; Vogler, Steffen; Akogo, Darlington; Alsalamah, Shada; Kazim, Emre; Koshiyama, Adriano; Piechottka, Sven; Macpherson, Sheena; Shadforth, Ian; Geierhofer, Regina; Matek, Christian; Krois, Joachim; Sanguinetti, Bruno; Arentz, Matthew; Bielik, Pavol; Calderon-Ramirez, Saul; Abbood, Auss; Langer, Nicolas; Haufe, Stefan; Kherif, Ferath; Pujari, Sameer; Samek, Wojciech; Wiegand, Thomas.

J Med Syst ; 45(12): 105, 2021 Nov 02.

Artigo em Inglês | MEDLINE | ID: mdl-34729675

RESUMO

Developers proposing new machine learning for health (ML4H) tools often pledge to match or even surpass the performance of existing tools, yet the reality is usually more complicated. Reliable deployment of ML4H to the real world is challenging as examples from diabetic retinopathy or Covid-19 screening show. We envision an integrated framework of algorithm auditing and quality control that provides a path towards the effective and reliable application of ML systems in healthcare. In this editorial, we give a summary of ongoing work towards that vision and announce a call for participation to the special issue Machine Learning for Health: Algorithm Auditing & Quality Control in this journal to advance the practice of ML4H auditing.

Assuntos

Algoritmos , Aprendizado de Máquina , Controle de Qualidade , Humanos

2.

Pulmonary Hypertension Detection Non-Invasively at Point-of-Care Using a Machine-Learned Algorithm.

Nemati, Navid; Burton, Timothy; Fathieh, Farhad; Gillins, Horace R; Shadforth, Ian; Ramchandani, Shyam; Bridges, Charles R.

Diagnostics (Basel) ; 14(9)2024 Apr 25.

Artigo em Inglês | MEDLINE | ID: mdl-38732312

RESUMO

Artificial intelligence, particularly machine learning, has gained prominence in medical research due to its potential to develop non-invasive diagnostics. Pulmonary hypertension presents a diagnostic challenge due to its heterogeneous nature and similarity in symptoms to other cardiovascular conditions. Here, we describe the development of a supervised machine learning model using non-invasive signals (orthogonal voltage gradient and photoplethysmographic) and a hand-crafted library of 3298 features. The developed model achieved a sensitivity of 87% and a specificity of 83%, with an overall Area Under the Receiver Operator Characteristic Curve (AUC-ROC) of 0.93. Subgroup analysis showed consistent performance across genders, age groups and classes of PH. Feature importance analysis revealed changes in metrics that measure conduction, repolarization and respiration as significant contributors to the model. The model demonstrates promising performance in identifying pulmonary hypertension, offering potential for early detection and intervention when embedded in a point-of-care diagnostic system.

3.

Development of a Non-Invasive Machine-Learned Point-of-Care Rule-Out Test for Coronary Artery Disease.

Burton, Timothy; Fathieh, Farhad; Nemati, Navid; Gillins, Horace R; Shadforth, Ian P; Ramchandani, Shyam; Bridges, Charles R.

Diagnostics (Basel) ; 14(7)2024 Mar 28.

Artigo em Inglês | MEDLINE | ID: mdl-38611631

RESUMO

The current standard of care for coronary artery disease (CAD) requires an intake of radioactive or contrast enhancement dyes, radiation exposure, and stress and may take days to weeks for referral to gold-standard cardiac catheterization. The CAD diagnostic pathway would greatly benefit from a test to assess for CAD that enables the physician to rule it out at the point of care, thereby enabling the exploration of other diagnoses more rapidly. We sought to develop a test using machine learning to assess for CAD with a rule-out profile, using an easy-to-acquire signal (without stress/radiation) at the point of care. Given the historic disparate outcomes between sexes and urban/rural geographies in cardiology, we targeted equal performance across sexes in a geographically accessible test. Noninvasive photoplethysmogram and orthogonal voltage gradient signals were simultaneously acquired in a representative clinical population of subjects before invasive catheterization for those with CAD (gold-standard for the confirmation of CAD) and coronary computed tomographic angiography for those without CAD (excellent negative predictive value). Features were measured from the signal and used in machine learning to predict CAD status. The machine-learned algorithm achieved a sensitivity of 90% and specificity of 59%. The rule-out profile was maintained across both sexes, as well as all other relevant subgroups. A test to assess for CAD using machine learning on a noninvasive signal has been successfully developed, showing high performance and rule-out ability. Confirmation of the performance on a large clinical, blinded, enrollment-gated dataset is required before implementation of the test in clinical practice.

4.

Clinical Validation of a Machine-Learned, Point-of-Care System to IDENTIFY Functionally Significant Coronary Artery Disease.

Stuckey, Thomas D; Meine, Frederick J; McMinn, Thomas R; Depta, Jeremiah P; Bennett, Brett A; McGarry, Thomas F; Carroll, William S; Suh, David D; Steuter, John A; Roberts, Michael C; Gillins, Horace R; Fathieh, Farhad; Burton, Timothy; Nemati, Navid; Shadforth, Ian P; Ramchandani, Shyam; Bridges, Charles R; Rabbat, Mark G.

Diagnostics (Basel) ; 14(10)2024 May 08.

Artigo em Inglês | MEDLINE | ID: mdl-38786284

RESUMO

Many clinical studies have shown wide performance variation in tests to identify coronary artery disease (CAD). Coronary computed tomography angiography (CCTA) has been identified as an effective rule-out test but is not widely available in the USA, particularly so in rural areas. Patients in rural areas are underserved in the healthcare system as compared to urban areas, rendering it a priority population to target with highly accessible diagnostics. We previously developed a machine-learned algorithm to identify the presence of CAD (defined by functional significance) in patients with symptoms without the use of radiation or stress. The algorithm requires 215 s temporally synchronized photoplethysmographic and orthogonal voltage gradient signals acquired at rest. The purpose of the present work is to validate the performance of the algorithm in a frozen state (i.e., no retraining) in a large, blinded dataset from the IDENTIFY trial. IDENTIFY is a multicenter, selectively blinded, non-randomized, prospective, repository study to acquire signals with paired metadata from subjects with symptoms indicative of CAD within seven days prior to either left heart catheterization or CCTA. The algorithm's sensitivity and specificity were validated using a set of unseen patient signals (n = 1816). Pre-specified endpoints were chosen to demonstrate a rule-out performance comparable to CCTA. The ROC-AUC in the validation set was 0.80 (95% CI: 0.78-0.82). This performance was maintained in both male and female subgroups. At the pre-specified cut point, the sensitivity was 0.85 (95% CI: 0.82-0.88), and the specificity was 0.58 (95% CI: 0.54-0.62), passing the pre-specified endpoints. Assuming a 4% disease prevalence, the NPV was 0.99. Algorithm performance is comparable to tertiary center testing using CCTA. Selection of a suitable cut-point results in the same sensitivity and specificity performance in females as in males. Therefore, a medical device embedding this algorithm may address an unmet need for a non-invasive, front-line point-of-care test for CAD (without any radiation or stress), thus offering significant benefits to the patient, physician, and healthcare system.

5.

Development and validation of a machine learned algorithm to IDENTIFY functionally significant coronary artery disease.

Stuckey, Thomas; Meine, Frederick; McMinn, Thomas; Depta, Jeremiah P; Bennett, Brett; McGarry, Thomas; Carroll, William; Suh, David; Steuter, John A; Roberts, Michael; Gillins, Horace R; Lange, Emmanuel; Fathieh, Farhad; Burton, Timothy; Khosousi, Ali; Shadforth, Ian; Sanders, William E; Rabbat, Mark G.

Front Cardiovasc Med ; 9: 956147, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-36119746

RESUMO

Introduction: Multiple trials have demonstrated broad performance ranges for tests attempting to detect coronary artery disease. The most common test, SPECT, requires capital-intensive equipment, the use of radionuclides, induction of stress, and time off work and/or travel. Presented here are the development and clinical validation of an office-based machine learned algorithm to identify functionally significant coronary artery disease without radiation, expensive equipment or induced patient stress. Materials and methods: The IDENTIFY trial (NCT03864081) is a prospective, multicenter, non-randomized, selectively blinded, repository study to collect acquired signals paired with subject meta-data, including outcomes, from subjects with symptoms of coronary artery disease. Time synchronized orthogonal voltage gradient and photoplethysmographic signals were collected for 230 seconds from recumbent subjects at rest within seven days of either left heart catheterization or coronary computed tomography angiography. Following machine learning on a proportion of these data (N = 2,522), a final algorithm was selected, along with a pre-specified cut point on the receiver operating characteristic curve for clinical validation. An unseen set of subject signals (N = 965) was used to validate the algorithm. Results: At the pre-specified cut point, the sensitivity for detecting functionally significant coronary artery disease was 0.73 (95% CI: 0.68-0.78), and the specificity was 0.68 (0.62-0.74). There exists a point on the receiver operating characteristic curve at which the negative predictive value is the same as coronary computed tomographic angiography, 0.99, assuming a disease incidence of 0.04, yielding sensitivity of 0.89 and specificity of 0.42. Selecting a point at which the positive predictive value is maximized, 0.12, yields sensitivity of 0.39 and specificity of 0.88. Conclusion: The performance of the machine learned algorithm presented here is comparable to common tertiary center testing for coronary artery disease. Employing multiple cut points on the receiver operating characteristic curve can yield the negative predictive value of coronary computed tomographic angiography and a positive predictive value approaching that of myocardial perfusion imaging. As such, a system employing this algorithm may address the need for a non-invasive, no radiation, no stress, front line test, and hence offer significant advantages to the patient, their physician, and healthcare system.

6.

Multicenter validation of a machine learning phase space electro-mechanical pulse wave analysis to predict elevated left ventricular end diastolic pressure at the point-of-care.

Bhavnani, Sanjeev P; Khedraki, Rola; Cohoon, Travis J; Meine, Frederick J; Stuckey, Thomas D; McMinn, Thomas; Depta, Jeremiah P; Bennett, Brett; McGarry, Thomas; Carroll, William; Suh, David; Steuter, John A; Roberts, Michael; Gillins, Horace R; Shadforth, Ian; Lange, Emmanuel; Doomra, Abhinav; Firouzi, Mohammad; Fathieh, Farhad; Burton, Timothy; Khosousi, Ali; Ramchandani, Shyam; Sanders, William E; Smart, Frank.

PLoS One ; 17(11): e0277300, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-36378672

RESUMO

BACKGROUND: Phase space is a mechanical systems approach and large-scale data representation of an object in 3-dimensional space. Whether such techniques can be applied to predict left ventricular pressures non-invasively and at the point-of-care is unknown. OBJECTIVE: This study prospectively validated a phase space machine-learned approach based on a novel electro-mechanical pulse wave method of data collection through orthogonal voltage gradient (OVG) and photoplethysmography (PPG) for the prediction of elevated left ventricular end diastolic pressure (LVEDP). METHODS: Consecutive outpatients across 15 US-based healthcare centers with symptoms suggestive of coronary artery disease were enrolled at the time of elective cardiac catheterization and underwent OVG and PPG data acquisition immediately prior to angiography with signals paired with LVEDP (IDENTIFY; NCT #03864081). The primary objective was to validate a ML algorithm for prediction of elevated LVEDP using a definition of ≥25 mmHg (study cohort) and normal LVEDP ≤ 12 mmHg (control cohort), using AUC as the measure of diagnostic accuracy. Secondary objectives included performance of the ML predictor in a propensity matched cohort (age and gender) and performance for an elevated LVEDP across a spectrum of comparative LVEDP (<12 through 24 at 1 mmHg increments). Features were extracted from the OVG and PPG datasets and were analyzed using machine-learning approaches. RESULTS: The study cohort consisted of 684 subjects stratified into three LVEDP categories, ≤12 mmHg (N = 258), LVEDP 13-24 mmHg (N = 347), and LVEDP ≥25 mmHg (N = 79). Testing of the ML predictor demonstrated an AUC of 0.81 (95% CI 0.76-0.86) for the prediction of an elevated LVEDP with a sensitivity of 82% and specificity of 68%, respectively. Among a propensity matched cohort (N = 79) the ML predictor demonstrated a similar result AUC 0.79 (95% CI: 0.72-0.8). Using a constant definition of elevated LVEDP and varying the lower threshold across LVEDP the ML predictor demonstrated and AUC ranging from 0.79-0.82. CONCLUSION: The phase space ML analysis provides a robust prediction for an elevated LVEDP at the point-of-care. These data suggest a potential role for an OVG and PPG derived electro-mechanical pulse wave strategy to determine if LVEDP is elevated in patients with symptoms suggestive of cardiac disease.

Assuntos

Disfunção Ventricular Esquerda , Humanos , Disfunção Ventricular Esquerda/diagnóstico , Pressão Sanguínea , Sistemas Automatizados de Assistência Junto ao Leito , Análise de Onda de Pulso , Aprendizado de Máquina , Função Ventricular Esquerda , Pressão Ventricular , Volume Sistólico

7.

Machine Learning Algorithms: Selection of Appropriate Validation Populations for Cardiology Research-Be Careful!

Sanders, William E; Khedraki, Rola; Rabbat, Mark; McMinn, Thomas R; Burton, Tim; Khosousi, Ali; Fathieh, Farhad; Gillins, Horace R; Ramchandani, Shyam; Shadforth, Ian P.

JACC Adv ; 2(1): 100166, 2023 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-38939020

8.

Cardiac Phase Space Tomography: A novel method of assessing coronary artery disease utilizing machine learning.

Stuckey, Thomas D; Gammon, Roger S; Goswami, Robi; Depta, Jeremiah P; Steuter, John A; Meine, Frederick J; Roberts, Michael C; Singh, Narendra; Ramchandani, Shyam; Burton, Tim; Grouchy, Paul; Khosousi, Ali; Shadforth, Ian; Sanders, William E.

PLoS One ; 13(8): e0198603, 2018.

Artigo em Inglês | MEDLINE | ID: mdl-30089110

RESUMO

BACKGROUND: Artificial intelligence (AI) techniques are increasingly applied to cardiovascular (CV) medicine in arenas ranging from genomics to cardiac imaging analysis. Cardiac Phase Space Tomography Analysis (cPSTA), employing machine-learned linear models from an elastic net method optimized by a genetic algorithm, analyzes thoracic phase signals to identify unique mathematical and tomographic features associated with the presence of flow-limiting coronary artery disease (CAD). This novel approach does not require radiation, contrast media, exercise, or pharmacological stress. The objective of this trial was to determine the diagnostic performance of cPSTA in assessing CAD in patients presenting with chest pain who had been referred by their physician for coronary angiography. METHODS: This prospective, multicenter, non-significant risk study was designed to: 1) develop machine-learned algorithms to assess the presence of CAD (defined as one or more ≥ 70% stenosis, or fractional flow reserve ≤ 0.80) and 2) test the accuracy of these algorithms prospectively in a naïve verification cohort. This report is an analysis of phase signals acquired from 606 subjects at rest just prior to angiography. From the collective phase signal data, features were extracted and paired with the known angiographic results. A development set, consisting of signals from 512 subjects, was used for machine learning to determine an algorithm that correlated with significant CAD. Verification testing of the algorithm was performed utilizing previously untested phase signals from 94 subjects. RESULTS: The machine-learned algorithm had a sensitivity of 92% (95% CI: 74%-100%) and specificity of 62% (95% CI: 51%-74%) on blind testing in the verification cohort. The negative predictive value (NPV) was 96% (95% CI: 85%-100%). CONCLUSIONS: These initial multicenter results suggest that resting cPSTA may have comparable diagnostic utility to functional tests currently used to assess CAD without requiring cardiac stress (exercise or pharmacological) or exposure of the patient to radioactivity.

Assuntos

Algoritmos , Doença da Artéria Coronariana/diagnóstico , Técnicas de Diagnóstico Cardiovascular , Aprendizado de Máquina , Idoso , Angiografia Coronária , Feminino , Humanos , Masculino , Pessoa de Meia-Idade , Tomografia Computadorizada Multidetectores/métodos , Valor Preditivo dos Testes , Sensibilidade e Especificidade

9.

Genome annotating proteomics pipelines: available tools.

Shadforth, Ian; Bessant, Conrad.

Expert Rev Proteomics ; 3(6): 621-9, 2006 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-17181476

RESUMO

Proteomics based on tandem mass spectrometry is a powerful tool for identifying novel biomarkers and drug targets. Previously, a major bottleneck in high-throughput proteomics has been that the computational techniques needed to reliably identify proteins from proteomic data lagged behind the ability to collect the immense quantity of data generated. This is no longer the case, as fully automated pipelines for peptide and protein identification exist, and these are publicly and privately accessible. Such pipelines can automatically and rapidly generate high-confidence protein identifications from large datasets in a searchable format covering multiple experimental runs. However, the main challenge for the community now is to use these resources as they are, by taking full advantage of the pooling of information, so that the next barrier in our understanding of biology may be broken. There are currently two pipelines in the public domain that provide such potential: PeptideAtlas and the Genome Annotating Proteomic Pipeline. This review will introduce their features in the context of high-throughput proteomics, and provide indicative results as to their usefulness and usability through a side-by-side comparison of results obtained when processing a set of human plasma samples.

Assuntos

Genoma/genética , Proteínas/genética , Proteínas/metabolismo , Proteômica/métodos , Animais , Bases de Dados Genéticas , Humanos , Espectrometria de Massas , Proteínas/química

10.

i-Tracker: for quantitative proteomics using iTRAQ.

Shadforth, Ian P; Dunkley, Tom P J; Lilley, Kathryn S; Bessant, Conrad.

BMC Genomics ; 6: 145, 2005 Oct 20.

Artigo em Inglês | MEDLINE | ID: mdl-16242023

RESUMO

BACKGROUND: iTRAQ technology for protein quantitation using mass spectrometry is a recent, powerful means of determining relative protein levels in up to four samples simultaneously. Although protein identification of samples generated using iTRAQ may be carried out using any current identification software, the quantitation calculations have been restricted to the ProQuant software supplied by Applied Biosciences. i-Tracker software has been developed to extract reporter ion peak ratios from non-centroided tandem MS peak lists in a format easily linked to the results of protein identification tools such as Mascot and Sequest. Such functionality is currently not provided by ProQuant, which is restricted to matching quantitative information to the peptide identifications from Applied Biosciences' Interrogator software. RESULTS: i-Tracker is shown to generate results that are consistent with those produced by ProQuant, thus validating both systems. CONCLUSION: i-Tracker allows quantitative information gained using the iTRAQ protocol to be linked with peptide identifications from popular tandem MS identification tools and hence is both a timely and useful tool for the proteomics community.

Assuntos

Espectrometria de Massas/métodos , Proteômica/métodos , Algoritmos , Genes Reporter , Íons , Modelos Estatísticos , Peptídeos/química , Probabilidade , Proteínas/química , Reprodutibilidade dos Testes , Software

11.

Public proteomic MS repositories and pipelines: available tools and biological applications.

Mead, Jennifer A; Shadforth, Ian P; Bessant, Conrad.

Proteomics ; 7(16): 2769-86, 2007 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-17654461

RESUMO

As proteomic MS has increased in throughput, so has the demand to catalogue the increasing number of peptides and proteins observed by the community using this technique. As in other 'omics' fields, this brings obvious scientific benefits such as sharing of results and prevention of unnecessary repetition, but also provides technical insights, such as the ability to compare proteome coverage between different laboratories, or between different proteomic platforms. Journals are also moving towards mandating that proteomics data be submitted to public repositories upon publication. In response to these demands, several web-based repositories have been established to store protein and peptide identifications derived from MS data, and a similar number of peptide identification software pipelines have emerged to deliver identifications to these repositories. This paper reviews the latest developments in public domain peptide and protein identification databases and describes the analysis pipelines that feed them. Recent applications of the tools to pertinent biological problems are examined, and through comparing and contrasting the capabilities of each system, the issues facing research users of web-based repositories are explored. Future developments and mechanisms to enhance system functionality and user-interfacing opportunities are also suggested.

Assuntos

Proteômica , Biologia Computacional

12.

GAPP: a fully automated software for the confident identification of human peptides from tandem mass spectra.

Shadforth, Ian; Xu, Weibing; Crowther, Daniel; Bessant, Conrad.

J Proteome Res ; 5(10): 2849-52, 2006 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-17022656

RESUMO

This paper introduces the genome annotating proteomic pipeline (GAPP), a totally automated publicly available software pipeline for the identification of peptides and proteins from human proteomic tandem mass spectrometry data. The pipeline takes as its input a series of MS/MS peak lists from a given experimental sample and produces a series of database entries corresponding to the peptides observed within the sample, along with related confidence scores. The pipeline is capable of finding any peptides expected, including those that cross intron-exon boundaries, and those due to single nucleotide polymorphisms (SNPs), alternate splicing, and post-translational modifications (PTMs). GAPP can therefore be used to re-annotate genomes, and this is supported through the inclusion of a Distributed Annotation System (DAS) server, which allows the peptides identified by the pipeline to be displayed in their genomic context within the Ensembl genome browser. GAPP is freely available via the web, at www. gapp.info.

Assuntos

Genoma Humano , Genômica/métodos , Peptídeos/análise , Proteômica/métodos , Software , Processamento Alternativo , Humanos , Espectrometria de Massas , Peptídeos/genética , Polimorfismo de Nucleotídeo Único , Processamento de Proteína Pós-Traducional

13.

Quantitative proteomic approach to study subcellular localization of membrane proteins.

Sadowski, Pawel G; Dunkley, Tom P J; Shadforth, Ian P; Dupree, Paul; Bessant, Conrad; Griffin, Julian L; Lilley, Kathryn S.

Nat Protoc ; 1(4): 1778-89, 2006.

Artigo em Inglês | MEDLINE | ID: mdl-17487160

RESUMO

As proteins within cells are spatially organized according to their role, knowledge about protein localization gives insight into protein function. Here, we describe the LOPIT technique (localization of organelle proteins by isotope tagging) developed for the simultaneous and confident determination of the steady-state distribution of hundreds of integral membrane proteins within organelles. The technique uses a partial membrane fractionation strategy in conjunction with quantitative proteomics. Localization of proteins is achieved by measuring their distribution pattern across the density gradient using amine-reactive isotope tagging and comparing these patterns with those of known organelle residents. LOPIT relies on the assumption that proteins belonging to the same organelle will co-fractionate. Multivariate statistical tools are then used to group proteins according to the similarities in their distributions, and hence localization without complete centrifugal separation is achieved. The protocol requires approximately 3 weeks to complete and can be applied in a high-throughput manner to material from many varied sources.

Assuntos

Proteínas de Membrana/metabolismo , Proteômica/métodos , Marcação por Isótopo/métodos

14.

Mapping the Arabidopsis organelle proteome.

Dunkley, Tom P J; Hester, Svenja; Shadforth, Ian P; Runions, John; Weimar, Thilo; Hanton, Sally L; Griffin, Julian L; Bessant, Conrad; Brandizzi, Federica; Hawes, Chris; Watson, Rod B; Dupree, Paul; Lilley, Kathryn S.

Proc Natl Acad Sci U S A ; 103(17): 6518-23, 2006 Apr 25.

Artigo em Inglês | MEDLINE | ID: mdl-16618929

RESUMO

A challenging task in the study of the secretory pathway is the identification and localization of new proteins to increase our understanding of the functions of different organelles. Previous proteomic studies of the endomembrane system have been hindered by contaminating proteins, making it impossible to assign proteins to organelles. Here we have used the localization of organelle proteins by the isotope tagging technique in conjunction with isotope tags for relative and absolute quantitation and 2D liquid chromatography for the simultaneous assignment of proteins to multiple subcellular compartments. With this approach, the density gradient distributions of 689 proteins from Arabidopsis thaliana were determined, enabling confident and simultaneous localization of 527 proteins to the endoplasmic reticulum, Golgi apparatus, vacuolar membrane, plasma membrane, or mitochondria and plastids. This parallel analysis of endomembrane components has enabled protein steady-state distributions to be determined. Consequently, genuine organelle residents have been distinguished from contaminating proteins and proteins in transit through the secretory pathway.

Assuntos

Proteínas de Arabidopsis/metabolismo , Arabidopsis/metabolismo , Proteoma/metabolismo , Arabidopsis/genética , Proteínas de Arabidopsis/genética , Organelas/genética , Organelas/metabolismo , Mapeamento de Peptídeos , Proteoma/genética , Proteínas Recombinantes de Fusão/genética , Proteínas Recombinantes de Fusão/metabolismo , Frações Subcelulares/metabolismo

15.

Protein and peptide identification algorithms using MS for use in high-throughput, automated pipelines.

Shadforth, Ian; Crowther, Daniel; Bessant, Conrad.

Proteomics ; 5(16): 4082-95, 2005 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-16196103

RESUMO

Current proteomics experiments can generate vast quantities of data very quickly, but this has not been matched by data analysis capabilities. Although there have been a number of recent reviews covering various aspects of peptide and protein identification methods using MS, comparisons of which methods are either the most appropriate for, or the most effective at, their proposed tasks are not readily available. As the need for high-throughput, automated peptide and protein identification systems increases, the creators of such pipelines need to be able to choose algorithms that are going to perform well both in terms of accuracy and computational efficiency. This article therefore provides a review of the currently available core algorithms for PMF, database searching using MS/MS, sequence tag searches and de novo sequencing. We also assess the relative performances of a number of these algorithms. As there is limited reporting of such information in the literature, we conclude that there is a need for the adoption of a system of standardised reporting on the performance of new peptide and protein identification algorithms, based upon freely available datasets. We go on to present our initial suggestions for the format and content of these datasets.

Assuntos

Algoritmos , Peptídeos/análise , Proteínas/análise , Software , Processamento Alternativo , Bases de Dados de Proteínas , Peptídeos/genética , Polimorfismo Genético , Proteínas/genética , Proteômica , Análise de Sequência , Espectrometria de Massas por Ionização e Dessorção a Laser Assistida por Matriz

16.

Determination of partial amino acid composition from tandem mass spectra for use in peptide identification strategies.

Shadforth, Ian; Todd, Kieran; Crowther, Daniel; Bessant, Conrad.

Proteomics ; 5(7): 1787-96, 2005 May.

Artigo em Inglês | MEDLINE | ID: mdl-15816003

RESUMO

We demonstrate a new approach to the determination of amino acid composition from tandem mass spectrometrically fragmented peptides using both experimental and simulated data. The approach has been developed to be used as a search-space filter in a protein identification pipeline with the aim of increased performance above that which could be attained by using immonium ion information. Three automated methods have been developed and tested: one based upon a simple peak traversal, in which all intense ion peaks are treated as being either a b- or y-ion using a wide mass tolerance; a second which uses a much narrower tolerance and does not perform transformations of ion peaks to the complementary type; and the unique fragments method which allows for b- or y-ion type to be inferred and corroborated using a scan of the other ions present in each peptide spectrum. The combination of these methods is shown to provide a high-accuracy set of amino acid predictions using both experimental and simulated data sets. These high quality predictions, with an accuracy of over 85%, may be used to identify peptide fragments that are hard to identify using other methods. The data simulation algorithm is also shown post priori to be a good model of noiseless tandem mass spectrometric peptide data.

Assuntos

Aminoácidos/química , Bases de Dados de Proteínas , Peptídeos/química , Algoritmos , Biologia Computacional/métodos , Simulação por Computador , Espectrometria de Massas

17.

Confident protein identification using the average peptide score method coupled with search-specific, ab initio thresholds.

Shadforth, Ian; Dunkley, Tom; Lilley, Kathryn; Crowther, Daniel; Bessant, Conrad.

Rapid Commun Mass Spectrom ; 19(22): 3363-8, 2005.

Artigo em Inglês | MEDLINE | ID: mdl-16235224

RESUMO

Perhaps the greatest difficulty in interpreting large sets of protein identifications derived from mass spectrometric methods is whether or not to trust the results. For such experiments, the level of confidence in each protein identification made needs to be far greater than the often used 95% significance threshold to avoid the identification of many false-positives. To provide higher confidence results, we have developed an innovative scoring strategy coupling the recently published Average Peptide Score (APS) method with pre-filtering of peptide identifications, using a simple peptide quality filter. Iterative generation of these filters in conjunction with reversed database searching is used to determine the correct levels at which the APS and peptide quality thresholds should be set to return virtually zero false-positive reports. This proceeds without the need to reference a known dataset.

Assuntos

Biologia Computacional , Espectrometria de Massas/normas , Proteínas/análise , Proteínas/química , Bases de Dados de Proteínas , Peptídeos/análise , Reprodutibilidade dos Testes

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA