Results 1 - 20 of 97
1.
Biomolecules ; 14(7)2024 Jul 03.
Article in English | MEDLINE | ID: mdl-39062504

ABSTRACT

The skin surface is an important sample source that the metabolomics community has only just begun to explore. Alterations in sebum, the lipid-rich mixture coating the skin surface, correlate with age, sex, ethnicity, diet, exercise, and disease state, making the skin surface an ideal sample source for future noninvasive biomarker exploration, disease diagnosis, and forensic investigation. The potential of sebum sampling has been realized primarily via electrospray ionization mass spectrometry (ESI-MS), an ideal approach to assess the skin surface lipidome. However, a better understanding of sebum collection and subsequent ESI-MS analysis is required before skin surface sampling can be implemented in routine analyses. Challenges include ambiguity in definitive lipid identification, inherent biological variability in sebum production, and methodological and technical variability in analyses. To overcome these obstacles, avoid common pitfalls, and achieve reproducible, robust outcomes, every portion of the workflow, from sample collection to data analysis, should be carefully considered with the specific application in mind. This review details current practices in sebum sampling, sample preparation, ESI-MS data acquisition, and data analysis, and it provides important considerations for acquiring meaningful lipidomic datasets from the skin surface. Forensic researchers investigating sebum as a means of suspect elimination in the absence of adequate fingerprint ridge detail or database matches, as well as clinical researchers interested in noninvasive biomarker exploration, disease diagnosis, and treatment monitoring, can use this review as a guide for developing best-practice methods.


Subject(s)
Sebum, Skin, Spectrometry, Mass, Electrospray Ionization, Sebum/metabolism, Sebum/chemistry, Humans, Spectrometry, Mass, Electrospray Ionization/methods, Skin/metabolism, Skin/chemistry, Lipids/analysis, Lipids/chemistry, Lipidomics/methods
2.
J Proteome Res ; 2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38171506

ABSTRACT

Triacylglycerols and wax esters are two lipid classes that have been linked to diseases, including autism, Alzheimer's disease, dementia, cardiovascular disease, dry eye disease, and diabetes, and thus are molecules worthy of biomarker exploration studies. Since triacylglycerols and wax esters make up the majority of skin-surface lipid secretions, groomed latent fingerprints would be a viable sampling method for these potential biomarkers. Currently, however, blood-based sampling protocols predominate in the field. The invasiveness of a blood draw limits its utility in protected populations, including children and the elderly. Herein we describe a noninvasive means of sample collection (from fingerprints) paired with fast MS data acquisition (MassIVE data set MSV000092742) and efficient data analysis via machine learning. Using both supervised and unsupervised classification, we demonstrate the usefulness of this method in determining whether a variable of interest imparts measurable change within the lipidomic data set. As a proof of concept, we show that the method is capable of distinguishing between the fingerprints of different individuals as well as between anatomical sebum collection regions. This noninvasive, high-throughput approach enables future lipidomic biomarker researchers to more easily include underrepresented, protected populations, such as children and the elderly, thus moving the field closer to definitive disease diagnoses that apply to all.

3.
Cell Rep Phys Sci ; 4(11)2023 Nov.
Article in English | MEDLINE | ID: mdl-38078148

ABSTRACT

Large language models like ChatGPT can generate authentic-seeming text at lightning speed, but many journal publishers reject language models as authors on manuscripts. Thus, a means to accurately distinguish human-generated from artificial intelligence (AI)-generated text is immediately needed. We recently developed an accurate AI text detector for scientific journals and, herein, test its ability in a variety of challenging situations, including on human text from a wide variety of chemistry journals, on AI text from the most advanced publicly available language model (GPT-4), and, most importantly, on AI text generated using prompts designed to obfuscate AI use. In all cases, AI and human text were assigned with high accuracy. ChatGPT-generated text can be readily detected in chemistry journals; this advance is a fundamental prerequisite for understanding how automated text generation will impact scientific publishing from now into the future.

4.
J Am Soc Mass Spectrom ; 34(12): 2775-2784, 2023 Dec 06.
Article in English | MEDLINE | ID: mdl-37897440

ABSTRACT

To achieve high-quality omics results, systematic variability in mass spectrometry (MS) data must be adequately addressed. Effective data normalization is essential for minimizing this variability. The abundance of approaches and the data-dependent nature of normalization have led some researchers to develop open-source academic software for choosing the best approach. While these tools are certainly beneficial to the community, none of them meets all of the needs of all users, particularly users who want to test new strategies that are not available in these products. Herein, we present a simple workflow that facilitates the identification of optimal normalization strategies using straightforward evaluation metrics, employing both supervised and unsupervised machine learning. The workflow offers a "DIY" aspect, where the performance of any normalization strategy can be evaluated for any type of MS data. As a demonstration of its utility, we apply this workflow to two distinct datasets, an ESI-MS dataset of lipids extracted from latent fingerprints and a cancer spheroid dataset of metabolites ionized by MALDI-MSI, and identify the best-performing normalization strategies for each.
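The "DIY" evaluation idea can be illustrated with a minimal, standard-library Python sketch. It is not the authors' workflow: the two candidate normalizations (total-sum and median scaling), the leave-one-out nearest-centroid accuracy used as the evaluation metric, and the toy data are all assumptions chosen for brevity. Any other normalization function could be added to the dictionary passed to `evaluate` and scored the same way.

```python
from statistics import median


def identity(row):
    """No normalization: return a copy of the raw intensities."""
    return list(row)


def total_sum(row):
    """Divide each intensity by the sample's total intensity."""
    s = sum(row)
    return [v / s for v in row]


def median_norm(row):
    """Divide each intensity by the sample's median intensity."""
    m = median(row)
    return [v / m for v in row]


def loo_nearest_centroid(rows, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier (squared Euclidean distance)."""
    correct = 0
    for i, (x, y) in enumerate(zip(rows, labels)):
        centroids = {}
        for cls in sorted(set(labels)):
            members = [r for j, (r, lab) in enumerate(zip(rows, labels)) if j != i and lab == cls]
            centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
        pred = min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
        correct += pred == y
    return correct / len(rows)


def evaluate(strategies, rows, labels):
    """Score every candidate normalization with the same (assumed) metric."""
    return {name: loo_nearest_centroid([norm(r) for r in rows], labels)
            for name, norm in strategies.items()}


# Hypothetical toy data: two classes with different feature ratios, each sample
# multiplied by a different loading factor to mimic sample-to-sample signal variation.
profiles = {"A": [2.0, 1.0, 1.0], "B": [1.0, 1.0, 2.0]}
rows, labels = [], []
for cls, profile in profiles.items():
    for scale in (1.0, 10.0, 100.0):
        rows.append([scale * v for v in profile])
        labels.append(cls)

scores = evaluate({"none": identity, "total_sum": total_sum, "median": median_norm}, rows, labels)
print(scores)
```

On these toy data both normalizations remove the loading factor and give perfect leave-one-out accuracy, whereas the raw intensities, dominated by the loading factor, score far lower. Real MS data would call for richer metrics, which is the point of keeping the metric a swappable function.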


Subject(s)
Neoplasms, Unsupervised Machine Learning, Humans, Workflow, Software, Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
5.
Cell Rep Phys Sci ; 4(6)2023 Jun 21.
Article in English | MEDLINE | ID: mdl-37426542

ABSTRACT

ChatGPT has enabled access to artificial intelligence (AI)-generated writing for the masses, initiating a culture shift in the way people work, learn, and write. The need to discriminate human writing from AI is now both critical and urgent. Addressing this need, we report a method for discriminating text generated by ChatGPT from (human) academic scientists, relying on prevalent and accessible supervised classification methods. The approach uses new features for discriminating (these) humans from AI; as examples, scientists write long paragraphs and have a penchant for equivocal language, frequently using words like "but," "however," and "although." With a set of 20 features, we built a model that assigns the author, as human or AI, at over 99% accuracy. This strategy could be further adapted and developed by others with basic skills in supervised classification, enabling access to many highly accurate and targeted models for detecting AI usage in academic writing and beyond.
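As a rough illustration of the kind of features described above, the hypothetical Python sketch below computes words per paragraph and counts of the three hedging words named in the abstract. It is not the published 20-feature set; the tokenization rule (a regular expression over letters and apostrophes) and paragraph splitting on blank lines are assumptions. The resulting dictionaries could be fed to any supervised classifier.

```python
import re

# Hedging words named in the abstract; the full published feature set is not reproduced here.
HEDGE_WORDS = ("but", "however", "although")


def style_features(text):
    """Return a few simple stylometric features for one document (hypothetical feature names)."""
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    # Assumption: a "word" is a run of letters or apostrophes, compared case-insensitively.
    words = re.findall(r"[a-z']+", text.lower())
    features = {
        "words_per_paragraph": len(words) / len(paragraphs) if paragraphs else 0.0,
    }
    for w in HEDGE_WORDS:
        features[f"count_{w}"] = words.count(w)
    return features


sample = "The data are noisy, but the trend holds.\n\nHowever, although promising, more work is needed."
print(style_features(sample))
```

Raw counts grow with document length, so a practical version would likely normalize them (for example, per 1,000 words) before classification.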

6.
Commun Biol ; 6(1): 535, 2023 05 18.
Article in English | MEDLINE | ID: mdl-37202420

ABSTRACT

During virus entry, the pretriggered human immunodeficiency virus (HIV-1) envelope glycoprotein (Env) trimer initially transits into a default intermediate state (DIS) that remains structurally uncharacterized. Here, we present cryo-EM structures at near-atomic resolution of two cleaved full-length HIV-1 Env trimers purified from cell membranes in styrene-maleic acid lipid nanoparticles without antibodies or receptors. The cleaved Env trimers exhibited tighter subunit packing than uncleaved trimers. Cleaved and uncleaved Env trimers assumed remarkably consistent yet distinct asymmetric conformations, with one smaller and two larger opening angles. Breaking conformational symmetry is allosterically coupled with dynamic helical transformations of the gp41 N-terminal heptad repeat (HR1N) regions in two protomers and with trimer tilting in the membrane. The broken symmetry of the DIS potentially assists Env binding to two CD4 receptors, while resisting antibody binding, and promotes extension of the gp41 HR1 helical coiled-coil, which relocates the fusion peptide closer to the target cell membrane.


Subject(s)
HIV Envelope Protein gp41, HIV-1, Humans, HIV Envelope Protein gp41/chemistry, HIV Envelope Protein gp41/metabolism, HIV-1/chemistry, Protein Conformation, Glycoproteins, Styrenes
7.
Cell Rep Phys Sci ; 3(10)2022 Oct 19.
Article in English | MEDLINE | ID: mdl-36381226

ABSTRACT

The fields of proteomics and machine learning are both large disciplines, each producing well over 5,000 publications per year. However, studies combining both fields are still relatively rare, with only about 2% of recent proteomics papers including machine learning. This review, which focuses on the intersection of the fields, is intended to inspire proteomics researchers to develop skills and knowledge in the application of machine learning. A brief tutorial introduction to machine learning is provided, and research advances that rely on both fields, particularly as they relate to proteomics tools development and biomarker discovery, are highlighted. Key knowledge gaps and opportunities for scientific advancement are also enumerated.

8.
J Proteome Res ; 21(9): 2071-2074, 2022 09 02.
Article in English | MEDLINE | ID: mdl-36004690

ABSTRACT

This review "teaches" researchers how to make their lackluster proteomics data look really impressive, by applying an inappropriate but pervasive strategy that selects features in a biased manner. The strategy is demonstrated and used to build a classification model with an accuracy of 92% and AUC of 0.98, while relying completely on random numbers for the data set. This "lesson" in data processing is not to be practiced by anyone; on the contrary, it is meant to be a cautionary tale showing that very unreliable results are obtained when a biomarker panel is generated first, using all the available data, and then tested by cross-validation. Data scientists describe the error committed in this scenario as having test data leak into the feature selection step, and it is currently a common mistake in proteomics biomarker studies that rely on machine learning. After the demonstration, advice is provided about how machine learning methods can be applied to proteomics data sets without generating artificially inflated accuracies.
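The pitfall can be reproduced in miniature with the standard-library Python sketch below, which uses only random numbers and random labels. The feature-ranking rule (absolute difference in class means), the top-10 cutoff, the nearest-centroid classifier, and leave-one-out cross-validation are assumptions chosen for brevity, not the paper's exact procedure, and the sketch makes no attempt to reproduce the reported 92% accuracy. The only difference between the two runs is whether features are selected once on all samples (leaky) or re-selected inside each fold from the training samples alone.

```python
import random

random.seed(0)
N_SAMPLES, N_FEATURES, K = 40, 1000, 10

# Pure noise: random features and random-but-balanced labels, so no real signal exists.
X = [[random.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(N_SAMPLES)]
y = [0] * (N_SAMPLES // 2) + [1] * (N_SAMPLES // 2)
random.shuffle(y)


def top_k_features(idx, k=K):
    """Rank features by absolute difference in class means over the samples in idx."""
    diffs = []
    for f in range(N_FEATURES):
        a = [X[i][f] for i in idx if y[i] == 0]
        b = [X[i][f] for i in idx if y[i] == 1]
        diffs.append((abs(sum(a) / len(a) - sum(b) / len(b)), f))
    return [f for _, f in sorted(diffs, reverse=True)[:k]]


def predict(train_idx, test_i, feats):
    """Nearest-centroid prediction using only the selected features."""
    centroids = {}
    for cls in (0, 1):
        members = [i for i in train_idx if y[i] == cls]
        centroids[cls] = [sum(X[i][f] for i in members) / len(members) for f in feats]
    point = [X[test_i][f] for f in feats]
    dist = {c: sum((p - q) ** 2 for p, q in zip(point, cen)) for c, cen in centroids.items()}
    return min(dist, key=dist.get)


def loo_accuracy(leaky):
    all_idx = list(range(N_SAMPLES))
    # Leaky: the panel is chosen once using every sample, including each future test sample.
    leaky_feats = top_k_features(all_idx) if leaky else None
    correct = 0
    for i in all_idx:
        train = [j for j in all_idx if j != i]
        # Honest: the panel is re-chosen inside the fold from training samples only.
        feats = leaky_feats if leaky else top_k_features(train)
        correct += predict(train, i, feats) == y[i]
    return correct / N_SAMPLES


leaky_acc, honest_acc = loo_accuracy(True), loo_accuracy(False)
print(f"leaky: {leaky_acc:.2f}  honest: {honest_acc:.2f}")
```

With random data the honest estimate should hover around chance (leave-one-out on noise can even fall below 0.5), while the leaky estimate is typically far higher; the exact values depend on the seed.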


Subject(s)
Machine Learning, Proteomics, Biomarkers, Proteomics/methods
9.
Int J Mol Sci ; 23(11)2022 May 24.
Article in English | MEDLINE | ID: mdl-35682561

ABSTRACT

Lysyl oxidase-like 2 (LOXL2) catalyzes the oxidative deamination of peptidyl lysines and hydroxylysines to promote extracellular matrix remodeling. Aberrant activity of LOXL2 has been associated with organ fibrosis and tumor metastasis. The lysine tyrosylquinone (LTQ) cofactor is derived from Lys653 and Tyr689 in the amine oxidase domain via post-translational modification. Based on the similarity in hydrodynamic radius and radius of gyration, we recently proposed that the overall structures of the mature LOXL2 (containing LTQ) and the precursor LOXL2 (no LTQ) are very similar. In this study, we conducted a mass spectrometry-based disulfide mapping analysis of recombinant LOXL2 in three forms: a full-length LOXL2 (fl-LOXL2) containing a nearly stoichiometric amount of LTQ, Δ1-2SRCR-LOXL2 (SRCR1 and SRCR2 are truncated) in the precursor form, and Δ1-3SRCR-LOXL2 (SRCR1, SRCR2, SRCR3 are truncated) in a mixture of the precursor and the mature forms. We detected a set of five disulfide bonds that is conserved in both the precursor and the mature recombinant LOXL2s. In addition, we detected a set of four alternative disulfide bonds in low abundance that is not associated with the mature LOXL2. These results suggest that the major set of five disulfide bonds is retained post-LTQ formation.


Subject(s)
Disulfides, Protein-Lysine 6-Oxidase, Amino Acid Oxidoreductases/metabolism, Extracellular Matrix/metabolism, Mass Spectrometry, Protein Processing, Post-Translational, Protein-Lysine 6-Oxidase/metabolism
10.
J Proteome Res ; 21(4): 1095-1104, 2022 04 01.
Article in English | MEDLINE | ID: mdl-35276041

ABSTRACT

Recent studies have highlighted that the proteome can be used to identify potential biomarker candidates for Alzheimer's disease (AD) in diverse cohorts. Furthermore, the racial and ethnic background of participants is an important factor to consider to ensure the effectiveness of potential biomarkers for representative populations. A promising approach to survey potential biomarker candidates for diagnosing AD in diverse cohorts is the application of machine learning to proteomics data sets. Herein, we leveraged six existing bottom-up proteomics data sets, which included non-Hispanic White, African American/Black, and Hispanic participants, to study protein changes in AD and cognitively unimpaired participants. Machine learning models were applied to these data sets and resulted in the identification of amyloid-β precursor protein (APP) and heat shock protein β-1 (HSPB1) as two proteins that have a high ability to distinguish AD; however, each protein's performance varied based upon the racial and ethnic background of the participants. HSPB1 was particularly helpful for generating high areas under the curve (AUCs) for African American/Black participants. Overall, HSPB1 improved the performance of the machine learning models when combined with APP and/or participant age and is a potential candidate that should be further explored in AD biomarker discovery efforts.


Subject(s)
Alzheimer Disease, Alzheimer Disease/diagnosis, Alzheimer Disease/metabolism, Biomarkers, Brain/metabolism, Humans, Machine Learning, Proteomics/methods, Racial Groups
11.
Mass Spectrom Rev ; 41(6): 901-921, 2022 11.
Article in English | MEDLINE | ID: mdl-33565652

ABSTRACT

Glycans introduce complexity to the proteins to which they are attached. These modifications vary during the progression of many diseases; thus, they serve as potential biomarkers for disease diagnosis and prognosis. The immense structural diversity of glycans makes glycosylation analysis and quantitation difficult. Fortunately, recent advances in analytical techniques provide the opportunity to quantify even low-abundance glycopeptides and glycans derived from complex biological mixtures, allowing for the identification of glycosylation differences between healthy samples and those derived from disease states. Understanding the strengths and weaknesses of different quantitative glycomics analysis methods is important for selecting the best strategy to analyze glycosylation changes in any given set of clinical samples. To provide guidance towards selecting the proper approach, we discuss four widely used quantitative glycomics analysis platforms, including fluorescence-based analysis of released N-linked glycans and three different varieties of MS-based analysis: liquid chromatography (LC)-mass spectrometry (MS) analysis of glycopeptides, matrix-assisted laser desorption ionization-time of flight MS, and LC-ESI-MS analysis of released N-linked glycans. These methods' strengths and weaknesses are compared, particularly with respect to the figures of merit that are important for clinical biomarker studies, including the initial sample requirements, the methods' throughput, sample preparation time, the number of species identified, the methods' utility for isomer separation and structural characterization, method-related challenges associated with quantitation, repeatability, the expertise required, and the cost of each analysis. This review, therefore, provides unique guidance to researchers who endeavor to undertake a clinical glycomics analysis by offering insights on the available analysis technologies.


Subject(s)
Glycomics, Polysaccharides, Chromatography, Liquid/methods, Glycomics/methods, Glycopeptides, Mass Spectrometry, Polysaccharides/analysis, Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
12.
J Virol ; 96(3): e0162621, 2022 02 09.
Article in English | MEDLINE | ID: mdl-34817202

ABSTRACT

The SARS-CoV-2 coronavirus, the etiologic agent of COVID-19, uses its spike (S) glycoprotein anchored in the viral membrane to enter host cells. The S glycoprotein is the major target for neutralizing antibodies elicited by natural infection and by vaccines. Approximately 35% of the SARS-CoV-2 S glycoprotein consists of carbohydrate, which can influence virus infectivity and susceptibility to antibody inhibition. We found that virus-like particles produced by coexpression of SARS-CoV-2 S, M, E, and N proteins contained spike glycoproteins that were extensively modified by complex carbohydrates. We used a fucose-selective lectin to purify the Golgi-modified fraction of a wild-type SARS-CoV-2 S glycoprotein trimer and determined its glycosylation and disulfide bond profile. Compared with soluble or solubilized S glycoproteins modified to prevent proteolytic cleavage and to retain a prefusion conformation, more of the wild-type S glycoprotein N-linked glycans are processed to complex forms. Even Asn 234, a significant percentage of which is decorated by high-mannose glycans on other characterized S trimer preparations, is predominantly modified in the Golgi compartment by processed glycans. Three incompletely occupied sites of O-linked glycosylation were detected. Viruses pseudotyped with natural variants of the serine/threonine residues implicated in O-linked glycosylation were generally infectious and exhibited sensitivity to neutralization by soluble ACE2 and convalescent antisera comparable to that of the wild-type virus. Unlike other natural cysteine variants, a Cys15Phe (C15F) mutant retained partial, but unstable, infectivity. These findings enhance our understanding of the Golgi processing of the native SARS-CoV-2 S glycoprotein carbohydrates and could assist the design of interventions.
IMPORTANCE The SARS-CoV-2 coronavirus, which causes COVID-19, uses its spike glycoprotein to enter host cells. The viral spike glycoprotein is the main target of host neutralizing antibodies that help to control SARS-CoV-2 infection and are important for the protection provided by vaccines. The SARS-CoV-2 spike glycoprotein consists of a trimer of two subunits covered with a coat of carbohydrates (sugars). Here, we describe the disulfide bonds that assist the SARS-CoV-2 spike glycoprotein to assume the correct shape and the composition of the sugar moieties on the glycoprotein surface. We also evaluate the consequences of natural virus variation in O-linked sugar addition and in the cysteine residues involved in disulfide bond formation. This information can expedite the improvement of vaccines and therapies for COVID-19.


Subject(s)
COVID-19/virology, SARS-CoV-2/physiology, Spike Glycoprotein, Coronavirus/metabolism, Amino Acid Sequence, Antibodies, Neutralizing/immunology, Disulfides, Gene Expression Regulation, Viral, Glycosylation, Humans, Models, Molecular, Neutralization Tests, Protein Conformation, Protein Processing, Post-Translational, Protein Transport, Recombinant Proteins, Spike Glycoprotein, Coronavirus/chemistry, Spike Glycoprotein, Coronavirus/genetics, Spike Glycoprotein, Coronavirus/isolation & purification, Structure-Activity Relationship
13.
J Virol ; 95(24): e0052921, 2021 11 23.
Article in English | MEDLINE | ID: mdl-34549974

ABSTRACT

The functional human immunodeficiency virus (HIV-1) envelope glycoprotein (Env) trimer [(gp120/gp41)₃] is produced by cleavage of a conformationally flexible gp160 precursor. gp160 cleavage or the binding of BMS-806, an entry inhibitor, stabilizes the pretriggered, "closed" (state 1) conformation recognized by rarely elicited broadly neutralizing antibodies. Poorly neutralizing antibodies (pNAbs) elicited at high titers during natural infection recognize more "open" Env conformations (states 2 and 3) induced by binding the receptor, CD4. We found that BMS-806 treatment and cross-linking decreased the exposure of pNAb epitopes on cell surface gp160; however, after detergent solubilization, cross-linked and BMS-806-treated gp160 sampled non-state-1 conformations that could be recognized by pNAbs. Cryo-electron microscopy of the purified BMS-806-bound gp160 revealed two hitherto unknown asymmetric trimer conformations, providing insights into the allosteric coupling between trimer opening and structural variation in the gp41 HR1N region. The individual protomer structures in the asymmetric gp160 trimers resemble those of other genetically modified or antibody-bound cleaved HIV-1 Env trimers, which have been suggested to assume state-2-like conformations. Asymmetry of the uncleaved Env potentially exposes surfaces of the trimer to pNAbs. To evaluate the effect of stabilizing a state-1-like conformation of the membrane Env precursor, we treated cells expressing wild-type HIV-1 Env with BMS-806. BMS-806 treatment decreased both gp160 cleavage and the addition of complex glycans, implying that gp160 conformational flexibility contributes to the efficiency of these processes. Selective pressure to maintain flexibility in the precursor of functional Env allows the uncleaved Env to sample asymmetric conformations that potentially skew host antibody responses toward pNAbs.
IMPORTANCE The envelope glycoprotein (Env) trimers on the surface of human immunodeficiency virus (HIV-1) mediate the entry of the virus into host cells and serve as targets for neutralizing antibodies. The functional Env trimer is produced by cleavage of the gp160 precursor in the infected cell. We found that the HIV-1 Env precursor is highly plastic, allowing it to assume different asymmetric shapes. This conformational plasticity is potentially important for Env cleavage and proper modification by sugars. Having a flexible, asymmetric Env precursor that can misdirect host antibody responses without compromising virus infectivity would be an advantage for a persistent virus like HIV-1.


Subject(s)
HIV Envelope Protein gp120/immunology, HIV Envelope Protein gp41/chemistry, HIV-1/chemistry, Animals, Antibodies, Neutralizing/immunology, CHO Cells, Cricetulus, Cryoelectron Microscopy/methods, HIV Infections/virology, HIV-1/immunology, Humans, Protein Binding, Protein Conformation, Protein Multimerization, env Gene Products, Human Immunodeficiency Virus/immunology
14.
Anal Bioanal Chem ; 413(29): 7215-7227, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34448030

ABSTRACT

Glycosylation analysis of viral glycoproteins contributes significantly to vaccine design and development. Among other benefits, glycosylation analysis allows vaccine developers to assess the impact of construct design or producer cell line choices for vaccine production, and it is a key measure by which glycoproteins that are produced for use in vaccination can be compared to their native viral forms. Because many viral glycoproteins are multiply glycosylated, glycopeptide analysis is a preferable approach for mapping the glycans, yet the analysis of glycopeptide data can be cumbersome and requires the expertise of an experienced analyst. In recent years, a commercial software product, Byonic, has been implemented in several instances to facilitate glycopeptide analysis of viral glycoproteins and other glycoproteomics data sets, and the purpose of the study herein is to determine the strengths and limitations of using this software, particularly in cases relevant to vaccine development. The glycopeptides from a recombinantly expressed trimeric S glycoprotein of the SARS-CoV-2 virus were first analyzed using an expert-based analysis strategy; subsequently, analysis of the same data set was completed using Byonic. Careful assessment of instances where the two methods produced different results revealed that the glycopeptide assignments from Byonic contained more false positives than true positives, even when the data were assessed using a 1% false discovery rate. The work herein provides a roadmap for removing the spurious assignments that Byonic generates, and it provides an assessment of the opportunity cost of relying on automated assignments for glycopeptide data sets from viral glycoproteins.


Subject(s)
Glycopeptides/metabolism, Spike Glycoprotein, Coronavirus/metabolism, Algorithms, Amino Acid Sequence, Chromatography, Liquid/methods, Spike Glycoprotein, Coronavirus/chemistry, Tandem Mass Spectrometry/methods
15.
bioRxiv ; 2021 Apr 01.
Article in English | MEDLINE | ID: mdl-33821278

ABSTRACT

The SARS-CoV-2 coronavirus, the etiologic agent of COVID-19, uses its spike (S) glycoprotein anchored in the viral membrane to enter host cells. The S glycoprotein is the major target for neutralizing antibodies elicited by natural infection and by vaccines. Approximately 35% of the SARS-CoV-2 S glycoprotein consists of carbohydrate, which can influence virus infectivity and susceptibility to antibody inhibition. We found that virus-like particles produced by coexpression of SARS-CoV-2 S, M, E and N proteins contained spike glycoproteins that were extensively modified by complex carbohydrates. We used a fucose-selective lectin to enrich the Golgi-resident fraction of a wild-type SARS-CoV-2 S glycoprotein trimer, and determined its glycosylation and disulfide bond profile. Compared with soluble or solubilized S glycoproteins modified to prevent proteolytic cleavage and to retain a prefusion conformation, more of the wild-type S glycoprotein N-linked glycans are processed to complex forms. Even Asn 234, a significant percentage of which is decorated by high-mannose glycans on soluble and virion S trimers, is predominantly modified in the Golgi by processed glycans. Three incompletely occupied sites of O-linked glycosylation were detected. Viruses pseudotyped with natural variants of the serine/threonine residues implicated in O-linked glycosylation were generally infectious and exhibited sensitivity to neutralization by soluble ACE2 and convalescent antisera comparable to that of the wild-type virus. Unlike other natural cysteine variants, a Cys15Phe (C15F) mutant retained partial, but unstable, infectivity. These findings enhance our understanding of the Golgi processing of the native SARS-CoV-2 S glycoprotein carbohydrates and could assist the design of interventions.

16.
J Proteome Res ; 20(5): 2823-2829, 2021 05 07.
Article in English | MEDLINE | ID: mdl-33909976

ABSTRACT

Mass spectrometry data sets from omics studies are an optimal information source for discriminating patients with disease and identifying biomarkers. Thousands of proteins or endogenous metabolites can be queried in each analysis, spanning several orders of magnitude in abundance. Machine learning tools that effectively leverage these data to accurately identify disease states are in high demand. While mass spectrometry data sets are rich with potentially useful information, using the data effectively can be challenging because of missing entries in the data sets and because the number of samples is typically much smaller than the number of features, two challenges that make machine learning difficult. To address this problem, we have modified a new supervised classification tool, the Aristotle Classifier, so that omics data sets can be better leveraged for identifying disease states. The optimized classifier, AC.2021, is benchmarked on multiple data sets against its predecessor and two leading supervised classification tools, Support Vector Machine (SVM) and XGBoost. The new classifier, AC.2021, outperformed existing tools on multiple tests using proteomics data. The underlying code for the classifier, provided herein, would be useful for researchers who desire improved classification accuracy when using their omics data sets to identify disease states.


Subject(s)
Proteomics, Support Vector Machine, Algorithms, Biomarkers, Humans, Machine Learning
17.
Data Brief ; 35: 106923, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33786345

ABSTRACT

Here we present a plasma proteomics dataset that was generated to understand the importance of self-reported race for biomarker discovery in Alzheimer's disease. This dataset is related to the article "Why inclusion matters for Alzheimer's disease biomarker discovery in plasma" [1]. Plasma samples were obtained from adults with clinically diagnosed Alzheimer's disease and cognitively normal adults of African American/Black and non-Hispanic White racial and ethnic backgrounds. Plasma was immunodepleted, digested, and isobarically tagged with commercial reagents. Tagged peptides were separated by high-pH fractionation, and the resulting fractions were analysed by liquid chromatography-mass spectrometry (LC-MS/MS and MS3) on an Orbitrap Fusion Lumos mass spectrometer. The resulting data were processed using Proteome Discoverer to produce a list of identified proteins with corresponding tandem mass tag (TMT) intensity information.

18.
Protein Expr Purif ; 181: 105837, 2021 05.
Article in English | MEDLINE | ID: mdl-33529763

ABSTRACT

Because of the important pathological roles of HIV-1 gp120, the protein has been used intensively in HIV research. However, recombinant gp120 preparation has proven to be difficult because of extremely low expression levels. To facilitate gp120 expression, previous methods predominantly replaced the native signal peptide with a heterologous one, resulting in very limited improvement. Currently, preparation of recombinant gp120 with native glycans relies solely on transient expression systems, which are not amenable to large-scale production. In this work, we employed a different approach for gp120 expression. Besides replacing the native gp120 signal peptide with that of rat serum albumin and optimizing its codon usage, we generated a stable gp120-expressing cell line in a glutamine synthetase knockout HEK293T cell line that we established for amplifying recombinant gene expression. The combined usage of these techniques dramatically increased gp120 expression levels and yielded a functional product with human-cell-derived glycans. This method may be applicable to large-scale preparation of other viral envelope proteins, such as that of the emerging SARS-CoV-2, or other glycoproteins that require the presence of authentic human glycans.


Subject(s)
Glutamate-Ammonia Ligase/genetics, HIV Envelope Protein gp120/metabolism, HIV-1/metabolism, Animals, CHO Cells, CRISPR-Cas Systems, Codon, Cricetulus, Gene Knockout Techniques, HEK293 Cells, Humans, Protein Sorting Signals, Recombinant Proteins/metabolism
19.
Anal Bioanal Chem ; 413(6): 1583-1593, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33580828

ABSTRACT

One unifying challenge when classifying biological samples with mass spectrometry data is overcoming sample-to-sample variability so that differences between groups, such as between a healthy set and a disease set, can be identified. Even when the same sample is re-analyzed under identical conditions, instrument signals can fluctuate by more than 10%. This signal inconsistency makes it difficult to identify subtle differences across a set of samples, and it weakens the mass spectrometrist's ability to effectively leverage data in domains as diverse as proteomics, metabolomics, glycomics, and imaging. We selected challenging data sets in the fields of glycomics, mass spectrometry imaging, and bacterial typing to study the problem of within-group signal variability and adapted a 30-year-old statistical approach to address it. The solution, a "local-balanced model," relies on using balanced subsets of training data to classify test samples. This analysis strategy was assessed on ESI-MS data of IgG-based glycopeptides, MALDI-MS imaging data of endogenous lipids, and MALDI-MS data of bacterial proteins. Two preliminary examples on non-mass spectrometry data sets are also included to show the potential generality of the method outside the field of MS analysis. We demonstrate that this approach is superior to simple normalization methods, generalizable to multiple mass spectrometry domains, and potentially appropriate in fields as diverse as physics and satellite imaging. In some cases, improvements in classification can be dramatic, with accuracy rising from 60% with normalization alone to over 90% with the additional development described herein.

20.
J Alzheimers Dis ; 79(3): 1327-1344, 2021.
Article in English | MEDLINE | ID: mdl-33427747

ABSTRACT

BACKGROUND: African American/Black adults have a disproportionate incidence of Alzheimer's disease (AD) and are underrepresented in biomarker discovery efforts. OBJECTIVE: This study aimed to identify potential diagnostic biomarkers for AD using a combination of proteomics and machine learning approaches in a cohort that included African American/Black adults. METHODS: We conducted a discovery-based proteomics study on plasma samples (N = 113) obtained from clinically diagnosed AD and cognitively normal adults who self-reported as African American/Black or non-Hispanic White. Sets of differentially expressed proteins were then classified using a support vector machine (SVM) to identify biomarker candidates. RESULTS: In total, 740 proteins were identified; of these, 25 proteins differentially expressed in AD came from comparisons within a single racial and ethnic background group. Six proteins were differentially expressed in AD regardless of racial and ethnic background. Supervised classification by SVM yielded an area under the curve (AUC) of 0.91 and accuracy of 86% for differentiating AD in samples from non-Hispanic White adults when trained with differentially expressed proteins unique to that group. However, the same model yielded an AUC of 0.49 and accuracy of 47% for differentiating AD in samples from African American/Black adults. Other covariates such as age, APOE4 status, sex, and years of education improved the model mainly for classifying AD in samples from non-Hispanic White adults. CONCLUSION: These results demonstrate the importance of study designs in AD biomarker discovery, which must include diverse racial and ethnic groups such as African American/Black adults to develop effective biomarkers.
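The contrast between an AUC of 0.91 and one of 0.49 is easiest to read through the rank definition of AUC: the probability that a randomly chosen positive (here, AD) sample receives a higher classifier score than a randomly chosen negative one, with ties counted as half. The minimal Python sketch below, using made-up scores rather than the study's data, computes it by comparing every positive/negative pair; a value near 0.5, like the 0.49 reported for African American/Black adults, means the ranking is no better than chance.

```python
def auc(pos_scores, neg_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5).

    This equals the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


print(auc([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # perfect separation → 1.0
print(auc([0.6, 0.4], [0.6, 0.4]))            # identical score distributions → 0.5
```

The pairwise loop costs O(n_pos × n_neg), which is fine for cohorts of this size; a rank-sum formulation scales better for large data sets.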


Subject(s)
Alzheimer Disease/blood, Black or African American/statistics & numerical data, Aged, Alzheimer Disease/diagnosis, Biomarkers/blood, Case-Control Studies, Female, Humans, Machine Learning, Male, Patient Selection, Proteomics, Support Vector Machine, White People/statistics & numerical data