1.
JAMIA Open ; 7(2): ooae037, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38911332

ABSTRACT

Objectives: Anaphylaxis is a severe life-threatening allergic reaction, and its accurate identification in healthcare databases can harness the potential of "Big Data" for healthcare or public health purposes. Materials and methods: This study used claims data obtained between October 1, 2015 and February 28, 2019 from the CMS database to examine the utility of machine learning in identifying incident anaphylaxis cases. We created a feature selection pipeline to identify critical features between different datasets. Then a variety of unsupervised and supervised methods were used (eg, Sammon mapping and eXtreme Gradient Boosting) to train models on datasets of differing data quality, which reflects the varying availability and potential rarity of ground truth data in medical databases. Results: Resulting machine learning model accuracies ranged from 47.7% to 94.4% when tested on ground truth data. Finally, we found new features to help experts enhance existing case-finding algorithms. Discussion: Developing precise algorithms to detect medical outcomes in claims can be a laborious and expensive process, particularly for conditions presented and coded diversely. We found it beneficial to filter out highly potent codes used for data curation to identify underlying patterns and features. To improve rule-based algorithms where necessary, researchers could use model explainers to determine noteworthy features, which could then be shared with experts and included in the algorithm. Conclusion: Our work suggests machine learning models can perform at similar levels as a previously published expert case-finding algorithm, while also having the potential to improve performance or streamline algorithm construction processes by identifying new relevant features for algorithm construction.

2.
Clin Pharmacol Ther ; 115(4): 745-757, 2024 04.
Article in English | MEDLINE | ID: mdl-37965805

ABSTRACT

In 2020, Novartis Pharmaceuticals Corporation and the U.S. Food and Drug Administration (FDA) started a 4-year scientific collaboration to approach complex new data modalities and advanced analytics. The scientific question was to find novel radio-genomics-based prognostic and predictive factors for HR+/HER2- metastatic breast cancer under a Research Collaboration Agreement. This collaboration has been providing valuable insights to help successfully implement future scientific projects, particularly using artificial intelligence and machine learning. This tutorial aims to provide tangible guidelines for a multi-omics project that includes multidisciplinary expert teams, spanning across different institutions. We cover key ideas, such as "maintaining effective communication" and "following good data science practices," followed by the four steps of exploratory projects, namely (1) plan, (2) design, (3) develop, and (4) disseminate. We break each step into smaller concepts with strategies for implementation and provide illustrations from our collaboration to further give the readers actionable guidance.


Subject(s)
Artificial Intelligence , Multiomics , Humans , Machine Learning , Genomics
3.
PLoS One ; 18(12): e0293406, 2023.
Article in English | MEDLINE | ID: mdl-38060571

ABSTRACT

The AGMK1-9T7 cell line has been used to study neoplasia in tissue culture. By passage in cell culture, these cells evolved to become tumorigenic and metastatic in immunodeficient mice at passage 40. Of the 20 x 10^6 kidney cells originally plated, less than 2% formed the colonies that evolved to create this cell line. These cells could be the progeny of some type of kidney progenitor cells. To characterize these cells, we documented their renal lineage by their expression of PAX-2 and MIOX, detected by indirect immunofluorescence. These cells assessed by flow-cytometry expressed high levels of CD44, CD73, CD105, Sca-1, and GLI1 across all passages tested; these markers have been reported to be expressed by renal progenitor cells. The expression of GLI1 was confirmed by immunofluorescence and western blot analysis. Cells from passages 13 to 23 possessed the ability to differentiate into adipocytes, osteoblasts, and chondrocytes; after passage 23, their ability to form these cell types was lost. These data indicate that the cells that formed the AGMK1-9T7 cell line were GLI1+ perivascular, kidney, progenitor cells.


Subject(s)
Mesenchymal Stem Cells , Neoplasms , Animals , Mice , Zinc Finger Protein GLI1/metabolism , Cell Differentiation , Cell Line , Stem Cells , Neoplasms/metabolism , Kidney , Cells, Cultured
4.
JAMIA Open ; 6(4): ooad090, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37900974

ABSTRACT

Objective: Anaphylaxis is a severe life-threatening allergic reaction, and its accurate identification in healthcare databases can harness the potential of "Big Data" for healthcare or public health purposes. Methods: This study used claims data obtained between October 1, 2015 and February 28, 2019 from the CMS database to examine the utility of machine learning in identifying incident anaphylaxis cases. We created a feature selection pipeline to identify critical features between different datasets. Then a variety of unsupervised and supervised methods were used (eg, Sammon mapping and eXtreme Gradient Boosting) to train models on datasets of differing data quality, which reflects the varying availability and potential rarity of ground truth data in medical databases. Results: Resulting machine learning model accuracies ranged between 47.7% and 94.4% when tested on ground truth data. Finally, we found new features to help experts enhance existing case-finding algorithms. Discussion: Developing precise algorithms to detect medical outcomes in claims can be a laborious and expensive process, particularly for conditions presented and coded diversely. We found it beneficial to filter out highly potent codes used for data curation to identify underlying patterns and features. To improve rule-based algorithms where necessary, researchers could use model explainers to determine noteworthy features, which could then be shared with experts and included in the algorithm. Conclusion: Our work suggests machine learning models can perform at similar levels as a previously published expert case-finding algorithm, while also having the potential to improve performance or streamline algorithm construction processes by identifying new relevant features for algorithm construction.

5.
Front Immunol ; 13: 797918, 2022.
Article in English | MEDLINE | ID: mdl-35493476

ABSTRACT

Vaccines against the severe acute respiratory syndrome coronavirus 2, which have been in urgent need and development since the beginning of 2020, aim to induce a prominent immune system response capable of recognizing and fighting future infection. Here we analyzed the levels of IgG antibodies against the receptor-binding domain (RBD) of the viral spike protein after the administration of three types of popular vaccines, BNT162b2, mRNA-1273, or Sputnik V, using the same ELISA assay to compare their effects. An efficient immune response was observed in the majority of cases. The obtained ranges of signal values were wide, presumably reflecting specific features of the immune system of individuals. At the same time, these ranges were comparable among the three studied vaccines. The anti-RBD IgG levels after vaccination were also similar to those in the patients with moderate/severe course of the COVID-19, and significantly higher than in the individuals with asymptomatic or light symptomatic courses of the disease. No significant correlation was observed between the levels of anti-RBD IgG and sex or age of the vaccinated individuals. The signals measured at different time points for several individuals after full Sputnik V vaccination did not have a significant tendency to lower within many weeks. The rate of neutralization of the interaction of the RBD with the ACE2 receptor after vaccination with Sputnik V was on average slightly higher than in patients with a moderate/severe course of COVID-19. The importance of the second dose administration of the two-dose Sputnik V vaccine was confirmed: while several individuals had not developed detectable levels of the anti-RBD IgG antibodies after the first dose of Sputnik V, after the second dose the antibody signal became positive for all tested individuals and rose on average 5.4-fold. Finally, we showed that people previously infected with SARS-CoV-2 developed high levels of antibodies, efficiently neutralizing interaction of RBD with ACE2 after the first dose of Sputnik V, with almost no change after the second dose.


Subject(s)
COVID-19 , Viral Vaccines , 2019-nCoV Vaccine mRNA-1273 , Angiotensin-Converting Enzyme 2 , Antibodies, Viral , BNT162 Vaccine , COVID-19/prevention & control , COVID-19 Vaccines , Humans , Immunity , Immunoglobulin G , SARS-CoV-2 , Vaccines, Synthetic
6.
PLoS One ; 17(1): e0262134, 2022.
Article in English | MEDLINE | ID: mdl-34990474

ABSTRACT

Autophagy drives drug resistance and drug-induced cancer cell cytotoxicity. Targeting the autophagy process could greatly improve chemotherapy outcomes. The discovery of specific inhibitors or activators has been hindered by challenges with reliably measuring autophagy levels in a clinical setting. We investigated drug-induced autophagy in breast cancer cell lines with differing ER/PR/Her2 receptor status by exposing them to known but divergent autophagy inducers each with a unique molecular target, tamoxifen, trastuzumab, bortezomib or rapamycin. Differential gene expression analysis from total RNA extracted during the earliest sign of autophagy flux showed both cell- and drug-specific changes. We analyzed the list of differentially expressed genes to find a common, cell- and drug-agnostic autophagy signature. Twelve mRNAs were significantly modulated by all the drugs and 11 were orthogonally verified with Q-RT-PCR (Klhl24, Hbp1, Crebrf, Ypel2, Fbxo32, Gdf15, Cdc25a, Ddit4, Psat1, Cd22, Ypel3). The drug agnostic mRNA signature was similarly induced by a mitochondrially targeted agent, MitoQ. In-silico analysis on the KM-plotter cancer database showed that the levels of these mRNAs are detectable in human samples and associated with breast cancer prognosis outcomes of Relapse-Free Survival in all patients (RFS), Overall Survival in all patients (OS), and Relapse-Free Survival in ER+ Patients (RFS ER+). High levels of Klhl24, Hbp1, Crebrf, Ypel2, Cd22 and Ypel3 were correlated with better outcomes, whereas lower levels of Gdf15, Cdc25a, Ddit4 and Psat1 were associated with better prognosis in breast cancer patients. This gene signature uncovers candidate autophagy biomarkers that could be tested during preclinical and clinical studies to monitor the autophagy process.


Subject(s)
Antineoplastic Agents/pharmacology , Biomarkers, Tumor/genetics , Breast Neoplasms/genetics , Gene Expression Profiling/methods , Gene Regulatory Networks , Antineoplastic Agents/therapeutic use , Autophagy/drug effects , Bortezomib/pharmacology , Bortezomib/therapeutic use , Breast Neoplasms/drug therapy , Cell Line, Tumor , Drug Resistance, Neoplasm , Female , Gene Expression Regulation, Neoplastic/drug effects , Gene Regulatory Networks/drug effects , Humans , MCF-7 Cells , Organophosphorus Compounds/pharmacology , Organophosphorus Compounds/therapeutic use , Receptor, ErbB-2/genetics , Receptors, Estrogen/genetics , Receptors, Progesterone/genetics , Sequence Analysis, RNA , Sirolimus/pharmacology , Sirolimus/therapeutic use , Tamoxifen/pharmacology , Tamoxifen/therapeutic use , Trastuzumab/pharmacology , Trastuzumab/therapeutic use , Ubiquinone/analogs & derivatives , Ubiquinone/pharmacology , Ubiquinone/therapeutic use
7.
Viruses ; 13(10), 2021 09 28.
Article in English | MEDLINE | ID: mdl-34696374

ABSTRACT

Since SARS-CoV-2 appeared in late 2019, many studies on the immune response to COVID-19 have been conducted, but the asymptomatic or light symptom cases were somewhat understudied as the respective individuals often did not seek medical help. Here, we analyze the production of the IgG antibodies to viral nucleocapsid (N) protein and receptor-binding domain (RBD) of the spike protein and assess the serum neutralization capabilities in a cohort of patients with different levels of disease severity. In half of light or asymptomatic cases the antibodies to the nucleocapsid protein, which serve as the main target in many modern test systems, were not detected. They were detected in all cases of moderate or severe symptoms, and severe lung lesions correlated with respective higher signals. Antibodies to RBD were present in the absolute majority of samples, with levels being sometimes higher in light symptom cases. We thus suggest that the anti-RBD/anti-N antibody ratio may serve as an indicator of the disease severity. Anti-RBD IgG remained detectable a year or more after the infection, even with a slight tendency to rise over time, and the respective signal correlated with the serum capacity to inhibit the RBD interaction with the ACE-2 receptor.


Subject(s)
COVID-19/immunology , Coronavirus Nucleocapsid Proteins/immunology , Spike Glycoprotein, Coronavirus/immunology , Adolescent , Adult , Aged , Aged, 80 and over , Antibodies/immunology , Antibodies, Neutralizing/immunology , Antibodies, Viral/blood , Asymptomatic Infections , Female , Humans , Immunoglobulin G/immunology , Immunoglobulin M/blood , Male , Middle Aged , Nucleocapsid , Nucleocapsid Proteins/immunology , Phosphoproteins/immunology , Russia , SARS-CoV-2/immunology
8.
Protein Expr Purif ; 183: 105861, 2021 07.
Article in English | MEDLINE | ID: mdl-33667651

ABSTRACT

Sensitive and specific serology tests are essential for epidemiological and public health studies of COVID-19 and for vaccine efficacy testing. The presence of antibodies to SARS-CoV-2 surface glycoprotein (Spike) and, specifically, its receptor-binding domain (RBD) correlates with inhibition of SARS-CoV-2 binding to the cellular receptor and viral entry into the cells. Serology tests that detect antibodies targeting RBD have high potential to predict COVID-19 immunity and to accurately determine the extent of the vaccine-induced immune response. Cost-effective methods of expression and purification of Spike and its fragments that preserve antigenic properties are essential for development of such tests. Here we describe a method of production of His6-tagged S319-640 fragment containing RBD in E. coli. It includes expression of the fragment, solubilization of inclusion bodies, and on-the-column refolding. The antigenic properties of the resulting product are similar, but not identical to the RBD-containing fragment expressed in human cells.


Subject(s)
COVID-19/virology , SARS-CoV-2/chemistry , Spike Glycoprotein, Coronavirus/chemistry , Binding Sites , Cloning, Molecular , Escherichia coli/chemistry , Escherichia coli/genetics , Gene Expression , Humans , Peptide Fragments/chemistry , Peptide Fragments/genetics , Peptide Fragments/isolation & purification , Protein Domains , Protein Refolding , SARS-CoV-2/genetics , Solubility , Spike Glycoprotein, Coronavirus/genetics , Spike Glycoprotein, Coronavirus/isolation & purification
9.
Diagnostics (Basel) ; 11(1), 2021 Jan 11.
Article in English | MEDLINE | ID: mdl-33440690

ABSTRACT

Determining the presence of antibodies in serum is important for epidemiological studies, to be able to confirm whether a person has been infected, predicting risks of them getting sick and spreading the disease. During the ongoing pandemic of COVID-19, a positive serological test result can suggest if it is safe to return to work and re-engage in social activities. Despite a multitude of emerging tests, the quality of respective data often remains ambiguous, yielding a significant fraction of false positive results. The human organism produces polyclonal antibodies specific to multiple viral proteins, so testing simultaneously for multiple antibodies appeared a practical approach for increasing test specificity. We analyzed immune response and testing potential for a spectrum of antigens derived from the spike and nucleocapsid proteins of SARS-CoV-2, developed a dual-antigen testing system in the ELISA format and designed a robust algorithm for data processing. Combining nucleocapsid protein and receptor-binding domain for analysis allowed us to completely eliminate false positive results in the tested cohort (achieving specificity within a 95% confidence interval of 97.2-100%). We also tested samples collected from different households, and demonstrated differences in the immune response of COVID-19 patients and their family members; identifying, in particular, asymptomatic cases showing strong presence of studied antibodies, and cases showing none despite confirmed close contacts with the infected individuals.
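The abstract above reports a specificity interval whose upper end is 100%, which is what an exact binomial interval looks like when no false positives are observed. The abstract does not say which method the authors used, so the following is only a minimal sketch assuming a two-sided Clopper-Pearson interval; the function name and the cohort sizes are hypothetical and are not taken from the paper.

```python
def specificity_lower_bound(n_negatives: int, alpha: float = 0.05) -> float:
    """Lower end of the exact (Clopper-Pearson) two-sided interval for
    specificity when every one of n_negatives true-negative samples tests
    negative (zero false positives). The upper end is 1.0 in that case.

    With x = n successes out of n, the lower bound p solves p**n = alpha/2.
    """
    if n_negatives <= 0:
        raise ValueError("n_negatives must be positive")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    return (alpha / 2.0) ** (1.0 / n_negatives)


if __name__ == "__main__":
    # Hypothetical cohort sizes, for illustration only.
    for n in (50, 100, 200):
        print(n, round(specificity_lower_bound(n), 4))
```

Under this assumption the lower bound depends only on the number of true-negative samples tested, so a larger negative cohort narrows the interval even when the observed specificity stays at 100%.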

10.
Int Immunol ; 32(12): 755-770, 2020 11 23.
Article in English | MEDLINE | ID: mdl-32805738

ABSTRACT

Atypical memory B cells accumulate in chronic infections and autoimmune conditions, and commonly express FCRL4 and FCRL5, respective IgA and IgG receptors. We characterized memory cells from tonsils on the basis of both FCRL4 and FCRL5 expression, defining three subsets with distinct surface proteins and gene expression. Atypical FCRL4+FCRL5+ memory cells had the most discrete surface protein expression and were enriched in cell adhesion pathways, consistent with functioning as tissue-resident cells. Atypical FCRL4-FCRL5+ memory cells expressed transcription factors and immunoglobulin genes that suggest poised differentiation into plasma cells. Accordingly, the FCRL4-FCRL5+ memory subset was enriched in pathways responding to endoplasmic reticulum stress and IFN-γ. We reconstructed ongoing B-cell responses as lineage trees, providing crucial in vivo developmental context. Each memory subset typically maintained its lineage, denoting mechanisms enforcing their phenotypes. Classical FCRL4-FCRL5- memory cells were infrequently detected in lineage trees, suggesting the majority were in a quiescent state. FCRL4-FCRL5+ cells were the most represented memory subset in lineage trees, indicating robust participation in ongoing responses. Together, these differences suggest FCRL4 and FCRL5 are unlikely to be passive markers but rather active drivers of human memory B-cell development and function.


Subject(s)
B-Lymphocytes/immunology , Receptors, Fc/immunology , Cell Line , Humans
11.
Mult Scler ; 21(2): 138-46, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25112814

ABSTRACT

The pathogenesis of multiple sclerosis (MS) involves alterations to multiple pathways and processes, which represent a significant challenge for developing more-effective therapies. Systems biology approaches that study pathway dysregulation should offer benefits by integrating molecular networks and dynamic models with current biological knowledge for understanding disease heterogeneity and response to therapy. In MS, abnormalities have been identified in several cytokine-signaling pathways, as well as those of other immune receptors. Among the downstream molecules implicated are Jak/Stat, NF-κB, ERK1/3, p38 or Jun/Fos. Together, these data suggest that MS is likely to be associated with abnormalities in apoptosis/cell death, microglia activation, blood-brain barrier functioning, immune responses, cytokine production, and/or oxidative stress, although which pathways contribute to the cascade of damage and can be modulated remains an open question. While current MS drugs target some of these pathways, others remain untouched. Here, we propose a pragmatic systems analysis approach that involves the large-scale extraction of processes and pathways relevant to MS. These data serve as a scaffold on which computational modeling can be performed to identify disease subgroups based on the contribution of different processes. Such an analysis, targeting these relevant MS-signaling pathways, offers the opportunity to accelerate the development of novel individual or combination therapies.


Subject(s)
Multiple Sclerosis/drug therapy , Multiple Sclerosis/metabolism , Signal Transduction/drug effects , Signal Transduction/physiology , Drug Discovery , Humans
12.
PLoS One ; 9(1): e84955, 2014.
Article in English | MEDLINE | ID: mdl-24416320

ABSTRACT

One of the main challenges in modern medicine is to stratify different patient groups in terms of underlying disease molecular mechanisms so as to develop a more personalized approach to therapy. Here we propose a novel method for disease subtyping based on analysis of activated expression regulators on a sample-by-sample basis. Our approach relies on the Sub-Network Enrichment Analysis algorithm (SNEA), which identifies gene subnetworks with significant concordant changes in expression between two conditions. A subnetwork consists of a central regulator and downstream genes connected by relations extracted from a global literature-extracted regulation database. Regulators found in each patient separately are clustered together and assigned activity scores, which are used for the final grouping of patients. We show that our approach performs well compared to other related methods and at the same time provides researchers with a complementary level of understanding of the pathway-level biology behind a disease by identifying significant expression regulators. We observed a reasonable grouping of neuromuscular disorders (triggered by structural damage vs triggered by unknown mechanisms) that was not revealed using standard expression profile clustering. In another experiment we were able to suggest the clusters of regulators responsible for colorectal carcinoma vs adenoma discrimination and to identify frequently genetically changed regulators that could be of specific importance for the individual characteristics of cancer development. The proposed approach can be regarded as biologically meaningful feature selection, reducing tens of thousands of genes down to dozens of clusters of regulators. The obtained clusters of regulators make it possible to generate valuable biological hypotheses about molecular mechanisms related to a clinical outcome for an individual patient.


Subject(s)
Adenoma/genetics , Algorithms , Carcinoma/genetics , Colorectal Neoplasms/genetics , Neuromuscular Diseases/genetics , Adenoma/classification , Adenoma/diagnosis , Carcinoma/classification , Carcinoma/diagnosis , Cluster Analysis , Colorectal Neoplasms/classification , Colorectal Neoplasms/diagnosis , Diagnosis, Differential , Gene Expression Profiling , Gene Expression Regulation , Gene Regulatory Networks , Humans , Multigene Family , Neuromuscular Diseases/classification , Neuromuscular Diseases/diagnosis , Oligonucleotide Array Sequence Analysis , Precision Medicine
13.
Pain ; 154(11): 2335-2343, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23867732

ABSTRACT

Human association studies of common genetic polymorphisms have identified many loci that are associated with risk of complex diseases, although individual loci typically have small effects. However, by envisaging genetic associations in terms of cellular pathways, rather than any specific polymorphism, combined effects of many biologically relevant alleles can be detected. The effects are likely to be most apparent in investigations of phenotypically homogenous subtypes of complex diseases. We report findings from a case-control, genetic association study of relationships between 2925 single nucleotide polymorphisms (SNPs) and 2 subtypes of a commonly occurring chronic facial pain condition, temporomandibular disorder (TMD): 1) localized TMD and 2) TMD with widespread pain. When compared to healthy controls, cases with localized TMD differed in allelic frequency of SNPs that mapped to a serotonergic receptor pathway (P=0.0012), while cases of TMD with widespread pain differed in allelic frequency of SNPs that mapped to a T-cell receptor pathway (P=0.0014). A risk index representing combined effects of 6 SNPs from the serotonergic pathway was associated with greater odds of localized TMD (odds ratio 2.7, P=1.3 E-09), and the result was reproduced in a replication case-control cohort study of 639 people (odds ratio 1.6, P=0.014). A risk index representing combined effects of 8 SNPs from the T-cell receptor pathway was associated with greater odds of TMD with widespread pain (P=1.9 E-08), although the result was not significant in the replication cohort. These findings illustrate potential for clinical classification of chronic pain based on distinct molecular profiles and genetic background.


Subject(s)
Facial Pain/genetics , Facial Pain/physiopathology , Signal Transduction/genetics , Signal Transduction/physiology , Adolescent , Adult , Case-Control Studies , Cohort Studies , DNA/genetics , Female , Genetic Predisposition to Disease , Genotype , Humans , Male , Middle Aged , Models, Genetic , Odds Ratio , Phenotype , Polymorphism, Single Nucleotide/genetics , Receptors, Antigen, T-Cell/physiology , Risk , Serotonin/physiology , Sex Characteristics , Temporomandibular Joint Disorders/genetics , Temporomandibular Joint Disorders/physiopathology , Young Adult
14.
Am J Cancer Res ; 2(1): 93-103, 2012.
Article in English | MEDLINE | ID: mdl-22206048

ABSTRACT

Non-small cell lung cancer (NSCLC) is the most common type of lung cancer, with a poor response to chemotherapy and low survival rate. This unfavorable treatment response is likely to derive from both late diagnosis and from complex, incompletely understood biology, and heterogeneity among NSCLC subtypes. To define the relative contributions of major cellular pathways to the biogenesis of NSCLC and highlight major differences between NSCLC subtypes, we studied the molecular signatures of lung adenocarcinoma (ADC) and squamous cell carcinoma (SCC), based on analysis of gene expression and comparison of tumor samples with normal lung tissue. Our results suggest the existence of specific molecular networks and subtype-specific differences between lung ADC and SCC subtypes, mostly found in cell cycle, DNA repair, and metabolic pathways. However, we also observed similarities across major gene interaction networks and pathways in ADC and SCC. These data provide a new insight into the biology of ADC and SCC and can be used to explore novel therapeutic interventions in lung cancer chemoprevention and treatment.

15.
J Bioinform Comput Biol ; 8(3): 593-606, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20556864

ABSTRACT

Heterogeneous high-throughput biological data are becoming readily available for various diseases. The amount of data points generated by such experiments does not allow manual integration of the information to design the optimal therapy for a disease. We describe a novel computational workflow for designing therapy using Ariadne Genomics Pathway Studio software. We use publicly available microarray experiments for glioblastoma and automatically constructed ResNet and ChemEffect databases to exemplify how to find potentially effective chemicals for glioblastoma, a disease that still lacks effective treatment. Our first approach involved construction of the signaling pathway affected in glioblastoma using scientific literature and data available in the ResNet database. Compounds known to affect multiple proteins in this pathway were found in the ChemEffect database. Another approach involved analysis of differential expression in glioblastoma patients using Sub-Network Enrichment Analysis (SNEA). SNEA identified the angiogenesis-related protein Cyr61 as the major positive regulator upstream of genes differentially expressed in glioblastoma. Using our findings, we then identified the breast cancer drug Fulvestrant as a major inhibitor of the glioblastoma pathway as well as Cyr61. This suggested Fulvestrant as a potential treatment against glioblastoma. We further show how to increase efficacy of glioblastoma treatment by finding optimal combinations of Fulvestrant with other drugs.


Subject(s)
Antineoplastic Agents/administration & dosage , Combinatorial Chemistry Techniques/methods , Glioblastoma/drug therapy , Glioblastoma/metabolism , Models, Biological , Neoplasm Proteins/metabolism , Signal Transduction/drug effects , Animals , Computer Simulation , Drug Design , Humans
16.
J Bioinform Comput Biol ; 5(2B): 429-56, 2007 Apr.
Article in English | MEDLINE | ID: mdl-17636854

ABSTRACT

Microarray-based characterization of tissues, cellular and disease states, and environmental condition and treatment responses provides genome-wide snapshots containing large amounts of invaluable information. However, the lack of inherent structure within the data and strong noise make extracting and interpreting this information and formulating and prioritizing domain relevant hypotheses difficult tasks. Integration with different types of biological data is required to place the expression measurements into a biologically meaningful context. A few approaches in microarray data interpretation are discussed with the emphasis on the use of molecular network information. Statistical procedures are demonstrated that superimpose expression data onto the transcription regulation network mined from scientific literature and aim at selecting transcription regulators with significant patterns of expression changes downstream. Tests are suggested that take into account network topology and signs of transcription regulation effects. The approaches are illustrated using two different expression datasets, the performance is compared, and biological relevance of the predictions is discussed.


Subject(s)
Algorithms , Gene Expression Profiling/methods , Models, Biological , Oligonucleotide Array Sequence Analysis/methods , Proteome/metabolism , Signal Transduction/physiology , Transcription, Genetic/physiology , Computer Simulation
17.
BMC Bioinformatics ; 8: 243, 2007 Jul 10.
Article in English | MEDLINE | ID: mdl-17620146

ABSTRACT

BACKGROUND: Uncovering cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work as well as often sophisticated data mining and processing tools. A protein's functions, often referred to as its annotations, are believed to manifest themselves through the topology of the networks of inter-protein interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins with other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to having general biological significance, such a demonstration would further validate the data extraction and processing methods used to compose protein annotation and protein-protein interaction datasets. RESULTS: We developed a method for automatic extraction of protein functional annotation from scientific text based on Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed, we evaluated the precision and recall rates, and compared the performance of the automatic extraction technology to that of manual curation used in public Gene Ontology (GO) annotation. In the second part of our presentation, we reported a large-scale investigation into the correspondence between communities in the literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form significantly denser linked network clusters than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology. An increase in the number and size of GO groups without any noticeable decrease of the link density within the groups indicated that this expansion significantly broadens the public GO annotation without diluting its quality. We revealed that functional GO annotation correlates mostly with clustering in a physical interaction protein network, while its overlap with indirect regulatory network communities is two to three times smaller. CONCLUSION: Protein functional annotations extracted by the NLP technology expand and enrich the existing GO annotation system. The GO functional modularity correlates mostly with the clustering in the physical interaction network, suggesting an essential role for the structural organization maintained by these interactions. Reciprocally, clustering of proteins in physical interaction networks can serve as evidence for their functional similarity.


Subject(s)
Computational Biology/methods , Databases, Genetic/classification , Genes , Pattern Recognition, Automated/methods , Proteins/physiology , Cluster Analysis , Computational Biology/standards , Databases, Genetic/standards , Databases, Protein , Information Storage and Retrieval , Natural Language Processing , Pattern Recognition, Automated/standards , Protein Interaction Mapping , PubMed , Reproducibility of Results , Terminology as Topic
18.
J Biomed Sci ; 14(3): 395-405, 2007 May.
Article in English | MEDLINE | ID: mdl-17385060

ABSTRACT

Alterations in eIF3-p48/INT6 gene expression have been implicated in murine and human mammary carcinogenesis. We examined levels of INT6 protein in human tumors and determined that breast and colon tumors clustered into distinct groups based on levels of INT6 expression and clinicopathological variables. We performed multiplex tissue immunoblotting of breast, colon, lung, and ovarian tumor tissues and found that INT6 protein levels positively correlated with those of TID1, Patched, p53, c-Jun, and phosphorylated-c-Jun proteins in a tissue-specific manner. INT6 and TID1 showed significant positive correlation in all tissue types tested. These findings were confirmed by immunohistochemical staining of INT6 and TID1. Further evidence supporting a cooperative role for INT6 and TID1 is the presence of endogenous INT6 and TID1 proteins as complexes. We detected co-immunoprecipitation between INT6 and TID1, as well as between INT6 and Patched. These findings suggest potential integrated roles for INT6, TID1, and Patched proteins in cell growth, development, and tumorigenesis. Additionally, these data suggest that the combination of INT6, TID1, and Patched protein levels may be useful biomarkers for the development of diagnostic assays.


Subject(s)
Biomarkers, Tumor/metabolism , Eukaryotic Initiation Factor-3/metabolism , Gene Expression Regulation, Neoplastic , HSP40 Heat-Shock Proteins/metabolism , Neoplasms/metabolism , Receptors, Cell Surface/metabolism , Biomarkers, Tumor/genetics , Humans , Neoplasms/genetics , Neoplasms/pathology , Patched Receptors , Proto-Oncogene Proteins/genetics , Proto-Oncogene Proteins/metabolism
19.
BMC Bioinformatics ; 7: 171, 2006 Mar 24.
Article in English | MEDLINE | ID: mdl-16563163

ABSTRACT

BACKGROUND: Scientific literature is a source of the most reliable and comprehensive knowledge about molecular interaction networks. Formalization of this knowledge is necessary for computational analysis and is achieved by automatic fact extraction using various text-mining algorithms. Most of these techniques suffer from high false positive rates and redundancy of the extracted information. The extracted facts form a large network with no pathways defined. RESULTS: We describe the methodology for automatic curation of Biological Association Networks (BANs) derived by a natural language processing technology called MedScan. The curated data is used for automatic pathway reconstruction. The algorithm for the reconstruction of signaling pathways is also described and validated by comparison with manually curated pathways and tissue-specific gene expression profiles. CONCLUSION: Biological Association Networks extracted by MedScan technology contain sufficient information for constructing thousands of mammalian signaling pathways for multiple tissues. The automatically curated MedScan data is adequate for automatic generation of good-quality signaling networks. The automatically generated Regulome pathways and manually curated pathways used for their validation are available free in the ResNetCore database from Ariadne Genomics, Inc. The pathways can be viewed and analyzed through the use of a free demo version of PathwayStudio software. The MedScan technology is also available for evaluation using the free demo version of PathwayStudio software.


Subject(s)
Databases, Bibliographic , Natural Language Processing , Periodicals as Topic , Protein Interaction Mapping/methods , Proteins/classification , Proteins/metabolism , Signal Transduction/physiology , Information Storage and Retrieval/methods , Software
20.
Nucleic Acids Res ; 33(11): 3629-35, 2005.
Article in English | MEDLINE | ID: mdl-15983135

ABSTRACT

We demonstrate that protein-protein interaction networks in several eukaryotic organisms contain significantly more self-interacting proteins than expected if such homodimers randomly appeared in the course of evolution. We also show that on average homodimers have twice as many interaction partners as non-self-interacting proteins. More specifically, the likelihood of a protein to physically interact with itself was found to be proportional to the total number of its binding partners. These properties of dimers are in agreement with a phenomenological model, in which individual proteins differ from each other by the degree of their 'stickiness' or general propensity toward interaction with other proteins, including themselves. A duplication of self-interacting proteins creates a pair of paralogous proteins interacting with each other. We show that such pairs occur more frequently than could be explained by pure chance alone. Similar to homodimers, proteins involved in heterodimers with their paralogs on average have twice as many interacting partners as the rest of the network. The likelihood of a pair of paralogous proteins to interact with each other was also shown to decrease with their sequence similarity. This points to the conclusion that most interactions between paralogs are inherited from ancestral homodimeric proteins, rather than established de novo after duplication. We finally discuss possible implications of our empirical observations from functional and evolutionary standpoints.


Subject(s)
Biological Evolution , Multiprotein Complexes/metabolism , Animals , Dimerization , Humans , Multiprotein Complexes/chemistry , Protein Binding , Two-Hybrid System Techniques