ABSTRACT
COVID-19 is characterized by dysregulated immune responses, metabolic dysfunction and adverse effects on the function of multiple organs. To understand host responses to COVID-19 pathophysiology, we combined transcriptomics, proteomics, and metabolomics to identify molecular markers in peripheral blood and plasma samples of 66 COVID-19-infected patients experiencing a range of disease severities and 17 healthy controls. A large number of expressed genes, proteins, metabolites, and extracellular RNAs (exRNAs) exhibit strong associations with various clinical parameters. Multiple sets of tissue-specific proteins and exRNAs varied significantly in both mild and severe patients suggesting a potential impact on tissue function. Chronic activation of neutrophils, IFN-I signaling, and a high level of inflammatory cytokines were observed in patients with severe disease progression. In contrast, COVID-19-infected patients experiencing milder disease symptoms showed robust T-cell responses. Finally, we identified genes, proteins, and exRNAs as potential biomarkers that might assist in predicting the prognosis of SARS-CoV-2 infection. These data refine our understanding of the pathophysiology and clinical progress of COVID-19.
Subject(s)
COVID-19/blood , COVID-19/pathology , Biomarkers/blood , COVID-19/immunology , COVID-19/virology , Female , Genomics/methods , Humans , Lipoproteins/metabolism , Male , Metabolomics/methods , SARS-CoV-2/physiology , Severity of Illness Index , Viral Load
ABSTRACT
INTRODUCTION: Frailty has become a worldwide health burden that has a large influence on public health and clinical practice. The incidence of frailty is anticipated to increase as the ageing population increases. Myocardial injury after noncardiac surgery (MINS) is associated with short-term and long-term mortality. However, the incidence of MINS in frail geriatric patients is unknown. METHODS AND ANALYSIS: This prospective, multicentre, real-world observational cohort study will be conducted at 18 designated centres in China from January 2023 to December 2024, with an anticipated sample size of 856 patients aged 65 years and older who are scheduled to undergo noncardiac surgery. The primary outcome will be the incidence of MINS. MINS is defined as a fourth-generation plasma cardiac troponin T (cTnT) concentration ≥ 0.03 ng/mL exhibited at least once within 30 days after surgery, with or without symptoms of myocardial ischaemia. All data will be collected via electronic data acquisition. DISCUSSION: This study will explore the incidence of MINS in frail patients. The characteristics, predictive factors and 30-day outcomes of MINS in frail patients will be further investigated to lay the foundation for identifying clinical interventions. CLINICAL TRIAL REGISTRATION: https://beta.clinicaltrials.gov/study/NCT05635877, NCT05635877.
Subject(s)
Frailty , Myocardial Ischemia , Humans , Aged , Postoperative Complications/diagnosis , Postoperative Complications/epidemiology , Postoperative Complications/etiology , Prospective Studies , Frailty/diagnosis , Frailty/epidemiology , Frailty/complications , Myocardial Ischemia/diagnosis , Myocardial Ischemia/epidemiology , Myocardial Ischemia/etiology , Cohort Studies , Risk Factors , Observational Studies as Topic , Multicenter Studies as Topic
ABSTRACT
OBJECTIVE: This phase 2 study investigated sapanisertib (selective dual inhibitor of mTORC1/2) alone, or in combination with paclitaxel or TAK-117 (a selective small molecule inhibitor of PI3K), versus paclitaxel alone in advanced, recurrent, or persistent endometrial cancer. METHODS: Patients with histologic diagnosis of endometrial cancer (1-2 prior regimens) were randomized to 28-day cycles on four treatment arms: 1) weekly paclitaxel 80 mg/m² (days 1, 8, and 15); 2) weekly paclitaxel 80 mg/m² + oral sapanisertib 4 mg on days 2-4, 9-11, 16-18, and 23-25; 3) weekly sapanisertib 30 mg; or 4) sapanisertib 4 mg + TAK-117 200 mg on days 1-3, 8-10, 15-17, and 22-24. RESULTS: Of 241 patients randomized, 234 received treatment (paclitaxel, n = 87 [3 ongoing]; paclitaxel+sapanisertib, n = 86 [3 ongoing]; sapanisertib, n = 41; sapanisertib+TAK-117, n = 20). The sapanisertib and sapanisertib+TAK-117 arms were closed to enrollment after futility analyses. After a median follow-up of 14.4 (paclitaxel) versus 17.2 (paclitaxel+sapanisertib) months, median progression-free survival (PFS; primary endpoint) was 3.7 versus 5.6 months (hazard ratio [HR] 0.82; 95% confidence interval [CI] 0.58-1.15; p = 0.139); in patients with endometrioid histology (n = 116), median PFS was 3.3 versus 5.7 months (HR 0.66; 95% CI 0.43-1.03). Grade ≥ 3 treatment-emergent adverse event rates were 54.0% with paclitaxel versus 89.5% with paclitaxel+sapanisertib. CONCLUSIONS: Our findings support inclusion of chemotherapy combinations with investigational agents for advanced or metastatic disease. The primary endpoint was not met and toxicity was manageable. TRIAL REGISTRATION: ClinicalTrials.gov number, NCT02725268.
Subject(s)
Endometrial Neoplasms , Paclitaxel , Humans , Female , Paclitaxel/adverse effects , Treatment Outcome , Endometrial Neoplasms/drug therapy , Endometrial Neoplasms/etiology , Progression-Free Survival , Antineoplastic Combined Chemotherapy Protocols/adverse effects
ABSTRACT
BACKGROUND: Aristolochic acid (AA), a natural component of Aristolochia plants that is found in a variety of herbal remedies and health supplements, is classified as a Group 1 carcinogen by the International Agency for Research on Cancer. Given that microRNAs (miRNAs) are involved in cancer initiation and progression and their role remains unknown in AA-induced carcinogenesis, we examined genome-wide AA-induced dysregulation of miRNAs as well as the regulation of miRNAs on their target gene expression in rat kidney. RESULTS: We treated rats with 10 mg/kg AA or vehicle control for 12 weeks; eight kidney samples (4 treated and 4 control) were used to examine miRNA and mRNA expression by deep sequencing, and protein expression by proteomics. AA treatment resulted in significant differential expression of miRNAs, mRNAs and proteins as measured by both principal component analysis (PCA) and hierarchical clustering analysis (HCA). Specifically, 63 miRNAs (adjusted p value < 0.05 and fold change > 1.5), 6,794 mRNAs (adjusted p value < 0.05 and fold change > 2.0), and 800 proteins (fold change > 2.0) were significantly altered by AA treatment. The expression of 6 selected miRNAs was validated by quantitative real-time PCR analysis. Ingenuity Pathway Analysis (IPA) showed that cancer is the top network and disease associated with those dysregulated miRNAs. To further investigate the influence of miRNAs on kidney mRNA and protein expression, we combined proteomic and transcriptomic data in conjunction with miRNA target selection as confirmed and reported in miRTarBase. In addition to translational repression and transcriptional destabilization, we also found that miRNAs and their target genes were expressed in the same direction at levels of transcription (169) or translation (227). Furthermore, we identified that up-regulation of 13 oncogenic miRNAs was associated with translational activation of 45 out of 54 cancer-related targets.
CONCLUSIONS: Our findings suggest that dysregulated miRNA expression plays an important role in AA-induced carcinogenesis in rat kidney, and that the integrated approach of multiple profiling provides a new insight into a post-transcriptional regulation of miRNAs on their target repression and activation in a genome-wide scale.
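The same-direction versus opposite-direction comparison described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the example identifiers (`miR-A`, `Gene1`, and so on), and the use of log2 fold changes are assumptions made for illustration, and only the fold-change cutoffs (1.5 for miRNAs, 2.0 for mRNAs) come from the abstract.

```python
import math
from collections import Counter

# Cutoffs taken from the abstract, expressed on a log2 scale (an assumption).
MIRNA_CUTOFF = math.log2(1.5)
TARGET_CUTOFF = math.log2(2.0)


def direction(log2_fc, cutoff):
    """Return +1 (up), -1 (down), or 0 (below cutoff) for a log2 fold change."""
    if log2_fc >= cutoff:
        return 1
    if log2_fc <= -cutoff:
        return -1
    return 0


def classify_pairs(pairs, mirna_fc, target_fc):
    """Count miRNA-target pairs that change in the same or opposite direction."""
    counts = Counter()
    for mirna, target in pairs:
        m = direction(mirna_fc.get(mirna, 0.0), MIRNA_CUTOFF)
        t = direction(target_fc.get(target, 0.0), TARGET_CUTOFF)
        if m == 0 or t == 0:
            counts["unchanged"] += 1
        elif m == t:
            counts["same_direction"] += 1
        else:
            counts["opposite_direction"] += 1
    return counts


# Hypothetical data, for illustration only.
pairs = [("miR-A", "Gene1"), ("miR-A", "Gene2"), ("miR-B", "Gene3")]
mirna_fc = {"miR-A": 2.0, "miR-B": -1.5}
target_fc = {"Gene1": 1.5, "Gene2": -2.0, "Gene3": 0.2}
print(classify_pairs(pairs, mirna_fc, target_fc))
```

In this sketch, opposite-direction pairs correspond to the classical repression model, while same-direction pairs correspond to the co-directional expression the abstract reports.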
Subject(s)
Aristolochic Acids/toxicity , Carcinogens/toxicity , Kidney Neoplasms/genetics , Kidney Neoplasms/metabolism , Kidney/drug effects , RNA, Neoplasm/analysis , Animals , Gene Expression Regulation, Neoplastic/drug effects , Gene Regulatory Networks/drug effects , High-Throughput Nucleotide Sequencing , Kidney/metabolism , Kidney Neoplasms/etiology , Male , MicroRNAs/analysis , Molecular Sequence Data , Proteomics , RNA, Messenger/analysis , Rats , Sequence Analysis, RNA
ABSTRACT
BACKGROUND: Due to a significant decline in the costs associated with next-generation sequencing, it has become possible to decipher the genetic architecture of a population by sequencing a large number of individuals to a deep coverage. The Korean Personal Genomes Project (KPGP) recently sequenced 35 Korean genomes at high coverage using the Illumina HiSeq platform and made the deep sequencing data publicly available, providing the scientific community opportunities to decipher the genetic architecture of the Korean population. METHODS: In this study, we used two single nucleotide variant (SNV) calling pipelines: mapping the raw reads obtained from whole genome sequencing of 35 Korean individuals in KPGP using BWA and SOAP2 followed by SNV calling using SAMtools and SOAPsnp, respectively. The consensus SNVs obtained from the two SNV pipelines were used to represent the SNVs of the Korean population. We compared these SNVs to those from 17 other populations provided by the HapMap consortium and the 1000 Genomes Project (1KGP) and identified SNVs that were only present in the Korean population. We studied the mutation spectrum and analyzed the genes of non-synonymous SNVs only detected in the Korean population. RESULTS: We detected a total of 8,555,726 SNVs in the 35 Korean individuals and identified 1,213,613 SNVs detected in at least one Korean individual (SNV-1) and 12,640 in all 35 Korean individuals (SNV-35) but not in 17 other populations. In contrast with the SNVs common to other populations in HapMap and 1KGP, the Korean-only SNVs had high percentages of non-silent variants, emphasizing the unique roles of these Korean-only SNVs in the Korean population. Specifically, we identified 8,361 non-synonymous Korean-only SNVs, of which 58 SNVs existed in all 35 Korean individuals. The 5,754 genes of non-synonymous Korean-only SNVs were highly enriched in some metabolic pathways.
We found adhesion is the top disease term associated with SNV-1 and Nelson syndrome is the only disease term associated with SNV-35. We found that a significant number of Korean only SNVs are in genes that are associated with the drug term of adenosine. CONCLUSION: We identified the SNVs that were found in the Korean population but not seen in other populations, and explored the corresponding genes and pathways as well as the associated disease terms and drug terms. The results expand our knowledge of the genetic architecture of the Korean population, which will benefit the implementation of personalized medicine for the Korean population.
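The consensus and population-specific filtering steps described in the abstract above reduce to set operations. The following Python sketch is an illustration under stated assumptions, not the study's code: SNVs are represented as `(chromosome, position, ref, alt)` tuples, and all function names and example variants are hypothetical.

```python
def consensus_snvs(calls_a, calls_b):
    """Keep only SNVs reported by both calling pipelines."""
    return calls_a & calls_b


def population_only(per_individual, reference_snvs):
    """Return (SNVs seen in at least one individual, SNVs seen in all
    individuals), after removing anything present in the reference populations.

    Assumes per_individual contains at least one individual.
    """
    call_sets = list(per_individual.values())
    in_any = set().union(*call_sets) - reference_snvs
    in_all = set.intersection(*call_sets) - reference_snvs
    return in_any, in_all


# Hypothetical variants, for illustration only.
s1 = ("chr1", 100, "A", "G")
s2 = ("chr2", 50, "G", "A")
s3 = ("chr3", 10, "T", "C")

pipeline_1 = {s1, s2, ("chr1", 200, "C", "T")}   # e.g. one aligner/caller pair
pipeline_2 = {s1, s2, s3}                        # e.g. the other pair
print(consensus_snvs(pipeline_1, pipeline_2))

per_individual = {"ind1": {s1, s2}, "ind2": {s1, s3}}
reference_snvs = {s3}                            # stands in for HapMap/1KGP calls
snv_any, snv_all = population_only(per_individual, reference_snvs)
print(snv_any, snv_all)
```

Here `snv_any` and `snv_all` play the roles of the SNV-1 and SNV-35 sets named in the abstract.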
Subject(s)
Asian People/genetics , Polymorphism, Single Nucleotide , Disease/genetics , Gene Ontology , Genetic Association Studies , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Korea , Mutation , Sequence Alignment , Sequence Analysis, DNA , Software
ABSTRACT
The aim of this review is to comprehensively summarize the recent achievements in the field of toxicogenomics and cancer research regarding genetic-environmental interactions in carcinogenesis and detection of genetic aberrations in cancer genomes by next-generation sequencing technology. Cancer is primarily a genetic disease in which genetic factors and environmental stimuli interact to cause genetic and epigenetic aberrations in human cells. Mutations in the germline act as either high-penetrance alleles that strongly increase the risk of cancer development, or as low-penetrance alleles that mildly change an individual's susceptibility to cancer. Somatic mutations, resulting from either DNA damage induced by exposure to environmental mutagens or from spontaneous errors in DNA replication or repair are involved in the development or progression of the cancer. Induced or spontaneous changes in the epigenome may also drive carcinogenesis. Advances in next-generation sequencing technology provide us opportunities to accurately, economically, and rapidly identify genetic variants, somatic mutations, gene expression profiles, and epigenetic alterations with single-base resolution. Whole genome sequencing, whole exome sequencing, and RNA sequencing of paired cancer and adjacent normal tissue present a comprehensive picture of the cancer genome. These new findings should benefit public health by providing insights in understanding cancer biology, and in improving cancer diagnosis and therapy.
Subject(s)
High-Throughput Nucleotide Sequencing , Neoplasms/genetics , Toxicogenetics/methods , Disease Susceptibility , High-Throughput Nucleotide Sequencing/economics , Humans , Toxicogenetics/economics
ABSTRACT
INTRODUCTION: Acid α-glucosidase (GAA) is a lysosomal enzyme that hydrolyzes glycogen to glucose. Deficiency of GAA causes Pompe disease (PD), also known as glycogen storage disease type II. The resulting glycogen accumulation causes a spectrum of disease severity ranging from infantile-onset PD to adult-onset PD. Additional non-invasive biomarkers of disease severity are needed to monitor response to therapeutic interventions. METHODS: We measured protein and miRNA abundance in exosomes from serum and urine from the PD mouse model (B6;129-GaaTm1Rabn/J), wild-type mice, and PD mice treated with a candidate gene therapy. RESULTS: There were significant differences in the abundance of 113 miRNAs in serum exosomes from Pompe versus healthy mice. Levels of miR-206, miR-133, miR-1a, miR-486, and other important regulators of muscle development and maintenance were altered in the Pompe samples. The serum and urine exosome proteomes of healthy and Pompe mice also differed broadly. Several of the dysregulated proteins are encoded by genes with potential target sites for the affected miRNAs. CONCLUSION: Exosomes derived from urine or serum are a potential source of biomarkers for Pompe disease. Further study of the differences in the miRNA transcriptome and proteome content of exosomes may yield new insights into disease mechanisms.
Subject(s)
Disease Models, Animal , Glycogen Storage Disease Type II , MicroRNAs , Proteome , Transcriptome , Animals , MicroRNAs/blood , MicroRNAs/urine , Glycogen Storage Disease Type II/genetics , Glycogen Storage Disease Type II/blood , Glycogen Storage Disease Type II/urine , Mice , Proteome/metabolism , Extracellular Vesicles/metabolism , Exosomes/metabolism , Exosomes/genetics , Biomarkers/urine , Biomarkers/blood , Male , alpha-Glucosidases/genetics , alpha-Glucosidases/urine , alpha-Glucosidases/blood , alpha-Glucosidases/metabolism , Genetic Therapy/methods
ABSTRACT
BACKGROUND: Large amounts of mammalian protein-protein interaction (PPI) data have been generated and are available for public use. From a systems biology perspective, protein/gene interactions encode the key mechanisms distinguishing disease and health, and such mechanisms can be uncovered through network analysis. An effective network analysis tool should integrate different content-specific PPI databases into a comprehensive network format with a user-friendly platform to identify key functional modules/pathways and the underlying mechanisms of disease and toxicity. RESULTS: atBioNet integrates seven publicly available PPI databases into a network-specific knowledge base. Knowledge expansion is achieved by expanding a user-supplied protein/gene list with interactions from its integrated PPI network. The statistically significant functional modules are determined by applying a fast network-clustering algorithm (SCAN: a Structural Clustering Algorithm for Networks). The functional modules can be visualized either separately or together in the context of the whole network. Integration of pathway information enables enrichment analysis and assessment of the biological function of modules. Three case studies are presented using publicly available disease gene signatures as a basis to discover new biomarkers for acute leukemia, systemic lupus erythematosus, and breast cancer. The results demonstrated that atBioNet can identify functional modules and pathways related to the studied diseases, and that this information can be used to hypothesize novel biomarkers for future analysis. CONCLUSION: atBioNet is a free web-based network analysis tool that provides a systematic insight into protein/gene interactions through examining significant functional modules. The identified functional modules are useful for determining underlying mechanisms of disease and biomarker discovery.
It can be accessed at: http://www.fda.gov/ScienceResearch/BioinformaticsTools/ucm285284.htm.
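The abstract above names SCAN as the clustering step but does not describe it. As a rough illustration, the Python sketch below computes the structural similarity and core-node test on which SCAN is built, as the algorithm is usually stated (closed neighborhoods, a similarity threshold `eps`, and a minimum neighborhood size `mu`). It is not atBioNet's implementation and does not perform the full clustering; the function names, parameter values, and the toy graph are assumptions.

```python
import math


def build_neighborhoods(edges):
    """Map each node to its closed neighborhood (its neighbors plus itself)."""
    gamma = {}
    for u, v in edges:
        gamma.setdefault(u, {u}).add(v)
        gamma.setdefault(v, {v}).add(u)
    return gamma


def structural_similarity(gamma, u, v):
    """Shared closed-neighborhood size, normalized by the geometric mean."""
    shared = len(gamma[u] & gamma[v])
    return shared / math.sqrt(len(gamma[u]) * len(gamma[v]))


def core_nodes(gamma, eps, mu):
    """Nodes with at least mu members of their closed neighborhood at
    structural similarity >= eps (the node itself counts)."""
    cores = set()
    for u, members in gamma.items():
        close = [v for v in members if structural_similarity(gamma, u, v) >= eps]
        if len(close) >= mu:
            cores.add(u)
    return cores


# Toy graph: a triangle A-B-C with a pendant node D attached to C.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
gamma = build_neighborhoods(edges)
print(structural_similarity(gamma, "A", "B"))
print(core_nodes(gamma, eps=0.8, mu=3))
```

In the full algorithm, clusters are grown outward from core nodes, which is what yields the functional modules mentioned in the abstract.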
Subject(s)
Biomarkers/metabolism , Genomics , Software , Algorithms , Cluster Analysis , Databases, Protein , Humans , Metabolic Networks and Pathways , Protein Interaction Maps , User-Computer Interface
ABSTRACT
The Liver Toxicity Biomarker Study is a systems toxicology approach to discover biomarkers that are indicative of a drug's potential to cause human idiosyncratic drug-induced liver injury. In phase I, the molecular effects in rat liver and blood plasma induced by tolcapone (a "toxic" drug) were compared with the molecular effects in the same tissues by dosing with entacapone (a "clean" drug, similar to tolcapone in chemical structure and primary pharmacological mechanism). Two durations of drug exposure, 3 and 28 days, were employed. Comprehensive molecular analysis of rat liver and plasma samples yielded marker analytes for various drug-vehicle or drug-drug comparisons. An important finding was that the marker analytes associated with tolcapone only partially overlapped with marker analytes associated with entacapone, despite the fact that both drugs have similar chemical structures and the same primary pharmacological mechanism of action. This result indicates that the molecular analyses employed in the study are detecting substantial "off-target" markers for the two drugs. An additional interesting finding was the modest overlap of the marker data sets for 3-day exposure and 28-day exposure, indicating that the molecular changes in liver and plasma caused by short- and long-term drug treatments do not share common characteristics.
Subject(s)
Benzophenones/toxicity , Catechols/toxicity , Chemical and Drug Induced Liver Injury/metabolism , Nitriles/toxicity , Nitrophenols/toxicity , Animals , Biomarkers/analysis , Blood Proteins/analysis , Chemical and Drug Induced Liver Injury/blood , Female , Gene Expression Profiling , Liver/chemistry , Liver/metabolism , Male , Metabolome/drug effects , Metabolomics , Proteome/analysis , Proteome/drug effects , Proteomics , Rats , Research Design , Tolcapone , Toxicity Tests, Acute/methods , Toxicity Tests, Chronic/methods
ABSTRACT
BACKGROUND: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS. RESULTS: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×. CONCLUSIONS: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.
Subject(s)
Genome, Human , Polymorphism, Single Nucleotide , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Reproducibility of Results , Whole Genome Sequencing
ABSTRACT
BACKGROUND: Protein-protein interactions (PPIs) are a critical component for many underlying biological processes. A PPI network can provide insight into the mechanisms of these processes, as well as the relationships among different proteins and toxicants that are potentially involved in the processes. There are many PPI databases publicly available, each with a specific focus. The challenge is how to effectively combine their contents to generate a robust and biologically relevant PPI network. METHODS: In this study, seven public PPI databases, BioGRID, DIP, HPRD, IntAct, MINT, REACTOME, and SPIKE, were used to explore a powerful approach to combine multiple PPI databases for an integrated PPI network. We developed a novel method called k-votes to create seven different integrated networks by using values of k ranging from 1 to 7. Functional modules were mined by using SCAN, a Structural Clustering Algorithm for Networks. Overall module qualities were evaluated for each integrated network using the following statistical and biological measures: (1) modularity, (2) similarity-based modularity, (3) clustering score, and (4) enrichment. RESULTS: Each integrated human PPI network was constructed based on the number of votes (k) for a particular interaction from the committee of the original seven PPI databases. The performance of functional modules obtained by SCAN from each integrated network was evaluated. The optimal value for k was determined by the functional module analysis. Our results demonstrate that the k-votes method outperforms the traditional union approach in terms of both statistical significance and biological meaning. The best network is achieved at k = 2, which is composed of interactions that are confirmed in at least two PPI databases. In contrast, the traditional union approach yields an integrated network that consists of all interactions of seven PPI databases, which might be subject to a high rate of false positives.
CONCLUSIONS: We determined that the k-votes method for constructing a robust PPI network by integrating multiple public databases outperforms previously reported approaches and that a value of k=2 provides the best results. The developed strategies for combining databases show promise in the advancement of network construction and modeling.
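The k-votes idea described in the abstract above (keep an interaction only if at least k databases report it) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name, the treatment of interactions as undirected pairs, and the placeholder database and protein names are assumptions.

```python
from collections import Counter


def k_votes(databases, k):
    """Return the interactions reported by at least k of the given databases.

    databases maps a database name to an iterable of (protein, protein) pairs.
    Pairs are treated as undirected, and each database votes at most once
    per interaction.
    """
    votes = Counter()
    for interactions in databases.values():
        unique_pairs = {frozenset(pair) for pair in interactions}
        votes.update(unique_pairs)
    return {pair for pair, n in votes.items() if n >= k}


# Placeholder databases and proteins, for illustration only.
databases = {
    "db1": [("P1", "P2"), ("P2", "P3")],
    "db2": [("P2", "P1"), ("P3", "P4")],
    "db3": [("P1", "P2"), ("P3", "P4"), ("P4", "P5")],
}

union_network = k_votes(databases, k=1)   # same as the traditional union
two_vote_network = k_votes(databases, k=2)
print(len(union_network), len(two_vote_network))
```

Setting `k=1` reproduces the union approach the abstract compares against, while larger values of k trade coverage for agreement between sources.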
Subject(s)
Algorithms , Databases, Protein , Protein Interaction Maps , Humans , Models, Biological , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/metabolism
ABSTRACT
RNA-Seq has been increasingly used for the quantification and characterization of transcriptomes. The ongoing development of the technology promises more accurate measurement of gene expression. However, its benefits over widely accepted microarray technologies have not been adequately assessed, especially in toxicogenomics studies. The goal of this study is to enhance the scientific community's understanding of the advantages and challenges of RNA-Seq in the quantification of gene expression by comparing analysis results from RNA-Seq and microarray data from a toxicogenomics study. A typical toxicogenomics study design was used to compare the performance of an RNA-Seq approach (Illumina Genome Analyzer II) to a microarray-based approach (Affymetrix Rat Genome 230 2.0 arrays) for detecting differentially expressed genes (DEGs) in the kidneys of rats treated with aristolochic acid (AA), a carcinogenic and nephrotoxic chemical most notably used for weight loss. We studied the comparability of the RNA-Seq and microarray data in terms of absolute gene expression, gene expression patterns, differentially expressed genes, and biological interpretation. We found that RNA-Seq was more sensitive in detecting genes with low expression levels, while similar gene expression patterns were observed for both platforms. Moreover, although the overlap of the DEGs was only 40-50%, the biological interpretation was largely consistent between the RNA-Seq and microarray data. RNA-Seq maintained a consistent biological interpretation with time-tested microarray platforms while generating more sensitive results. However, there is clearly a need for future investigations to better understand the advantages and limitations of RNA-Seq in toxicogenomics studies and environmental health research.
Subject(s)
Aristolochic Acids/toxicity , Carcinogens/toxicity , Gene Expression Profiling/methods , Kidney/drug effects , Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, RNA/methods , Animals , Carcinogenicity Tests/methods , Gene Expression Regulation/drug effects , Kidney/metabolism , Rats , Toxicogenetics/methods
ABSTRACT
BACKGROUND: The Affymetrix GeneChip® system is a commonly used platform for microarray analysis but the technology is inherently expensive. Unfortunately, changes in experimental planning and execution, such as the unavailability of previously anticipated samples or a shift in research focus, may render significant numbers of pre-purchased GeneChip® microarrays unprocessed before their manufacturer's expiration dates. Researchers and microarray core facilities wonder whether expired microarrays are still useful for gene expression analysis. In addition, it was not clear whether the two human reference RNA samples established by the MAQC project in 2005 still maintained their transcriptome integrity over a period of four years. Experiments were conducted to answer these questions. RESULTS: Microarray data were generated in 2009 in three replicates for each of the two MAQC samples with either expired Affymetrix U133A or unexpired U133Plus2 microarrays. These results were compared with data obtained in 2005 on the U133Plus2 microarray. The percentage of overlap between the lists of differentially expressed genes (DEGs) from U133Plus2 microarray data generated in 2009 and in 2005 was 97.44%. While there was some degree of fold change compression in the expired U133A microarrays, the percentage of overlap between the lists of DEGs from the expired and unexpired microarrays was as high as 96.99%. Moreover, the microarray data generated using the expired U133A microarrays in 2009 were highly concordant with microarray and TaqMan® data generated by the MAQC project in 2005. CONCLUSIONS: Our results demonstrated that microarray data generated using U133A microarrays, which were more than four years past the manufacturer's expiration date, were highly specific and consistent with those from unexpired microarrays in identifying DEGs despite some appreciable fold change compression and decrease in sensitivity. 
Our data also suggested that the MAQC reference RNA samples, stored at -80°C, were stable over a time frame of at least four years.
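The abstract above reports percentages of overlap between DEG lists without giving the formula. The Python sketch below shows one common way to compute such a figure, assuming the overlap is the number of shared genes divided by the size of the shorter list; that definition, the function name, and the gene identifiers are assumptions, not details taken from the study.

```python
def percent_overlap(list_a, list_b):
    """Shared genes as a percentage of the shorter list (assumed definition)."""
    a, b = set(list_a), set(list_b)
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / min(len(a), len(b))


# Placeholder gene identifiers, for illustration only.
degs_run_1 = ["g1", "g2", "g3", "g4"]
degs_run_2 = ["g2", "g3", "g4", "g5"]
print(percent_overlap(degs_run_1, degs_run_2))
```

Dividing by the shorter list keeps the measure at 100% when one list is fully contained in the other; dividing by the union instead would give a stricter, Jaccard-style figure.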
Subject(s)
Gene Expression Profiling/standards , Oligonucleotide Array Sequence Analysis/standards , RNA/standards , Databases, Genetic , Humans , Oligonucleotide Array Sequence Analysis/methods , Quality Control , RNA/chemistry , Reference Standards , Sensitivity and Specificity , Sequence Analysis, RNA/methods
ABSTRACT
BACKGROUND: Advances in microbial genomics and bioinformatics are offering greater insights into the emergence and spread of foodborne pathogens in outbreak scenarios. The Food and Drug Administration (FDA) has developed a genomics tool, ArrayTrack™, which provides extensive functionalities to manage, analyze, and interpret genomic data for mammalian species. ArrayTrack™ has been widely adopted by the research community and used for pharmacogenomics data review in the FDA's Voluntary Genomics Data Submission program. RESULTS: ArrayTrack™ has been extended to manage and analyze genomics data from bacterial pathogens of human, animal, and food origin. It was populated with bioinformatics data from public databases such as NCBI, Swiss-Prot, KEGG Pathway, and Gene Ontology to facilitate pathogen detection and characterization. ArrayTrack™'s data processing and visualization tools were enhanced with analysis capabilities designed specifically for microbial genomics including flag-based hierarchical clustering analysis (HCA), flag concordance heat maps, and mixed scatter plots. These specific functionalities were evaluated on data generated from a custom Affymetrix array (FDA-ECSG) previously developed within the FDA. The FDA-ECSG array represents 32 complete genomes of Escherichia coli and Shigella. The new functions were also used to analyze microarray data focusing on antimicrobial resistance genes from Salmonella isolates in a poultry production environment using a universal antimicrobial resistance microarray developed by the United States Department of Agriculture (USDA). CONCLUSION: The application of ArrayTrack™ to different microarray platforms demonstrates its utility in microbial genomics research, and thus will improve the capabilities of the FDA to rapidly identify foodborne bacteria and their genetic traits (e.g., antimicrobial resistance, virulence, etc.) during outbreak investigations. 
ArrayTrack™ is free to use and available to public, private, and academic researchers at http://www.fda.gov/ArrayTrack.
Subject(s)
Genome, Bacterial , Genomics/methods , Oligonucleotide Array Sequence Analysis/methods , Databases, Genetic , Databases, Protein , Foodborne Diseases/microbiology , Humans , Software , United States , United States Food and Drug Administration
ABSTRACT
BACKGROUND: Several different microarray platforms are available for measuring gene expression. There are disagreements within the microarray scientific community about the intra- and inter-platform consistency of these platforms. Both high and low consistencies were demonstrated across different platforms in terms of genes with significantly differential expression. Array studies for gene expression are used to explore biological causes and effects. Therefore, consistency should eventually be evaluated in a biological setting to reveal the functional differences between the examined samples, not just a list of differentially expressed genes (DEG). In this study, we investigated whether different platforms had a high consistency from the biologically functional perspective. RESULTS: DEG data were generated from kidney samples of rats treated with the kidney carcinogen aristolochic acid at five test sites using microarrays from the Affymetrix, Applied Biosystems, Agilent, and GE Healthcare platforms (two sites used Affymetrix for intra-platform comparison). Without filtering for probes that differ between platforms, these data were input into the Ingenuity Pathway Analysis (IPA) system for functional analysis. The functions of the DEG lists determined by IPA were compared across the four different platforms and between the two test sites for the Affymetrix platform. Analysis results showed that there is a very high level of consistency between the two test sites using the same platform and among different platforms. The top functions determined by the different platforms were very similar and reflected carcinogenicity and toxicity of aristolochic acid in the rat kidney. CONCLUSION: Our results demonstrate that highly consistent biological information can be generated from different microarray platforms.
Subject(s)
Computational Biology/methods , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Animals , Aristolochic Acids/toxicity , Databases, Genetic , Rats
ABSTRACT
Drug-induced liver injury (DILI) is the primary adverse event that results in withdrawal of drugs from the market and a frequent reason for the failure of drug candidates in development. The Liver Toxicity Biomarker Study (LTBS) is an innovative approach to investigate DILI because it compares molecular events produced in vivo by compound pairs that (a) are similar in structure and mechanism of action, (b) are associated with few or no signs of liver toxicity in preclinical studies, and (c) show marked differences in hepatotoxic potential. The LTBS is a collaborative preclinical research effort in molecular systems toxicology between the National Center for Toxicological Research and BG Medicine, Inc., and is supported by seven pharmaceutical companies and three technology providers. In phase I of the LTBS, entacapone and tolcapone were studied in rats to provide results and information that will form the foundation for the design and implementation of phase II. Molecular analysis of the rat liver and plasma samples combined with statistical analyses of the resulting datasets yielded marker analytes, illustrating the value of the broad-spectrum, molecular systems analysis approach to studying pharmacological or toxicological effects.
Subject(s)
Antiparkinson Agents/toxicity , Benzophenones/toxicity , Biomarkers/metabolism , Catechols/toxicity , Chemical and Drug Induced Liver Injury/metabolism , Liver/metabolism , Nitriles/toxicity , Nitrophenols/toxicity , Animals , Antiparkinson Agents/pharmacokinetics , Chemical and Drug Induced Liver Injury/etiology , Dose-Response Relationship, Drug , Female , Gene Expression/drug effects , Liver/drug effects , Male , Metabolomics , Oligonucleotide Array Sequence Analysis , Proteomics , Rats , Rats, Sprague-Dawley , Tolcapone
ABSTRACT
BACKGROUND: Advances in DNA microarray technology portend that molecular signatures derived from microarrays will eventually be used in clinical environments and personalized medicine. Derivation of biomarkers is a large step beyond hypothesis generation and imposes considerably more stringency for accuracy in identifying informative gene subsets to differentiate phenotypes. The inherent nature of microarray data, with few samples and replicates compared to the large number of genes, requires identifying informative genes prior to classifier construction. However, improving the ability to identify differentiating genes remains a challenge in bioinformatics. RESULTS: A new hybrid gene selection approach was investigated and tested with nine publicly available microarray datasets. The new method identifies a Very Important Pool (VIP) of genes from the broad patterns of gene expression data. The method uses a bagging sampling principle, where the re-sampled arrays are used to identify the most informative genes. Frequency of selection is used in a repetitive process to identify the VIP genes. The putative informative genes are selected using two methods, the t-statistic and discriminatory analysis. With the t-statistic, the informative genes are identified based on p-values. In the discriminatory analysis, disjoint Principal Component Analyses (PCAs) are conducted for each class of samples, and genes with high discrimination power (DP) are identified. The VIP gene selection approach was compared with the p-value ranking approach. The genes identified by the VIP method but not by the p-value ranking approach are also related to the disease investigated. More importantly, these genes are part of the pathways derived from the common genes shared by both the VIP and p-value ranking methods. Moreover, the binary classifiers built from these genes are statistically equivalent to those built from the top 50 p-value ranked genes in distinguishing different types of samples.
CONCLUSION: The VIP gene selection approach could identify additional subsets of informative genes that would not always be selected by the p-value ranking method. These genes are likely to be additional true positives since they are a part of pathways identified by the p-value ranking method and expected to be related to the relevant biology. Therefore, these additional genes derived from the VIP method potentially provide valuable biological insights.
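The bagging-and-frequency idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`welch_t`, `vip_genes`), the parameters (`n_rounds`, `top_k`, `min_freq`), and the use of the absolute Welch t statistic as the ranking score (instead of p-values) are all hypothetical simplifications, and the disjoint-PCA discriminatory analysis is omitted entirely.

```python
import math
import random
from collections import Counter
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (each needs at least 2 values)."""
    diff = mean(a) - mean(b)
    se2 = variance(a) / len(a) + variance(b) / len(b)
    if se2 == 0:
        # Degenerate resample: no spread in either class.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(se2)


def vip_genes(expr, labels, n_rounds=200, top_k=2, min_freq=0.8, seed=0):
    """Return genes that rank in the top_k in at least min_freq of the rounds.

    expr:   dict mapping gene id -> list of expression values, one per array
    labels: list of 0/1 class labels, one per array (same order as expr values)
    """
    rng = random.Random(seed)
    idx0 = [i for i, y in enumerate(labels) if y == 0]
    idx1 = [i for i, y in enumerate(labels) if y == 1]
    counts = Counter()
    for _ in range(n_rounds):
        # Bagging step: resample arrays with replacement within each class.
        s0 = [rng.choice(idx0) for _ in idx0]
        s1 = [rng.choice(idx1) for _ in idx1]
        scores = {
            gene: abs(welch_t([vals[i] for i in s0], [vals[i] for i in s1]))
            for gene, vals in expr.items()
        }
        # Keep the top_k genes of this round and record that they were selected.
        counts.update(sorted(scores, key=scores.get, reverse=True)[:top_k])
    # Frequency of selection decides membership in the final pool.
    return sorted(g for g, c in counts.items() if c / n_rounds >= min_freq)
```

Calling `vip_genes(expr, labels, top_k=50)` on a real dataset would return the genes that are repeatedly selected across resamples; the thresholds shown here are placeholders rather than values taken from the study.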
Subject(s)
Algorithms , Gene Expression Profiling/methods , Gene Pool , Genes/genetics , Genetic Markers/genetics , Oligonucleotide Array Sequence Analysis/methods , Proteome/genetics
ABSTRACT
BACKGROUND: Genome-wide association studies (GWAS) aim to identify genetic variants (usually single nucleotide polymorphisms [SNPs]) across the entire human genome that are associated with phenotypic traits such as disease status and drug response. Highly accurate and reproducible genotype calling is paramount since errors introduced by calling algorithms can lead to inflation of false associations between genotype and phenotype. Most genotype calling algorithms currently used for GWAS are based on multiple arrays. Because hundreds of gigabytes (GB) of raw data are generated from a GWAS, the samples are typically partitioned into batches containing subsets of the entire dataset for genotype calling. High call rates and accuracies have been achieved. However, the effects of batch size (i.e., number of chips analyzed together) and of batch composition (i.e., the choice of chips in a batch) on call rate and accuracy, as well as the propagation of these effects into the significantly associated SNPs identified, have not been investigated. In this paper, we analyzed the effects of both batch size and batch composition on the genotype calling algorithm BRLMM using raw data of 270 HapMap samples analyzed with the Affymetrix Human Mapping 500K array set. RESULTS: Using data from 270 HapMap samples interrogated with the Affymetrix Human Mapping 500K array set, three different batch sizes and three different batch compositions were used for genotyping using the BRLMM algorithm. Comparative analysis of the calling results and the corresponding lists of significant SNPs identified through association analysis revealed that both batch size and composition affected genotype calling results and significantly associated SNPs. Batch size and batch composition effects were more severe on samples and SNPs with lower call rates than on ones with higher call rates, and on heterozygous genotype calls compared to homozygous genotype calls.
CONCLUSION: Batch size and composition affect the genotype calling results in GWAS using BRLMM. The larger the differences in batch sizes, the larger the effect. The more homogeneous the samples in the batches, the more consistent the genotype calls. The inconsistency propagates to the lists of significantly associated SNPs identified in downstream association analysis. Thus, uniform and large batch sizes should be used to make genotype calls for GWAS. In addition, samples of high homogeneity should be placed into the same batch.
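Comparing genotype calls made for the same sample under two batch configurations comes down to two simple quantities, call rate and concordance, which can be sketched as below. This is only an illustration: it does not implement BRLMM, and the genotype encoding ("AA", "AB", "BB", with `None` for a no-call) and the function names are assumptions made for the example.

```python
def call_rate(calls):
    """Fraction of SNPs that received a genotype call (None marks a no-call)."""
    return sum(c is not None for c in calls) / len(calls)


def concordance(calls_a, calls_b):
    """Fraction of SNPs called in both runs whose genotypes agree.

    calls_a and calls_b hold calls for the same SNPs in the same order,
    e.g. one sample genotyped under two different batch configurations.
    Returns NaN if no SNP was called in both runs.
    """
    both = [(a, b) for a, b in zip(calls_a, calls_b)
            if a is not None and b is not None]
    if not both:
        return float("nan")
    return sum(a == b for a, b in both) / len(both)


def discordant_by_type(calls_a, calls_b):
    """Count disagreements split by whether run A called a heterozygote."""
    counts = {"het": 0, "hom": 0}
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None or a == b:
            continue
        counts["het" if a[0] != a[1] else "hom"] += 1
    return counts
```

Splitting disagreements by heterozygous versus homozygous calls mirrors the comparison reported in the results, where heterozygous calls were the more affected group.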
Subject(s)
Algorithms , Chromosome Mapping/methods , Genome, Human/genetics , Haplotypes , Oligonucleotide Array Sequence Analysis/methods , Polymorphism, Single Nucleotide/genetics , Software , Base Sequence , DNA Mutational Analysis/methods , Genotype , Humans , Molecular Sequence Data
ABSTRACT
BACKGROUND: Reproducibility is a fundamental requirement in scientific experiments. Some recent publications have claimed that microarrays are unreliable because lists of differentially expressed genes (DEGs) are not reproducible in similar experiments. Meanwhile, new statistical methods for identifying DEGs continue to appear in the scientific literature. The resultant variety of existing and emerging methods exacerbates confusion and continuing debate in the microarray community on the appropriate choice of methods for identifying reliable DEG lists. RESULTS: Using the data sets generated by the MicroArray Quality Control (MAQC) project, we investigated the impact on the reproducibility of DEG lists of a few widely used gene selection procedures. We present comprehensive results from inter-site comparisons using the same microarray platform, cross-platform comparisons using multiple microarray platforms, and comparisons between microarray results and those from TaqMan, the widely regarded "standard" gene expression platform. Our results demonstrate that (1) previously reported discordance between DEG lists could simply result from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion with a non-stringent P-value cutoff filtering, the DEG lists become much more reproducible, especially when fewer genes are selected as differentially expressed, as is the case in most microarray studies; and (3) the instability of short DEG lists solely based on P-value ranking is an expected mathematical consequence of the high variability of the t-values; the more stringent the P-value threshold, the less reproducible the DEG list is. These observations are also consistent with results from extensive simulation calculations.
CONCLUSION: We recommend the use of FC-ranking plus a non-stringent P cutoff as a straightforward and baseline practice in order to generate more reproducible DEG lists. Specifically, the P-value cutoff should not be stringent (too small) and FC should be as large as possible. Our results provide practical guidance to choose the appropriate FC and P-value cutoffs when selecting a given number of DEGs. The FC criterion enhances reproducibility, whereas the P criterion balances sensitivity and specificity.
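The recommended selection rule (filter by a non-stringent P cutoff, then rank by fold change) can be sketched in a few lines. The function names, the tuple layout, the default cutoff of 0.05, and the overlap measure below are illustrative assumptions, not the MAQC analysis code; P-values are taken as precomputed inputs.

```python
def select_degs_fc_ranked(genes, n_top, p_cutoff=0.05):
    """FC ranking with a non-stringent P filter.

    genes: iterable of (gene_id, log2_fold_change, p_value) tuples
    Returns the n_top gene ids with the largest |log2 FC| among genes
    whose P-value is below p_cutoff.
    """
    passed = [g for g in genes if g[2] < p_cutoff]
    ranked = sorted(passed, key=lambda g: abs(g[1]), reverse=True)
    return [g[0] for g in ranked[:n_top]]


def select_degs_p_ranked(genes, n_top):
    """Baseline for comparison: rank solely by P-value, smallest first."""
    ranked = sorted(genes, key=lambda g: g[2])
    return [g[0] for g in ranked[:n_top]]


def percent_overlap(list_a, list_b):
    """Percentage of genes shared by two DEG lists, relative to the shorter list."""
    shorter = min(len(list_a), len(list_b))
    if shorter == 0:
        return 0.0
    return 100.0 * len(set(list_a) & set(list_b)) / shorter
```

Applying `select_degs_fc_ranked` to results from two test sites and passing both lists to `percent_overlap` gives one simple way to compare the reproducibility of the two ranking rules on a given dataset.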
Subject(s)
Algorithms , Data Interpretation, Statistical , Gene Expression Profiling/methods , Genes/genetics , Oligonucleotide Array Sequence Analysis/methods , Computer Simulation , Models, Genetic , Models, Statistical , Reproducibility of Results , Sensitivity and Specificity
ABSTRACT
Measurement of circulating insulin-like growth factors (IGFs), in particular IGF-binding protein (IGFBP)-2, at the time of diagnosis, is independently prognostic in many cancers, but its clinical performance against other routinely determined prognosticators has not been examined. We measured IGF-I, IGF-II, pro-IGF-II, IGF bioactivity, IGFBP-2, -3, and pregnancy-associated plasma protein A (PAPP-A), an IGFBP regulator, in baseline samples of 301 women with breast cancer treated on four protocols (Odense, Denmark: 1993-1998). We evaluated performance characteristics (expressed as area under the curve, AUC) using Cox regression models to derive hazard ratios (HRs) with 95% confidence intervals (CIs) for 10-year recurrence-free survival (RFS) and overall survival (OS), and compared those against the clinically used Nottingham Prognostic Index (NPI). We measured the same biomarkers in 531 noncancer individuals to assess multidimensional relationships (MDRs), and evaluated additional prognostic models using survival artificial neural networks (SANN) and survival support vector machines (SSVM), as these enhance capture of MDRs. For RFS, increasing concentrations of circulating IGFBP-2 and PAPP-A were independently prognostic [HR per biomarker doubling: 1.474 (95% CI: 1.160, 1.875; P = 0.002) and 1.952 (95% CI: 1.364, 2.792; P < 0.001), respectively]. The AUC for RFS with NPI was 0.626 (Cox model), improving to 0.694 (P = 0.012) with the addition of IGFBP-2 plus PAPP-A. The AUCs for RFS derived using SANN and SSVM were not superior. Similar patterns were observed for OS. These findings illustrate an important principle in biomarker qualification: measured circulating biomarkers may demonstrate independent prognostication, but this does not necessarily translate into substantial improvement in clinical performance.