Search | Virtual Health Library

The C-SHIFT Algorithm for Normalizing Covariances.

Chunikhina, Evgenia; Logan, Paul; Kovchegov, Yevgeniy; Yambartsev, Anatoly; Mondal, Debashis; Morgun, Andrey.

IEEE/ACM Trans Comput Biol Bioinform ; 20(1): 720-730, 2023.

Article in English | MEDLINE | ID: mdl-35167480

ABSTRACT

Omics technologies are powerful tools for analyzing patterns in gene expression data for thousands of genes. Due to a number of systematic variations in experiments, the raw gene expression data is often obfuscated by undesirable technical noise. Various normalization techniques were designed in an attempt to remove these non-biological errors prior to any statistical analysis. One of the reasons for normalizing data is the need for recovering the covariance matrix used in gene network analysis. In this paper, we introduce a novel normalization technique, called the covariance shift (C-SHIFT) method. This normalization algorithm uses optimization techniques together with the blessing-of-dimensionality philosophy and an energy minimization hypothesis for covariance matrix recovery under additive noise (in biology, known as the bias). Thus, it is perfectly suited for the analysis of logarithmic gene expression data. Numerical experiments on synthetic data demonstrate the method's advantage over the classical normalization techniques. Namely, the comparison is made with Rank, Quantile, cyclic LOESS (locally estimated scatterplot smoothing), and MAD (median absolute deviation) normalization methods. We also evaluate the performance of the C-SHIFT algorithm on real biological data.

Subject(s)

Algorithms , Gene Expression Profiling , Gene Expression Profiling/methods

Construct and Compare Gene Coexpression Networks with DAPfinder and DAPview.

Skinner, Jeff; Kotliarov, Yuri; Varma, Sudhir; Mine, Karina L; Yambartsev, Anatoly; Simon, Richard; Huyen, Yentram; Morgun, Andrey.

BMC Bioinformatics ; 12: 286, 2011 Jul 14.

Article in English | MEDLINE | ID: mdl-21756334

ABSTRACT

BACKGROUND: DAPfinder and DAPview are novel BRB-ArrayTools plug-ins to construct gene coexpression networks and identify significant differences in pairwise gene-gene coexpression between two phenotypes. RESULTS: Each significant difference in gene-gene association represents a Differentially Associated Pair (DAP). Our tools include several choices of filtering methods, gene-gene association metrics, statistical testing methods and multiple comparison adjustments. Network results are easily displayed in Cytoscape. Analyses of glioma experiments and microarray simulations demonstrate the utility of these tools. CONCLUSIONS: DAPfinder is a new user-friendly tool for reconstruction and comparison of biological networks.

Subject(s)

Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Glioma/genetics , Software , Humans , Oligonucleotide Array Sequence Analysis

New approach reveals CD28 and IFNG gene interaction in the susceptibility to cervical cancer.

Guzman, Valeska B; Yambartsev, Anatoly; Goncalves-Primo, Amador; Silva, Ismael D C G; Carvalho, Carmen R N; Ribalta, Julisa C L; Goulart, Luiz Ricardo; Shulzhenko, Natalia; Gerbase-Delima, Maria; Morgun, Andrey.

Hum Mol Genet ; 17(12): 1838-44, 2008 Jun 15.

Article in English | MEDLINE | ID: mdl-18337305

ABSTRACT

Cervical cancer is a complex disease with multiple environmental and genetic determinants. In this study, we sought an association between polymorphisms in immune response genes and cervical cancer using both single-locus and multi-locus analysis approaches. A total of 14 single nucleotide polymorphisms (SNPs) distributed in CD28, CTLA4, ICOS, PDCD1, FAS, TNFA, IL6, IFNG, TGFB1 and IL10 genes were determined in patients and healthy individuals from three independent case/control sets. The first two sets comprised White individuals (one group with 82 cases and 85 controls, the other with 83 cases and 85 controls) and the third comprised non-White individuals (64 cases and 75 controls). The multi-locus analysis revealed higher frequencies in cancer patients of three three-genotype combinations [CD28+17(TT)/IFNG+874(AA)/TNFA-308(GG), CD28+17(TT)/IFNG+874(AA)/PDCD1+7785(CT), and CD28+17(TT)/IFNG+874(AA)/ICOS+1564(TT)] (P < 0.01, Monte Carlo simulation). We hypothesized that the two-genotype combination shared by all three [CD28(TT) and IFNG(AA)] could have a major contribution to the observed association. To address this question, we analyzed the frequency of the CD28(TT), IFNG(AA) genotype combination in the three groups combined, and observed its increase in patients (P = 0.0011 by Fisher's exact test). The contribution of a third polymorphism did not reach statistical significance (P = 0.1). Further analysis suggested that gene-gene interaction between CD28 and IFNG might contribute to susceptibility to cervical cancer. Our results showed an epistatic effect between CD28 and IFNG genes in susceptibility to cervical cancer, a finding that might be relevant for a better understanding of the disease pathogenesis. In addition, the novel analytical approach herein proposed might be useful for increasing the statistical power of future genome-wide multi-locus studies.

Subject(s)

CD28 Antigens/genetics , Carcinoma, Squamous Cell/genetics , Epistasis, Genetic , Genetic Predisposition to Disease , Interferon-gamma/genetics , Uterine Cervical Neoplasms/genetics , Brazil , Case-Control Studies , Female , Humans

Noninvasive prenatal paternity determination using microhaplotypes: a pilot study.

Wang, Jaqueline Yu Ting; Whittle, Martin R; Puga, Renato David; Yambartsev, Anatoly; Fujita, André; Nakaya, Helder I.

BMC Med Genomics ; 13(1): 157, 2020 Oct 23.

Article in English | MEDLINE | ID: mdl-33097049

ABSTRACT

BACKGROUND: The use of noninvasive techniques to determine paternity prenatally is increasing because it reduces the risks associated with invasive procedures. Current methods, based on SNPs, use the analysis of at least 148 markers, on average. METHODS: To reduce the number of regions, we used microhaplotypes, which are chromosomal segments smaller than 200 bp containing two or more SNPs. Our method employs massively parallel sequencing and analysis of microhaplotypes as genetic markers. We tested 20 microhaplotypes and ascertained that 19 obey Hardy-Weinberg equilibrium and are independent, and data from the 1000 Genomes Project were used for population frequency and simulations. RESULTS: We performed simulations of true and false paternity, using the 1000 Genomes Project data, to confirm whether the microhaplotypes could be used as genetic markers. We observed that at least 13 microhaplotypes should be used to decrease the chances of false positives. Then, we applied the method in 31 trios, and it was able to correctly assign paternity in cases where the alleged father was the real father, excluding the inconclusive results. We also cross-evaluated the mother-plasma duos with the alleged fathers for false inclusions within our data, and we observed that the use of at least 15 microhaplotypes in real data also decreases the false inclusions. CONCLUSIONS: In this work, we demonstrated that microhaplotypes can be used to determine prenatal paternity by using only 15 regions and with admixtures of DNA.

Subject(s)

DNA/analysis , Genetic Markers , Haplotypes , Noninvasive Prenatal Testing/methods , Paternity , Polymorphism, Single Nucleotide , DNA/genetics , Female , Genetic Testing , High-Throughput Nucleotide Sequencing , Humans , Male , Pilot Projects , Pregnancy

Differentially correlated genes in co-expression networks control phenotype transitions.

Thomas, Lina D; Vyshenska, Dariia; Shulzhenko, Natalia; Yambartsev, Anatoly; Morgun, Andrey.

F1000Res ; 5: 2740, 2016.

Article in English | MEDLINE | ID: mdl-28163897

ABSTRACT

BACKGROUND: Co-expression networks are a tool widely used for analysis of "Big Data" in biology that can range from transcriptomes to proteomes, metabolomes and more recently even microbiomes. Several methods were proposed to answer biological questions interrogating these networks. Differential co-expression analysis is a recent approach that measures how gene interactions change when a biological system transitions from one state to another. Although the importance of differentially co-expressed genes to identify dysregulated pathways has been noted, their role in gene regulation is not well studied. Herein we investigated differentially co-expressed genes in a relatively simple mono-causal process (B lymphocyte deficiency) and in a complex multi-causal system (cervical cancer). METHODS: Co-expression networks of B cell deficiency (Control and BcKO) were reconstructed using Pearson correlation coefficient for two Mus musculus datasets: B10.A strain (12 normal, 12 BcKO) and BALB/c strain (10 normal, 10 BcKO). Co-expression networks of cervical cancer (normal and cancer) were reconstructed using local partial correlation method for five datasets (total of 64 normal, 148 cancer). Differentially correlated pairs were identified along with the location of their genes in BcKO and in cancer networks. Minimum Shortest Path and Bi-partite Betweenness Centrality were statistically evaluated for differentially co-expressed genes in corresponding networks. RESULTS: We show that in B cell deficiency the differentially co-expressed genes are highly enriched with immunoglobulin genes (causal genes). In cancer we found that differentially co-expressed genes act as "bottlenecks" rather than causal drivers, with most flows that come from the key driver genes to the peripheral genes passing through differentially co-expressed genes. Using in vitro knockdown experiments for two out of 14 differentially co-expressed genes found in cervical cancer (FGFR2 and CACYBP), we showed that they play regulatory roles in cancer cell growth. CONCLUSION: Identifying differentially co-expressed genes in co-expression networks is an important tool in detecting regulatory genes involved in alterations of phenotype.
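
As an illustration of the idea behind differentially correlated pairs, the following is a minimal Python sketch, not the authors' pipeline. The abstract states that Pearson correlation was used for the B cell deficiency networks but does not say how differentially correlated pairs were called, so the rule used here (flag a pair when the correlation changes by at least `min_delta` between the two states) and all names (`pearson`, `differentially_correlated`, `min_delta`, the toy gene labels) are assumptions made for the example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def differentially_correlated(expr_a, expr_b, min_delta=1.0):
    """Return gene pairs whose correlation differs by >= min_delta between two states.

    expr_a, expr_b: dicts mapping gene name -> list of expression values,
    one dict per state (e.g. control and knockout). The threshold rule is
    an assumption of this sketch, not a method taken from the abstract.
    """
    genes = sorted(set(expr_a) & set(expr_b))
    pairs = {}
    for g, h in combinations(genes, 2):
        r_a = pearson(expr_a[g], expr_a[h])
        r_b = pearson(expr_b[g], expr_b[h])
        if abs(r_a - r_b) >= min_delta:
            pairs[(g, h)] = (r_a, r_b)
    return pairs


# Toy data with hypothetical gene labels: B tracks A in one state and
# moves against it in the other, so the pairs involving B are flagged.
control = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 3, 2, 4]}
knockout = {"A": [1, 2, 3, 4], "B": [8, 6, 4, 2], "C": [1, 3, 2, 4]}
flagged = differentially_correlated(control, knockout)
```

In practice a statistical test with multiple-comparison adjustment would replace the fixed threshold; the sketch only shows the comparison of per-state correlations.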

Unexpected links reflect the noise in networks.

Yambartsev, Anatoly; Perlin, Michael A; Kovchegov, Yevgeniy; Shulzhenko, Natalia; Mine, Karina L; Dong, Xiaoxi; Morgun, Andrey.

Biol Direct ; 11(1): 52, 2016 Oct 13.

Article in English | MEDLINE | ID: mdl-27737689

ABSTRACT

BACKGROUND: Gene covariation networks are commonly used to study biological processes. The inference of gene covariation networks from observational data can be challenging, especially considering the large number of players involved and the small number of biological replicates available for analysis. RESULTS: We propose a new statistical method for estimating the number of erroneous edges in reconstructed networks that strongly enhances commonly used inference approaches. This method is based on a special relationship between sign of correlation (positive/negative) and directionality (up/down) of gene regulation, and allows for the identification and removal of approximately half of all erroneous edges. Using the mathematical model of Bayesian networks and positive correlation inequalities we establish a mathematical foundation for our method. Analyzing existing biological datasets, we find a strong correlation between the results of our method and false discovery rate (FDR). Furthermore, simulation analysis demonstrates that our method provides a more accurate estimate of network error than FDR. CONCLUSIONS: Thus, our study provides a new robust approach for improving reconstruction of covariation networks. REVIEWERS: This article was reviewed by Eugene Koonin, Sergei Maslov, Daniel Yasumasa Takahashi.

Subject(s)

Computational Biology/methods , Gene Expression Regulation , Gene Regulatory Networks , Bayes Theorem

Reverse enGENEering of Regulatory Networks from Big Data: A Roadmap for Biologists.

Dong, Xiaoxi; Yambartsev, Anatoly; Ramsey, Stephen A; Thomas, Lina D; Shulzhenko, Natalia; Morgun, Andrey.

Bioinform Biol Insights ; 9: 61-74, 2015.

Article in English | MEDLINE | ID: mdl-25983554

ABSTRACT

Omics technologies enable unbiased investigation of biological systems through massively parallel sequence acquisition or molecular measurements, bringing the life sciences into the era of Big Data. A central challenge posed by such omics datasets is how to transform these data into biological knowledge, for example, how to use these data to answer questions such as: Which functional pathways are involved in cell differentiation? Which genes should we target to stop cancer? Network analysis is a powerful and general approach to solving this problem, consisting of two fundamental stages: network reconstruction and network interrogation. Here we provide an overview of network analysis including a step-by-step guide on how to perform and use this approach to investigate a biological question. In this guide, we also include the software packages that we and others employ for each of the steps of a network analysis workflow.

Gene network reconstruction reveals cell cycle and antiviral genes as major drivers of cervical cancer.

Mine, Karina L; Shulzhenko, Natalia; Yambartsev, Anatoly; Rochman, Mark; Sanson, Gerdine F O; Lando, Malin; Varma, Sudhir; Skinner, Jeff; Volfovsky, Natalia; Deng, Tao; Brenna, Sylvia M F; Carvalho, Carmen R N; Ribalta, Julisa C L; Bustin, Michael; Matzinger, Polly; Silva, Ismael D C G; Lyng, Heidi; Gerbase-DeLima, Maria; Morgun, Andrey.

Nat Commun ; 4: 1806, 2013.

Article in English | MEDLINE | ID: mdl-23651994

ABSTRACT

Although human papillomavirus was identified as an aetiological factor in cervical cancer, the key human gene drivers of this disease remain unknown. Here we apply an unbiased approach integrating gene expression and chromosomal aberration data. In an independent group of patients, we reconstruct and validate a gene regulatory meta-network, and identify cell cycle and antiviral genes that constitute two major subnetworks upregulated in tumour samples. These genes are located within the same regions as chromosomal amplifications, most frequently on 3q. We propose a model in which selected chromosomal gains drive activation of antiviral genes contributing to episomal virus elimination, which synergizes with cell cycle dysregulation. These findings may help to explain the paradox of episomal human papillomavirus decline in women with invasive cancer who were previously unable to clear the virus.

Subject(s)

Antiviral Agents/metabolism , Cell Cycle/genetics , Gene Regulatory Networks/genetics , Genes, Neoplasm/genetics , Papillomaviridae/genetics , Uterine Cervical Neoplasms/genetics , Uterine Cervical Neoplasms/virology , Chromosome Aberrations , Chromosomes, Human/genetics , Databases, Genetic , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Genome, Human/genetics , Genomic Instability , Humans , Lysosomal Membrane Proteins/metabolism , Meta-Analysis as Topic , Neoplasm Proteins/metabolism , Papillomavirus Infections/genetics , Papillomavirus Infections/virology , Reproducibility of Results , Uterine Cervical Neoplasms/pathology , Virus Integration/genetics

Selection of control genes for quantitative RT-PCR based on microarray data.

Shulzhenko, Natalia; Yambartsev, Anatoly; Goncalves-Primo, Amador; Gerbase-DeLima, Maria; Morgun, Andrey.

Biochem Biophys Res Commun ; 337(1): 306-12, 2005 Nov 11.

Article in English | MEDLINE | ID: mdl-16182241

ABSTRACT

Use of internal reference gene(s) is necessary for adequate quantification of target gene expression by RT-PCR. Herein, we elaborated a strategy of control gene selection based on microarray data and illustrated it by analyzing endomyocardial biopsies with acute cardiac rejection and infection. Using order statistics and binomial distribution we evaluated the probability of finding low-varying genes by chance. For analysis, the microarray data were divided into two sample subsets. Among the 10% of genes with the lowest standard deviations, we found 14 genes common to both subsets. After normalization using two selected genes, high correlation was observed between expression of target genes evaluated by microarray and RT-PCR, and in an independent dataset by RT-PCR (r = 0.9, p < 0.001). In conclusion, we showed a simple and reliable strategy of selection and validation of control genes for RT-PCR from microarray data that can be easily applied for different experimental designs and tissues.
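
The selection step described in this abstract (split the samples into two subsets, keep the 10% of genes with the lowest standard deviation in each, and take the genes common to both) can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function names, the data layout (a dict from gene name to a list of per-sample values), and in particular the chance model in `chance_of_overlap` (each gene independently lands in the lowest fraction of both subsets with probability `fraction ** 2`) are assumptions, since the abstract only says that order statistics and the binomial distribution were used.

```python
import math
import statistics


def lowest_sd_genes(expr, samples, fraction=0.10):
    """Genes whose standard deviation over `samples` is in the lowest `fraction`."""
    sds = {gene: statistics.stdev([values[i] for i in samples])
           for gene, values in expr.items()}
    keep = max(1, int(len(sds) * fraction))
    return set(sorted(sds, key=sds.get)[:keep])


def stable_in_both(expr, subset_a, subset_b, fraction=0.10):
    """Candidate control genes: low-varying in both sample subsets."""
    return (lowest_sd_genes(expr, subset_a, fraction)
            & lowest_sd_genes(expr, subset_b, fraction))


def chance_of_overlap(k, n_genes, fraction=0.10):
    """P(X >= k) for X ~ Binomial(n_genes, fraction**2).

    Assumed null model: the two subsets rank genes independently, so a gene
    is in both low-SD lists with probability fraction**2. Computed as
    1 - P(X < k) to keep the binomial coefficients small.
    """
    p = fraction ** 2
    below = sum(math.comb(n_genes, i) * p ** i * (1 - p) ** (n_genes - i)
                for i in range(k))
    return 1.0 - below


# Toy example with hypothetical genes: g0 is flat, the others vary.
toy = {"g0": [5.0] * 6}
toy.update({f"g{i}": [float(i * j) for j in range(6)] for i in range(1, 10)})
controls = stable_in_both(toy, subset_a=[0, 1, 2], subset_b=[3, 4, 5])
```

Requiring a gene to be low-varying in both subsets guards against picking a gene that looks stable in one group of samples only by chance; `chance_of_overlap` gives a rough sense of how many such overlaps the assumed null model would produce.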

Subject(s)

Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Reverse Transcriptase Polymerase Chain Reaction/standards , Algorithms , Chagas Disease/genetics , Chagas Disease/metabolism , Data Interpretation, Statistical , Graft Rejection/genetics , Graft Rejection/metabolism , Heart Transplantation , Humans , Myocardium/metabolism , Reference Standards
