MultiBaC: A strategy to remove batch effects between different omic data types.
Ugidos, Manuel; Tarazona, Sonia; Prats-Montalbán, José M; Ferrer, Alberto; Conesa, Ana.
Affiliation
  • Ugidos M; Gene expression and RNA Metabolism Laboratory, Instituto de Biomedicina de Valencia, Consejo Superior de Investigaciones Científicas (CSIC), Valencia, Spain.
  • Tarazona S; Multivariate Statistical Engineering Group, Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain.
  • Prats-Montalbán JM; Multivariate Statistical Engineering Group, Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain.
  • Ferrer A; Multivariate Statistical Engineering Group, Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain.
  • Conesa A; Microbiology and Cell Science Department, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, USA.
Stat Methods Med Res; 29(10): 2851-2864, 2020 Oct.
Article in En | MEDLINE | ID: mdl-32131696
ABSTRACT
The diversity of omic technologies has expanded in recent years, together with the number of omic data integration strategies. However, multiomic data generation is costly, and many research groups cannot afford projects in which many different omic data types are generated, at least not at the same time. As most researchers share their data in public repositories, different omic datasets of the same biological system obtained at different labs can be combined to construct a multiomic study. However, data obtained at different labs or at different times are typically subject to batch effects that need to be removed for successful data integration. While there are methods to correct batch effects on the same data types obtained in different studies, they cannot be applied to correct lab or batch effects across omics. This impairs multiomic meta-analysis. Fortunately, in many cases, at least one omics platform (e.g. gene expression) is repeatedly measured across labs, together with the additional omic modalities that are specific to each study. This creates an opportunity for batch analysis. We have developed MultiBaC (Multiomics Batch-effect Correction), a strategy to correct batch effects from multiomic datasets distributed across different labs or data acquisition events. Our strategy is based on the existence of at least one shared data type, which allows data prediction across omics. We validate this approach both on simulated data and on a case where the multiomic design is fully shared by two labs, so that batch effect correction within the same omic modality using traditional methods can be compared with the MultiBaC correction across data types. Finally, we apply MultiBaC to a true multiomic data integration problem to show that we are able to improve the detection of meaningful biological effects.
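The idea of "data prediction across omics" through a shared data type can be illustrated with a toy sketch. This is not the MultiBaC implementation: the abstract does not specify the prediction model or the correction method, so a one-variable least-squares fit and per-lab mean-centering are used here purely as stand-ins. All names (`fit_line`, `predict`, `center_by_batch`) and all numbers are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares slope and intercept for y ~ x (stand-in prediction model)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to new values of the shared omic."""
    slope, intercept = model
    return [slope * xi + intercept for xi in x]


def center_by_batch(batches):
    """Subtract each batch's deviation from the overall mean (stand-in correction)."""
    overall = mean(v for values in batches.values() for v in values)
    return {
        name: [v - (mean(values) - overall) for v in values]
        for name, values in batches.items()
    }


# Made-up toy data: the shared omic (e.g. gene expression) is measured in both
# labs; the second omic is measured only in lab A.
shared_a = [1.0, 2.0, 3.0, 4.0]
other_a = [2.1, 3.9, 6.2, 7.8]
shared_b = [1.5, 2.5, 3.5, 4.5]  # lab B values carry an assumed offset of 0.5

# Step 1: learn the relationship between the two omics where both exist (lab A).
model = fit_line(shared_a, other_a)

# Step 2: predict the omic that lab B did not measure from its shared omic.
other_b_pred = predict(model, shared_b)

# Step 3: the second omic now exists for both labs (measured or predicted),
# so a within-omic correction can be applied across labs.
corrected = center_by_batch({"lab_a": other_a, "lab_b": other_b_pred})
```

After step 3 the two labs have the same mean for the second omic. Mean-centering assumes both labs sample comparable conditions; a real analysis would need a correction method that separates batch effects from biological effects of interest.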
Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: Stat Methods Med Res Year: 2020 Document type: Article Affiliation country: Spain
