FreeHiC: Data-driven simulations for Hi-C experiments

Abstract

In genomics, statisticians and computational groups rely on data-driven simulations to benchmark analysis approaches and develop best practices. In the Hi-C context, this has created an urgent need for a computationally efficient tool to simulate Hi-C contact matrices. FreeHiC is designed to be completely data-driven: it empirically learns the key parameters of the Hi-C data, such as fragment interaction patterns, and simulates data by closely following the Hi-C experimental protocol. Specifically, it automatically learns the mutation, strand orientation, and base quality score settings from the input data. As a result, it generates a Hi-C interaction matrix at any given sequencing depth, with a user-controlled level of noise incorporated through a few key parameters. Because the simulation scheme is non-parametric and makes few assumptions, FreeHiC is extendable to all types of 3D interaction datasets. Key applications include generating biological or technical replicate data for benchmarking Hi-C analysis methods in terms of stability, reproducibility, false positive rate, and power in differential Hi-C analysis. FreeHiC is wrapped into a Python package that requires only a single command, making it user-friendly for researchers with a wide range of backgrounds.
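
To make the idea of "any given sequencing depth with a user-controlled level of noise" concrete, here is a minimal toy sketch. It is not FreeHiC's implementation or its command-line interface: according to the abstract, FreeHiC learns fragment interaction patterns and read-level settings, whereas this sketch simply resamples an already binned, symmetric contact matrix. The function name `simulate_contacts` and the parameters `depth` and `noise` are hypothetical, and the noise model (mixing the empirical contact frequencies with a uniform distribution) is an assumption chosen for illustration.

```python
import random


def simulate_contacts(observed, depth, noise=0.0, seed=None):
    """Toy resampling of a symmetric contact-count matrix (hypothetical helper).

    observed: square list of lists with non-negative counts (upper triangle is used)
    depth:    number of contacts to draw, i.e. the simulated sequencing depth
    noise:    value in [0, 1]; fraction of probability mass spread uniformly
              over all bin pairs (an illustrative noise model, not FreeHiC's)
    """
    if not 0.0 <= noise <= 1.0:
        raise ValueError("noise must be between 0 and 1")

    rng = random.Random(seed)
    n = len(observed)

    # Work on the upper triangle so each bin pair is counted once.
    pairs = [(i, j) for i in range(n) for j in range(i, n)]
    total = sum(observed[i][j] for i, j in pairs)
    if total <= 0:
        raise ValueError("observed matrix has no contacts")

    # Mix empirical contact frequencies with a uniform background.
    uniform = 1.0 / len(pairs)
    weights = [
        (1.0 - noise) * observed[i][j] / total + noise * uniform
        for i, j in pairs
    ]

    # Draw `depth` contacts with replacement and tally them symmetrically.
    simulated = [[0] * n for _ in range(n)]
    for i, j in rng.choices(pairs, weights=weights, k=depth):
        simulated[i][j] += 1
        if i != j:
            simulated[j][i] += 1
    return simulated
```

Calling `simulate_contacts(observed, depth=1000, noise=0.1, seed=1)` twice with different seeds would give two replicate-like matrices at the same depth, which is the kind of input a benchmarking study of reproducibility would compare.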

Date
Ye Zheng
Ph.D. Candidate in Statistics