Professional Experience

September 2014 – Present
Madison, WI

Research Assistant

Department of Statistics, Department of Biostatistics & Medical Informatics, University of Wisconsin - Madison

Dissertation Research:

  • Developed a biologically motivated hierarchical generative model to investigate 3D chromatin architecture using Hi-C data, and investigated genomic features of repetitive regions of the genome.
  • Developed a computational tool for fast simulation of 3D proximity ligation sequencing data.
  • Constructed a hierarchical testing procedure to detect differential 3D genome interactions with precise false discovery rate (FDR) control.
  • Investigated protein-DNA interactions in repetitive regions and integrated multi-mapping reads into the Encyclopedia of DNA Elements (ENCODE) ChIP-seq data processing pipeline.

Collaborative Work with the Bresnick Lab:

  • Leveraged multi-omics analysis, particularly of ATAC-seq and RNA-seq data, to reveal the GATA/Heme regulatory mechanism controlling hemoglobin synthesis and erythrocyte development.
  • Investigated, through comprehensive differential transcriptomic analysis, how a single-nucleotide mutation in the Ets motif of a GATA2 enhancer affects the enhancer's function in controlling hematopoiesis.

February 2014 – June 2014
Beijing, China

Data Scientist Intern

Business Analysis, IBM


  • Developed a dynamic text-mining model using an IBM communication database to infer topic networks.
  • Constructed a modified Latent Dirichlet Allocation model to optimize the CPU usage of IBM servers.

June 2013 – September 2013
Ottawa, Canada

Research Assistant

Department of Biology, Mathematics and Statistics, University of Ottawa

Investigated Approximate Bayesian Computation (ABC), ABC Markov chain Monte Carlo (ABC-MCMC), and ABC Sequential Monte Carlo (ABC-SMC) samplers for estimating viral transmission networks in human populations.

September 2012 – January 2013
Hong Kong, China

Exchange Study

Department of Statistics and Actuarial Science, The University of Hong Kong

February 2012 – January 2014
Beijing, China

Project Assistant

School of Information, Renmin University of China

Developed a multi-objective operations research model, optimized with the Multi-Objective Particle Swarm Optimization (MOPSO) algorithm, to improve the accuracy and efficiency of proposal grouping.


Current Hi-C analysis approaches are unable to account for reads that align to multiple locations, and hence underestimate biological …

In genomics, statisticians and computational groups rely on data-driven simulations to benchmark analysis approaches and develop best …

The development and function of stem and progenitor cells that produce blood cells are vital in physiology. GATA2 mutations cause …

By functioning as an enzyme cofactor, hemoglobin component, and gene regulator, heme is vital for life. One mode of heme-regulated …

Hi-C sequencing technology provides key insights into the 3D structures of the human genome. Although peak detection from Hi-C …

Segmental duplications and other highly repetitive regions of genomes contribute significantly to cells’ regulatory programs. …

Studies on Approximate Bayesian Computation (ABC) replacing the intractable likelihood function in evaluation of the posterior …


Research Interests


3D Chromatin Organization

Utilization of Hi-C multi-mapping reads; fast simulation of Hi-C interactions; detection of differential Hi-C interactions.

Repetitive Regions of Genomes

Studies of repetitive regions of the genome from 2D and 3D perspectives.

GATA1/Heme-dependent chromatin targeting

Multi-omics studies of the GATA1/Heme regulatory mechanism.


  • mHiC: Python pipeline implementing a multi-mapping strategy for Hi-C data that probabilistically assigns reads originating from repetitive regions. Computationally intensive components are accelerated in C.

  • FreeHiC: Python pipeline using the FRagment Interactions Empirical Estimation method for fast simulation of Hi-C and other 3D proximity ligation sequencing data. Computationally intensive components are accelerated in C.

  • TreeHiC: R package implementing a hierarchical, tree-structured multiple testing procedure for detecting differential chromatin interactions across conditions. (Co-developer)

  • permseq: R package for mapping protein-DNA interactions in highly repetitive regions of the genome with prior-enhanced read mapping.

  • permseqExample: Companion R package with illustrations and demo runs for permseq. It provides small raw datasets and demo R scripts for quick trial runs to get to know the permseq package.


  1. Zheng F., Wei D., Zheng Y. Protected from 15th Nov. 2008 to 15th Nov. 2018. Computer Power Cord. Patent No. ZL 2007 2 0157387.7
  2. Zheng Y., Yao S. Protected from 10th Jan. 2007 to 10th Jan. 2017. Easel. Patent No. ZL 2005 20125184.0

Honors and Awards

  • [July 2018] ASA Section of Statistics in Genomics and Genetics (SGG) Distinguished Student Paper Award. Joint Statistical Meetings, Vancouver, Canada
  • [Nov. 2017] Stellar Abstract Award. 2017 Program in Quantitative Genomics Conference, Harvard University, Boston, MA
  • [July 2017] Registration and Travel Scholarships. Summer Institute in Statistics for Big Data (SISBID), University of Washington, Seattle, WA
  • [June 2017] Student Travel Award. 14th Graybill Conference on Statistical Genomics and Genetics, Colorado State University, Fort Collins, CO
  • [May 2017] GLBIO Sponsorship Complimentary Registration. Great Lakes Bioinformatics Conference (GLBIO) 2017, University of Illinois at Chicago, Chicago, IL
  • [June 2016] Best Lightning Talk - Honorable Mention. 2016 Encyclopedia of DNA Elements (ENCODE) Consortium Meeting, La Jolla, CA


  • [2017-2018] UW-Madison SRGC Conference Presentation Funds, $1200. University of Wisconsin – Madison, Madison, WI
  • [2011-2014] Scholarships for Outstanding Academic Performance, RMB7000. Renmin University of China, Beijing, China
  • [2013] Mitacs (Canada) and China Scholarship Council Research Scholarship, $4500. Mitacs, Canada
  • [2012] Fung Scholar Scholarship, HK$5000. Hong Kong, China

Recent Blog Posts

Three-dimensional genome architecture and disease variants.

Cython - Linking C/C++ with Python and Accelerate Computation Considerably.

ggplot2 Summary and Color Recommendation for Clean and Pretty Visualization.

Download data from GEO by linux command lines.

Practical Shell Commands to Manipulate Genome Data in a Fast and Clean fashion.

Computing Skills








Accelerating Pipelines



Grid/Distributed computing systems