“The purpose of computing is insight, not numbers.” — Richard Hamming


High-throughput sequencing technology now lets us obtain and analyze whole-genome data faster and more cheaply than ever before. One day, people will be able to obtain their own DNA sequences easily. Would I do anything differently if I knew my own genome sequence? What could I learn from it? The research areas of molecular regulation, statistical methods, and computational tools all still have considerable room for development, and I am eager to explore them in depth.

I am currently working with Prof. Joshua Dubnau and Prof. Molly Hammell at Cold Spring Harbor Laboratory. My research focuses on developing computational strategies to interpret genomic and epigenomic data produced by next-generation sequencing. Aside from my research, I am a Python enthusiast. When I was in Taiwan, I regularly attended events held by programming communities, including MLDM Monday and Taipei.py. I also like to share and discuss my thoughts on programming with others, and I have given three talks at PyCon Taiwan and PyCon APAC.

Selected Research Projects

HETGEN: analytical tool to assess the heterogeneity of DNA methylation patterns

DNA methylation plays critical roles in transcriptional regulation, development, and imprinting. High-throughput DNA sequencing coupled with bisulfite treatment can profile genome-wide DNA methylation at single-nucleotide resolution. The analysis of genome-wide methylation data starts with the alignment of bisulfite-converted reads. After alignment, methods are employed to identify differentially methylated regions (DMRs) among samples. DMR identification considers only the change in methylation levels, and the heterogeneity of methylation patterns within a cell population is often ignored. In cancer research, such heterogeneity may suggest that subsets of cells are progressing differently at certain regions. The more heterogeneous a region is, the more likely it is to be involved in such a process. I developed a new approach based on string kernels to assess the heterogeneity of DNA methylation patterns from aligned reads. This approach provides insight into epigenetic regulation by identifying potential epigenetic regulatory regions.
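The idea of scoring heterogeneity with a string kernel can be illustrated with a minimal Python sketch. It is not the HETGEN implementation: the choice of a k-spectrum kernel, the encoding of each read as a string of '1' (methylated) and '0' (unmethylated) CpG sites, the score defined as one minus the mean pairwise similarity, and all function names are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def spectrum(pattern, k):
    """Count every length-k substring of a methylation pattern string."""
    return Counter(pattern[i:i + k] for i in range(len(pattern) - k + 1))


def spectrum_kernel(a, b, k=2):
    """Normalized k-spectrum kernel: cosine similarity of k-mer counts."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    dot = sum(count * sb[kmer] for kmer, count in sa.items())
    norm_a = sqrt(sum(c * c for c in sa.values()))
    norm_b = sqrt(sum(c * c for c in sb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def heterogeneity(patterns, k=2):
    """One minus the mean pairwise similarity; 0.0 means identical reads."""
    pairs = list(combinations(patterns, 2))
    if not pairs:
        return 0.0
    mean_similarity = sum(spectrum_kernel(a, b, k) for a, b in pairs) / len(pairs)
    return 1.0 - mean_similarity


# Hypothetical reads covering one region (assumed encoding, not real data).
uniform_reads = ["1111", "1111", "1111"]
mixed_reads = ["1111", "0000", "1010"]

uniform_score = heterogeneity(uniform_reads)
mixed_score = heterogeneity(mixed_reads)
```

In this toy setting, a region where every read carries the same pattern scores lower than a region where reads disagree, which is the kind of signal a level-based DMR comparison would miss.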

FlyLine: Drosophila tracking system

The measurement of Drosophila walking behavior can be used to investigate spatial orientation memory. A single fly with clipped wings is placed in the middle of an arena, where it walks back and forth between two inaccessible black stripes. The fly's movement is recorded by a video camera. However, there was no suitable software to track the fly's position and analyze its trajectory, so I set up a fly tracking system to analyze the videos of the walking experiments. The first step was to apply a Gaussian blur to each video frame in order to reduce image noise. The image was then converted to black and white using a given threshold. Since it was difficult to keep the experimental environment stable, each video usually needed a different threshold. I calculated the distribution of detected contours at each candidate threshold and automatically selected the best one. After these steps, the fly body was easy to distinguish from the homogeneously bright background. I originally wrote the program in C using the OpenCV library but then rewrote the entire program in Python to speed up development.
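The blur, threshold, and threshold-selection steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the FlyLine code: connected dark regions stand in for OpenCV contours, the 3x3 kernel is a small Gaussian approximation, and the selection rule (take the first candidate threshold that leaves exactly one dark region) is a hypothetical criterion, since the actual rule is not given here. All names are invented for the example.

```python
def blur(image):
    """Smooth a grayscale image with a 3x3 Gaussian-like kernel (edges replicated)."""
    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # weights sum to 16
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    total += kernel[dr + 1][dc + 1] * image[rr][cc]
            out[r][c] = total / 16
    return out


def dark_blobs(image, threshold):
    """Binarize at `threshold` and return the sizes of 4-connected dark regions."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] >= threshold:
                continue
            stack, size = [(r, c)], 0
            seen[r][c] = True
            while stack:  # flood fill one dark region
                y, x = stack.pop()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and image[ny][nx] < threshold:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            sizes.append(size)
    return sizes


def pick_threshold(image, candidates):
    """Return the first candidate that isolates exactly one dark blob, else None."""
    for threshold in candidates:
        if len(dark_blobs(image, threshold)) == 1:
            return threshold
    return None


# Hypothetical 10x10 frame: bright background, a 2x2 dark fly, one noisy pixel.
frame = [[200] * 10 for _ in range(10)]
for r, c in ((5, 2), (5, 3), (6, 2), (6, 3)):
    frame[r][c] = 20
frame[1][7] = 150

smoothed = blur(frame)
best = pick_threshold(smoothed, [195, 150, 100, 50])
```

In an OpenCV-based version, these steps would presumably correspond to cv2.GaussianBlur, cv2.threshold, and cv2.findContours, with the per-threshold contour counts driving the same kind of selection.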

Publications

Journal Papers

  1. Sofia Gkountela, Kelvin Zhang, Tiasha Shafiq, Wen-Wei Liao, Joseph Hargan-Calvopiña, Pao-Yang Chen, Amander T Clark. DNA demethylation dynamics in the human prenatal germline. Cell, 2015.
  2. Taku Sasaki, Tatsuo Kanno, Shih-Chieh Liang, Pao-Yang Chen, Wen-Wei Liao, Wen-Dar Lin, Antonius J.M. Matzke and Marjori Matzke. An Rtf2 domain-containing protein influences pre-mRNA splicing and is essential for embryonic development in Arabidopsis thaliana. Genetics, 2015.
  3. Pao-Yang Chen, Barbara Montanini, Wen-Wei Liao, Marco Morselli, Artur Jaroszewicz, David Lopez, Simone Ottonello and Matteo Pellegrini. A comprehensive resource of genomic, epigenomic and transcriptomic sequencing data for the black truffle Tuber melanosporum. GigaScience, 2014. PDF
  4. Taku Sasaki, Tzuu-fen Lee, Wen-Wei Liao, Ulf Naumann, Jo-Ling Liao, Changho Eun, Ya-Yi Huang, Jason L. Fu, Pao-Yang Chen, Blake C. Meyers, Antonius J.M. Matzke and Marjori Matzke. Distinct and concurrent pathways of Pol II and Pol IV-dependent siRNA biogenesis at a repetitive trans-silencer locus in Arabidopsis thaliana. The Plant Journal, 2014. PDF

Presentations

Oral Presentation

  1. Speaker, PyCon APAC. May 17, 2014.
    A versatile Python tool to assess DNA methylation variation and identify DMRs SlideShare
    Presented at the largest Python conference in Asia (attendance: 800)
  2. Speaker, PyCon Taiwan. May 26, 2013.
    An epigenetics odyssey SlideShare
    Introducing how Python helps us to do research in epigenetics
  3. Speaker, PyCon Taiwan. June 9, 2012.
    Use the Matplotlib, Luke SlideShare
    Introducing the 2D plotting package, Matplotlib, for scientific visualization

Poster Presentation

  1. Presenter, Genome Informatics Workshop/International Society for Computational Biology-Asia. Dec 16, 2014.
    HETGEN: a bioinformatic tool to assess genome-wide heterogeneity of DNA methylation PDF
  2. Presenter, International Symposium on Evolutionary Genomics and Bioinformatics. Nov 7, 2014.
    HETGEN: a bioinformatic tool to assess genome-wide heterogeneity of DNA methylation PDF
  3. Presenter, Annual Conference of Asia Epigenome Alliance. Nov 8, 2013.
    Improving rice regeneration efficiency through extensive comparisons of callus methylome and transcriptome PDF