Yen-Chi Cheng

| CV | Google Scholar |
| Github | LinkedIn |

I am an MS student at CMU RI. Previously, I was fortunate to work with Prof. Shang-Hong Lai as a research intern at Microsoft, with Prof. Ming-Hsuan Yang as a visiting scholar at UC Merced, and with Prof. Min Sun and Prof. Hwann-Tzong Chen as a research assistant at National Tsing Hua University. My research interests include computer vision and deep learning.

I decided to switch fields from Economics/Finance to Computer Science because of my research experience in this field. I would like to pursue a Ph.D. in Computer Science to fulfill my career goals.

Please see my CV (last updated July 20) for more details.


Jan. 21 - June 22


Research Intern
Mar. 20 - Present


UC Merced
Visiting Scholar
Sept. 19 - Mar. 20


Research Assistant
Sept. 18 - Feb. 20


March 18 - Aug. 18


M.B.A. in Finance
(Leave of Absence)
Sept. 17 - June 18

  • [02/2021] Started my graduate studies at CMU RI!
  • [07/2020] One paper accepted at ECCV'20.
  • [03/2020] Started my research internship at Microsoft AI R&D Center, Taiwan, working with Prof. Shang-Hong Lai.
  • [09/2019] Started my visit to VLLab at UC Merced, working with Prof. Ming-Hsuan Yang.
  • [07/2019] One paper accepted at ICCV'19. See you in Seoul!
  • [11/2018] One workshop paper accepted at NeurIPS'18 (spotlight).
  • [09/2018] Started working as a research assistant at NTHU, advised by Prof. Min Sun.

In&Out: Diverse Image Outpainting via GAN Inversion
Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Ming-Hsuan Yang
arXiv 2021

webpage | abstract | bibtex | arXiv | code (coming soon)

Image outpainting seeks a semantically consistent extension of the input image beyond its available content. Compared to inpainting -- filling in missing pixels in a way coherent with the neighboring pixels -- outpainting can be achieved in more diverse ways since the problem is less constrained by the surrounding pixels. Existing image outpainting methods pose the problem as a conditional image-to-image translation task, often generating repetitive structures and textures by replicating the content available in the input image. In this work, we formulate the problem from the perspective of inverting generative adversarial networks. Our generator renders micro-patches conditioned on their joint latent code as well as their individual positions in the image. To outpaint an image, we seek multiple latent codes that not only recover the available patches but also synthesize diverse outpainting by patch-based generation. This leads to richer structure and content in the outpainted regions. Furthermore, our formulation allows for outpainting conditioned on the categorical input, thereby enabling flexible user controls. Extensive experimental results demonstrate that the proposed method performs favorably against existing in- and outpainting methods, featuring higher visual quality and diversity.

            author = {
                    Cheng, Yen-Chi and
                    Lin, Chieh Hubert and
                    Lee, Hsin-Ying and
                    Ren, Jian and
                    Tulyakov, Sergey and
                    Yang, Ming-Hsuan},
            title = {{In\&Out}: Diverse Image Outpainting via GAN Inversion},
            journal={arXiv preprint arXiv:2104.00675},
            year = {2021}

Controllable Image Synthesis via SegVAE
Yen-Chi Cheng, Hsin-Ying Lee, Min Sun, Ming-Hsuan Yang
ECCV 2020

webpage | abstract | bibtex | arXiv | code

Flexible user controls are desirable for content creation and image editing. A semantic map is a commonly used intermediate representation for conditional image generation. Compared to operating on raw RGB pixels, the semantic map enables simpler user modification. In this work, we specifically target generating semantic maps given a label set consisting of desired categories. The proposed framework, SegVAE, synthesizes semantic maps in an iterative manner using a conditional variational autoencoder. Quantitative and qualitative experiments demonstrate that the proposed model can generate realistic and diverse semantic maps. We also apply an off-the-shelf image-to-image translation model to generate realistic RGB images to better understand the quality of the synthesized semantic maps. Furthermore, we showcase several real-world image-editing applications including object removal, object insertion, and object replacement.

            Author = {
            Cheng, Yen-Chi and 
            Lee, Hsin-Ying and
            Sun, Min and
            Yang, Ming-Hsuan},
            Title = {Controllable Image Synthesis via {SegVAE}},
            Booktitle = {ECCV},
            Year = {2020}

Point-to-Point Video Generation
Yen-Chi Cheng*, Tsun-Hsuan Wang*, Chieh Hubert Lin, Hwann-Tzong Chen, Min Sun
ICCV 2019
(* indicates equal contribution)

webpage | abstract | bibtex | arXiv | code

While image synthesis achieves tremendous breakthroughs (e.g., generating realistic faces), video generation is less explored and harder to control, which limits its applications in the real world. For instance, video editing requires temporal coherence across multiple clips and thus poses both start and end constraints within a video sequence. We introduce point-to-point video generation that controls the generation process with two control points: the targeted start- and end-frames. The task is challenging since the model not only generates a smooth transition of frames but also plans ahead to ensure that the generated end-frame conforms to the targeted end-frame for videos of various lengths. We propose to maximize the modified variational lower bound of conditional data likelihood under a skip-frame training strategy. Our model can generate end-frame-consistent sequences without loss of quality and diversity. We evaluate our method through extensive experiments on Stochastic Moving MNIST, Weizmann Action, Human3.6M, and BAIR Robot Pushing under a series of scenarios. The qualitative results showcase the effectiveness and merits of point-to-point generation.

              Author = {Wang, Tsun-Hsuan and 
                      Cheng, Yen-Chi and 
                      Lin, Chieh Hubert and 
                      Chen, Hwann-Tzong and 
                      Sun, Min},
              Title = {Point-to-Point Video Generation},
              Booktitle = {ICCV},
              Year = {2019}

Radiotherapy Target Contouring with Convolutional Gated Graph Neural Network
Chun-Hung Chao, Yen-Chi Cheng, Hsien-Tzu Cheng, Chi-Wen Huang, Tsung-Ying Ho, Chen-Kan Tseng, Le Lu, Min Sun
NeurIPS 2018 Workshop (Spotlight)

abstract | bibtex | arXiv

Tomography medical imaging is essential in the clinical workflow of modern cancer radiotherapy. Radiation oncologists identify cancerous tissues, applying delineation on treatment regions throughout all image slices. This kind of task is often formulated as a volumetric segmentation task by means of 3D convolutional networks with considerable computational cost. Instead, inspired by the treating methodology of considering meaningful information across slices, we use a Gated Graph Neural Network to frame this problem more efficiently. More specifically, we propose a convolutional recurrent Gated Graph Propagator (GGP) to propagate high-level information through image slices, with a learnable weighted adjacency matrix. Furthermore, as physicians often investigate a few specific slices to refine their decision, we model this slice-wise interaction procedure to further improve our segmentation result. This can be done by editing any slice and then updating the predictions of the other slices using GGP. To evaluate our method, we collect an Esophageal Cancer Radiotherapy Target Treatment Contouring dataset of 81 patients which includes tomography images with radiotherapy target. On this dataset, our convolutional graph network produces state-of-the-art results and outperforms the baselines. With the addition of the interactive setting, performance is improved even further. Our method has the potential to be easily applied to diverse kinds of medical tasks with volumetric images. Incorporating both the ability to make a feasible prediction and to consider the human interactive input, the proposed method is suitable for clinical scenarios.

                title     = {Radiotherapy Target Contouring with Convolutional Gated Graph Neural Network},
                author    = {Chao, Chun-Hung and Cheng, Yen-Chi and Cheng, Hsien-Tzu and Huang, Chi-Wen and
                             Ho, Tsung-Ying and Tseng, Chen-Kan and
                             Lu, Le and Sun, Min},
                journal   = {arXiv preprint arXiv:1904.02912},
                year      = {2019},

PyTorch VideoVAE
A PyTorch implementation of a video generation method with attribute control using VAE by He et al., ECCV 2018.
| code |


PyTorch SegInpaint
A PyTorch implementation of an inpainting method based on Song et al., BMVC 2018.
| code |

  Professional Activity
  • Reviewer: CVPR 2021
  • ICCV 2019 Travel Award
