Yen-Chi Cheng
email: yenchich_at_cs.cmu.edu

| CV | Google Scholar | Github | LinkedIn |

I am an MS student at Carnegie Mellon University in the Robotics Institute, advised by Professor Shubham Tulsiani. My research interests include computer vision and machine learning. My goal is to build systems that can perceive and understand the world, and be able to recreate it.

Previously, I was fortunate to work with Hsin-Ying Lee and Sergey Tulyakov at Snap Research, Professor Shang-Hong Lai at Microsoft, Professor Ming-Hsuan Yang at UC Merced, and Professors Min Sun and Hwann-Tzong Chen at National Tsing Hua University.

I am seeking a Ph.D. position in Computer Science for Fall 2022. Please see my CV (last updated Nov. 26) for more details.


Snap Inc.
Research Intern
May 21 - Aug. 21


CMU
MSCV
Jan. 21 - June 22


Microsoft
Research Intern
Mar. 20 - Present


UC Merced
Visiting Scholar
Sept. 19 - Mar. 20


NTHU
Research Assistant
Sept. 18 - Feb. 20


NTU
M.B.A. in Finance
(Leave of Absence)
Sept. 17 - June 18

  News
  Publications

Non-sequential Autoregressive Shape Priors for 3D Completion, Reconstruction and Generation
Paritosh Mittal*, Yen-Chi Cheng*, Maneesh Singh, Shubham Tulsiani
2021
(* indicates equal contribution)

webpage (coming soon) | abstract | bibtex | arXiv (coming soon) | code (coming soon)

Powerful priors allow us to perform inference with insufficient information. In this paper, we propose an autoregressive prior for 3D shapes to solve multimodal 3D tasks such as shape completion, reconstruction, and generation. We model the distribution over 3D shapes as a non-sequential autoregressive distribution over a discretized, low-dimensional, symbolic grid-like latent representation of 3D shapes. We demonstrate that the proposed prior can represent distributions over 3D shapes conditioned on information from an arbitrary set of spatially anchored query locations, and thus perform shape completion in such arbitrary settings (e.g., generating a complete chair given only a view of the back leg). We also show that the learned autoregressive prior can be leveraged for conditional tasks such as single-view reconstruction and language-based generation. This is achieved by learning task-specific 'naive' conditionals which can be approximated by light-weight models trained on minimal paired data. We validate the effectiveness of the proposed method using both quantitative and qualitative evaluation and show that it outperforms specialized state-of-the-art methods trained for individual tasks.
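A toy sketch of the non-sequential generation order described above (all names here are hypothetical stand-ins, not the paper's code): unobserved cells of the symbolic latent grid are visited in an arbitrary order, each filled conditioned on everything generated so far.

```python
import random

def complete_grid(grid, predict, rng=random.Random(0)):
    """Fill the unobserved cells (None) of a symbolic shape grid.

    grid    : dict mapping a cell location, e.g. (x, y, z), to a token or None
    predict : stand-in for the learned autoregressive model; maps
              (grid_so_far, cell) -> a token for that cell
    """
    unobserved = [cell for cell, tok in grid.items() if tok is None]
    rng.shuffle(unobserved)            # non-sequential: any generation order works
    for cell in unobserved:
        grid[cell] = predict(grid, cell)   # condition on all cells filled so far
    return grid
```

Because the order is arbitrary, the same loop handles completion from any spatially anchored observations (a back leg, a seat corner, etc.).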

        @article{gen3d2021,
            author = {
                    Mittal, Paritosh and
                    Cheng, Yen-Chi and 
                    Singh, Maneesh and
                    Tulsiani, Shubham
                  },
            title = {Non-sequential Autoregressive Shape Priors for 3D Completion, Reconstruction and Generation},
            journal={arXiv preprint arXiv:xxxx.xxxx},
            year = {2021}
            }
        

In&Out: Diverse Image Outpainting via GAN Inversion
Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Ming-Hsuan Yang
arXiv 2021

webpage | abstract | bibtex | arXiv | code (coming soon)

Image outpainting seeks a semantically consistent extension of the input image beyond its available content. Compared to inpainting -- filling in missing pixels in a way coherent with the neighboring pixels -- outpainting can be achieved in more diverse ways since the problem is less constrained by the surrounding pixels. Existing image outpainting methods pose the problem as a conditional image-to-image translation task, often generating repetitive structures and textures by replicating the content available in the input image. In this work, we formulate the problem from the perspective of inverting generative adversarial networks. Our generator renders micro-patches conditioned on their joint latent code as well as their individual positions in the image. To outpaint an image, we seek multiple latent codes not only recovering available patches but also synthesizing diverse outpainting by patch-based generation. This leads to richer structure and content in the outpainted regions. Furthermore, our formulation allows for outpainting conditioned on the categorical input, thereby enabling flexible user controls. Extensive experimental results demonstrate the proposed method performs favorably against existing in- and outpainting methods, featuring higher visual quality and diversity.
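The inversion idea can be illustrated with a deliberately tiny, fully linear stand-in (this is not the paper's generator or losses, just the shape of the optimization): the loss is computed only on the observed region, so fitting the latent code there lets the generator synthesize the rest.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((8, 4)) * 0.5   # hypothetical linear "generator": image = G @ z
target = rng.standard_normal(8)          # full scene; only the first 4 pixels are given
mask = np.zeros(8)
mask[:4] = 1.0                           # 1 = observed (input) pixel, 0 = to outpaint

def masked_loss(z):
    # reconstruction error measured only on the observed region
    return float(np.sum((mask * (G @ z - target)) ** 2))

z = np.zeros(4)
loss_before = masked_loss(z)
for _ in range(2000):                    # gradient descent on the observed pixels only
    z -= 0.05 * (G.T @ (mask * (G @ z - target)))
loss_after = masked_loss(z)
outpainted = G @ z                       # observed part reconstructed, rest synthesized
```

In the paper this optimization runs over micro-patch latent codes of a GAN rather than a linear map, and different initializations of `z` yield diverse outpaintings.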

        @article{cheng2021inout,
            author = {
                    Cheng, Yen-Chi and 
                    Lin, Chieh Hubert and
                    Lee, Hsin-Ying and 
                    Ren, Jian and
                    Tulyakov, Sergey and
                    Yang, Ming-Hsuan
                  },
            title = {{In\&Out}: Diverse Image Outpainting via GAN Inversion},
            journal={arXiv preprint arXiv:2104.00675},
            year = {2021}
            }
        

InfinityGAN: Towards Infinite-Resolution Image Synthesis
Chieh Hubert Lin, Hsin-Ying Lee, Yen-Chi Cheng, Sergey Tulyakov, Ming-Hsuan Yang
arXiv 2021

webpage | abstract | bibtex | arXiv | code (coming soon)


        @article{lin2021infinity,
            author = {
                    Lin, Chieh Hubert and
                    Lee, Hsin-Ying and 
                    Cheng, Yen-Chi and 
                    Tulyakov, Sergey and
                    Yang, Ming-Hsuan
                  },
            title = {{InfinityGAN}: Towards Infinite-Resolution Image Synthesis},
            journal={arXiv preprint arXiv:2104.03963},
            year = {2021}
            }
        

Controllable Image Synthesis via SegVAE
Yen-Chi Cheng, Hsin-Ying Lee, Min Sun, Ming-Hsuan Yang
ECCV 2020

webpage | abstract | bibtex | arXiv | code

Flexible user controls are desirable for content creation and image editing. A semantic map is a commonly used intermediate representation for conditional image generation. Compared to operating on raw RGB pixels, the semantic map enables simpler user modification. In this work, we specifically target generating semantic maps given a label-set consisting of desired categories. The proposed framework, SegVAE, synthesizes semantic maps in an iterative manner using a conditional variational autoencoder. Quantitative and qualitative experiments demonstrate that the proposed model can generate realistic and diverse semantic maps. We also apply an off-the-shelf image-to-image translation model to generate realistic RGB images to better understand the quality of the synthesized semantic maps. Furthermore, we showcase several real-world image-editing applications including object removal, object insertion, and object replacement.
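The iterative composition loop can be sketched as follows (a minimal stand-in, not SegVAE itself; `place` is a hypothetical substitute for the conditional VAE decoder): categories from the label-set are drawn one at a time, each conditioned on the canvas built so far.

```python
def synthesize_semantic_map(label_set, place, size=4):
    """Iteratively compose a semantic map from a label-set.

    place : stand-in for the conditional decoder; given the canvas so far
            and a category, returns the (row, col) pixels it should occupy.
    """
    canvas = [[None] * size for _ in range(size)]
    for category in label_set:               # one category per iteration
        for (r, c) in place(canvas, category):
            canvas[r][c] = category          # later categories may occlude earlier ones
    return canvas
```

Editing applications then amount to re-running the loop with a category added, removed, or replaced in `label_set`.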

        @inproceedings{cheng2020segvae,
            Author = {
            Cheng, Yen-Chi and 
            Lee, Hsin-Ying and
            Sun, Min and
            Yang, Ming-Hsuan
            },
            Title = {Controllable Image Synthesis via {SegVAE}},
            Booktitle = {ECCV},
            Year = {2020}
           }
        

Point-to-Point Video Generation
Tsun-Hsuan Wang*, Yen-Chi Cheng*, Chieh Hubert Lin, Hwann-Tzong Chen, Min Sun
ICCV 2019
(* indicates equal contribution)

webpage | abstract | bibtex | arXiv | code

While image synthesis achieves tremendous breakthroughs (e.g., generating realistic faces), video generation is less explored and harder to control, which limits its applications in the real world. For instance, video editing requires temporal coherence across multiple clips and thus poses both start and end constraints within a video sequence. We introduce point-to-point video generation that controls the generation process with two control points: the targeted start- and end-frames. The task is challenging since the model not only generates a smooth transition of frames but also plans ahead to ensure that the generated end-frame conforms to the targeted end-frame for videos of various lengths. We propose to maximize the modified variational lower bound of conditional data likelihood under a skip-frame training strategy. Our model can generate end-frame-consistent sequences without loss of quality and diversity. We evaluate our method through extensive experiments on Stochastic Moving MNIST, Weizmann Action, Human3.6M, and BAIR Robot Pushing under a series of scenarios. The qualitative results showcase the effectiveness and merits of point-to-point generation.
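The two-control-point interface above can be sketched with a toy generator (hypothetical names; the `transition` callable stands in for the learned stochastic model, here replaced by plain interpolation): the sequence is pinned to both the targeted start- and end-frames regardless of its length.

```python
def p2p_generate(start, end, length, transition):
    """Generate a frame sequence constrained by two control points."""
    assert length >= 2
    frames = [start]                             # targeted start-frame
    for t in range(1, length - 1):
        # the model must plan ahead so intermediate frames reach the end-frame
        frames.append(transition(start, end, t / (length - 1)))
    frames.append(end)                           # targeted end-frame
    return frames

# linear interpolation as a trivial, deterministic stand-in transition
seq = p2p_generate(0.0, 1.0, 5, lambda s, e, a: s + a * (e - s))
```

In the paper the intermediate frames come from a variational model trained with a skip-frame strategy, so the generation is diverse rather than a single deterministic path.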

          @inproceedings{wang2019p2pvg,
              Author = {Wang, Tsun-Hsuan and 
                      Cheng, Yen-Chi and 
                      Lin, Chieh Hubert and 
                      Chen, Hwann-Tzong and 
                      Sun, Min},
              Title = {Point-to-Point Video Generation},
              Booktitle = {ICCV},
              Year = {2019}
              }
          

Radiotherapy Target Contouring with Convolutional Gated Graph Neural Network
Chun-Hung Chao, Yen-Chi Cheng, Hsien-Tzu Cheng, Chi-Wen Huang, Tsung-Ying Ho, Chen-Kan Tseng, Le Lu, Min Sun
NeurIPS 2018 Workshop (Spotlight)

abstract | bibtex | arXiv

Tomography medical imaging is essential in the clinical workflow of modern cancer radiotherapy. Radiation oncologists identify cancerous tissues, applying delineation on treatment regions throughout all image slices. This kind of task is often formulated as a volumetric segmentation task by means of 3D convolutional networks with considerable computational cost. Instead, inspired by the treating methodology of considering meaningful information across slices, we use a Gated Graph Neural Network to frame this problem more efficiently. More specifically, we propose a convolutional recurrent Gated Graph Propagator (GGP) to propagate high-level information through image slices, with a learnable weighted adjacency matrix. Furthermore, as physicians often investigate a few specific slices to refine their decision, we model this slice-wise interaction procedure to further improve our segmentation result: any slice can be edited effortlessly, with the predictions of the other slices updated through GGP. To evaluate our method, we collect an Esophageal Cancer Radiotherapy Target Treatment Contouring dataset of 81 patients which includes tomography images with radiotherapy targets. On this dataset, our convolutional graph network produces state-of-the-art results and outperforms the baselines. With the addition of the interactive setting, performance is improved even further. Our method has the potential to be easily applied to diverse kinds of medical tasks with volumetric images. Incorporating both the ability to make a feasible prediction and to consider human interactive input, the proposed method is suitable for clinical scenarios.
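The propagation step can be illustrated with a stripped-down sketch (hypothetical names; a plain tanh update stands in for the gated recurrent unit, and the adjacency here is hand-set rather than learned): each slice's features repeatedly aggregate information from neighboring slices through a weighted adjacency matrix.

```python
import numpy as np

def propagate(slice_feats, adj, steps=3):
    """Share information across slices, GGP-style.

    slice_feats : (num_slices, dim) per-slice feature vectors
    adj         : (num_slices, num_slices) weighted adjacency matrix
    """
    h = np.asarray(slice_feats, dtype=float)
    for _ in range(steps):
        h = np.tanh(h + adj @ h)     # each slice aggregates its weighted neighbors
    return h
```

Editing one slice changes its feature vector, and the same propagation then updates the representations (and hence predictions) of all other slices.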

            @article{chao18radiotherapy,
                title     = {Radiotherapy Target Contouring with Convolutional Gated Graph Neural
                             Network},
                author    = {Chao, Chun-Hung and Cheng, Yen-Chi and Cheng, Hsien-Tzu and Huang, Chi-Wen and
                             Ho, Tsung-Ying and Tseng, Chen-Kan and
                             Lu, Le and Sun, Min},
                journal   = {arXiv preprint arXiv:1904.02912},
                year      = {2019},
                }
            
  Projects

PyTorch VideoVAE
A PyTorch implementation of a video generation method with attribute control using a VAE, by He et al., ECCV 2018.
| code |


PyTorch SegInpaint
A PyTorch implementation of an inpainting method based on Song et al., BMVC 2018.
| code |

  Professional Activity
  • Reviewer: ICML 2020, CVPR 2021, ICCV 2021, NeurIPS 2021, AAAI 2022
  Awards
  • ICCV 2019 Travel Award

Template from this awesome website.