---
layout: page
title: Wenhao (Reself) Chai
subtitle: master @UW
use-site-title: false
---
<head>
<style>
a { text-decoration : none; }
a:hover { text-decoration : underline; }
a { color : #b5194f; }
a:visited { color : #b5194f; }
</style>
<script src="https://kit.fontawesome.com/5bef57b3e9.js" crossorigin="anonymous"></script>
</head>
<br>
Wenhao Chai is a master's student at the University of Washington, working in the <a href="https://ipl-uw.github.io/">Information Processing Lab</a> advised by Prof. <a href="https://people.ece.uw.edu/hwang/">Jenq-Neng Hwang</a>. Previously, he was an undergraduate student at Zhejiang University, working in the
<a href="https://cvnext.github.io/">CVNext Lab</a> advised by Prof. <a href="https://person.zju.edu.cn/en/gaoangwang/">Gaoang Wang</a>. He was fortunate to intern with the Multi-modal Computing Group at Microsoft Research Asia, advised by Dr. <a href="https://scholar.google.com/citations?user=Ow4R8-EAAAAJ&hl=en&oi=ao">Xun Guo</a>.
Wenhao Chai has published papers in several renowned academic journals and international conferences, including IEEE T-MM, IEEE T-AI, ICCV, AAAI, ACM MM, BMVC, and WACV. His research interests include embodied agents, generative models, multi-modality learning, video understanding, and human perception models.
<br>
<br>
<hr style="height:2px;border-width:0;color:gray;background-color:gray">
<b><i class="fa-regular fa-note-sticky" style="font-size:24px"></i> Selected Publications:</b>
<p><font color="grey" size="3">
Also see <a href="https://rese1f.github.io/publications" target="_blank">Publications Page</a> and <a href="https://scholar.google.com/citations?user=SL--7UMAAAAJ&hl=en" target="_blank">Google Scholar</a>.
</font></p>
<ul>
<li>
<p style="font-size:16px">
<strong>
StableVideo: Text-driven Consistency-aware Diffusion Video Editing
</strong>
<br>
<b>Wenhao Chai</b>, Xun Guo✉, Gaoang Wang, Yan Lu
<br>
<font color="#E89B00">
<em>International Conference on Computer Vision (ICCV), 2023</em>
</font>
<br>
<a href="https://rese1f.github.io/StableVideo/">[Website]</a>
<a href="https://arxiv.org/abs/2308.09592">[Paper]</a>
<a href="https://huggingface.co/spaces/Reself/StableVideo">[Demo]</a>
<a href="https://github.com/rese1f/StableVideo">[Code]</a>
<img alt="" src="https://img.shields.io/github/stars/rese1f/StableVideo?style=social">
<br>
<font color="grey" size="2">
We introduce temporal dependency into existing text-driven diffusion models, allowing them to generate a consistent appearance for newly edited objects.
</font>
</p>
</li>
<li>
<p style="font-size:16px">
<strong>
Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation
</strong>
<br>
<b>Wenhao Chai</b>, Zhongyu Jiang, Jenq-Neng Hwang, Gaoang Wang✉
<br>
<font color="#E89B00">
<em>International Conference on Computer Vision (ICCV), 2023</em>
</font>
<br>
<a href="https://arxiv.org/abs/2303.16456">[Paper]</a>
<a href="https://github.com/rese1f/PoseDA">[Code]</a>
<img alt="NPM" src="https://img.shields.io/github/stars/rese1f/PoseDA?style=social">
<br>
<font color="grey" size="2">
A simple yet effective framework for unsupervised domain adaptation in 3D human pose estimation.
</font>
</p>
</li>
<li>
<p style="font-size:16px">
<strong>
MovieChat: From Dense Token to Sparse Memory in Long Video Understanding
</strong>
<br>
Enxin Song*, <b>Wenhao Chai*♡</b>, Guanhong Wang*, Yucheng Zhang, Haoyang Zhou, Feiyang Wu, Haozhe Chi, Xun Guo, Tian Ye, Yanting Zhang, Yan Lu, Jenq-Neng Hwang, Gaoang Wang✉
<br>
<em>arXiv Preprint.</em>
<br>
<a href="https://rese1f.github.io/MovieChat/">[Website]</a>
<a href="https://arxiv.org/abs/2307.16449">[Paper]</a>
<a href="https://github.com/rese1f/MovieChat">[Dataset]</a>
<a href="https://github.com/rese1f/MovieChat">[Code]</a>
<img alt="NPM" src="https://img.shields.io/github/stars/rese1f/MovieChat?style=social">
<br>
<font color="grey" size="2">
MovieChat achieves state-of-the-art performance in long video understanding by introducing a memory mechanism.
</font>
</p>
</li>
<!-- <li>
<p style="font-size:16px">
<strong>
See and Think: Embodied Agent in Virtual Environment
</strong>
<br>
Zhonghan Zhao*, <b>Wenhao Chai*♡</b>, Xuan Wang*, Boyi Li, Shengyu Hao, Shidong Cao, Tian Ye, Jenq-Neng Hwang, Gaoang Wang✉
<br>
<em>arXiv Preprint.</em>
<br>
<a href="https://rese1f.github.io/STEVE/">[Website]</a>
<a href="https://arxiv.org/abs/2311.15209">[Paper]</a>
<a href="https://github.com/rese1f/STEVE">[Dataset]</a>
<a href="https://github.com/rese1f/STEVE">[Code]</a>
<img alt="NPM" src="https://img.shields.io/github/stars/rese1f/STEVE?style=social">
<br>
<font color="grey" size="2">
STEVE, named after the protagonist of the game Minecraft, is our proposed framework that aims to build an embodied agent based on vision models and LLMs within an open world.
</font>
</p>
</li> -->
</ul>
<hr style="height:2px;border-width:0;color:gray;background-color:gray">
<b><i class="fa-solid fa-pen-to-square" style="font-size:24px"></i> Updates:</b>
<br>
<br>
<font color="grey" size="3">
Click on the icon <i class="fa-regular fa-note-sticky" style="font-size:16px"></i> for redirection.
</font>
<br>
<br>
<ul>
<li><i>Dec 2023:</i>
<a href="https://arxiv.org/abs/2303.15124">
<i class="fa-regular fa-note-sticky" style="font-size:20px"></i></a>
Our paper <i>Blind Inpainting with Object-aware Discrimination for Artificial Marker Removal</i> is accepted by ICASSP 2024.
</li><br>
<li><i>Dec 2023:</i>
<a href="https://rese1f.github.io/UniAP/">
<i class="fa-regular fa-note-sticky" style="font-size:20px"></i></a>
Our paper <i>UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning</i> is accepted by AAAI 2024.
</li><br>
<li><i>Dec 2023:</i>
<a href="https://rese1f.github.io/CityGen/">
<i class="fa-regular fa-copy" style="font-size:20px"></i></a>
Our project <i>CityGen: Infinite and Controllable 3D City Layout Generation</i> is released.
</li><br>
<li><i>Dec 2023:</i>
<a href="https://rese1f.github.io/STEVE/">
<i class="fa-regular fa-copy" style="font-size:20px"></i></a>
Our project <i>See and Think: Embodied Agent in Virtual Environment</i> is released.
</li><br>
<li><i>Nov 2023:</i>
<a href="https://zhyjiang.github.io/ZeDO-proj/#infantZeDO">
<i class="fa-regular fa-note-sticky" style="font-size:20px"></i></a>
Our paper <i>Efficient Domain Adaptation via Generative Prior for 3D Infant Pose Estimation</i> is accepted by WACV 2024 workshop: <a href="https://cv4smalls.sites.northeastern.edu/">CV4Smalls</a>.
</li><br>
<li><i>Oct 2023:</i>
<a href="https://zhyjiang.github.io/ZeDO-proj/">
<i class="fa-regular fa-note-sticky" style="font-size:20px"></i></a>
Our paper <i>Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation</i> is accepted by WACV 2024.
</li><br>
<li><i>Sept 2023:</i>
<a href="https://arxiv.org/abs/2309.13770">
<i class="fa-regular fa-note-sticky" style="font-size:20px"></i></a>
Our paper <i>Devil in the Number: Towards Robust Multi-modality Data Filter</i> is accepted by ICCV 2023 workshop: <a href="https://www.datacomp.ai/">TNGCV-DataComp</a>.
</li><br>
<li><i>Sept 2023:</i>
<a href="https://arxiv.org/abs/2302.06826">
<i class="fa-regular fa-note-sticky" style="font-size:20px"></i></a>
Our paper <i>DiffFashion: Reference-based Fashion Design with Structure-aware Transfer by Diffusion Models</i> is accepted by IEEE T-MM.
</li><br>
<li><i>Aug 2023:</i>
<a href="https://arxiv.org/abs/2305.08824">
<i class="fa-regular fa-note-sticky" style="font-size:20px"></i></a>
Our paper <i>Five A+ Network: You Only Need 9K Parameters for Underwater Image Enhancement</i> is accepted by BMVC 2023.
</li><br>
<li><i>July 2023:</i>
<a href="https://dl.acm.org/doi/10.1145/3581783.3611742">
<i class="fa-regular fa-note-sticky" style="font-size:20px"></i></a>
Our paper <i>Sequential Affinity Learning for Video Restoration</i> is accepted by ACM MM 2023.
</li><br>
<li><i>July 2023:</i>
<a href="https://arxiv.org/abs/2308.09678">
<i class="fa-regular fa-note-sticky" style="font-size:20px"></i></a>
Our paper <i>PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Enhanced 3D Human Pose Estimation</i> is accepted by ACM MM 2023.
</li><br>
<li><i>July 2023:</i>
<a href="https://rese1f.github.io/MovieChat/">
<i class="fa-regular fa-copy" style="font-size:20px"></i></a>
Our project <i>MovieChat: From Dense Token to Sparse Memory in Long Video Understanding</i> is released.
</li><br>
<li><i>July 2023:</i> <img src="static/imgs/microsoft.png" width="25" height="25" style="vertical-align:text-bottom"/> Finish my internship at Microsoft Research Asia (MSRA), Beijing. I appreciate the helpful guidance and suggestions from my mentor Dr. Xun Guo during the internship.
</li><br>
<li><i>July 2023:</i>
<a href="https://rese1f.github.io/StableVideo/">
<i class="fa-regular fa-note-sticky" style="font-size:20px"></i></a>
Our paper <i>StableVideo: Text-driven Consistency-aware Diffusion Video Editing</i> is accepted by ICCV 2023.
</li><br>
<li><i>July 2023:</i>
<a href="https://arxiv.org/abs/2303.16456">
<i class="fa-regular fa-note-sticky" style="font-size:20px"></i></a>
Our paper <i>Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation</i> is accepted by ICCV 2023.
</li><br>
<li><i>June 2023:</i> <img src="static/imgs/zju.png" width="25" height="25" style="vertical-align:text-bottom"/> Graduate from Zhejiang University.
</li><br>
<li><i>Apr 2023:</i>
<a href="https://openaccess.thecvf.com/content/CVPR2023W/CVFAD/html/Cao_Image_Reference-Guided_Fashion_Design_With_Structure-Aware_Transfer_by_Diffusion_Models_CVPRW_2023_paper.html">
<i class="fa-regular fa-note-sticky" style="font-size:20px"></i></a>
Our paper <i>Image Reference-Guided Fashion Design With Structure-Aware Transfer by Diffusion Models</i> is accepted by CVPR 2023 Workshop: <a href="http://conferences.visionbib.com/2023/cvpr-cvfad-6-23-call.html">Computer Vision for Fashion, Art, and Design</a>.
</li><br>
<li><i>Mar 2023:</i>
<a href="https://arxiv.org/abs/2303.00313">
<i class="fa-regular fa-note-sticky" style="font-size:20px"></i></a>
Our paper <i>Deep Learning Methods for Small Molecule Drug Discovery: A Survey</i> is accepted by IEEE T-AI.
</li><br>
<li><i>Mar 2023:</i> <img src="static/imgs/uw.png" width="36" height="25" style="vertical-align:text-bottom"/> Become a graduate student member of the <a href="https://ipl-uw.github.io/">Information Processing Lab</a> at the University of Washington, advised by Professor <a href="https://people.ece.uw.edu/hwang/">Jenq-Neng Hwang</a>.
</li><br>
<li><i>Feb 2023:</i> <img src="static/imgs/microsoft.png" width="25" height="25" style="vertical-align:text-bottom"/> Become a research intern at <a href="https://www.msra.cn/">Microsoft Research Asia (MSRA)</a>, advised by principal researcher <a href="https://scholar.google.com/citations?user=Ow4R8-EAAAAJ&hl=en&oi=ao">Xun Guo</a>.
</li><br>
<li><i>Oct 2022:</i>
<a href="https://ieeexplore.ieee.org/document/9987127">
<i class="fa-regular fa-note-sticky" style="font-size:20px"></i></a>
Our paper <i>Automatic Spinal Ultrasound Image Segmentation and Deployment for Real-time Spine Volumetric Reconstruction</i> is accepted by ICUS 2022 with the Best Paper Award.
</li><br>
<li><i>Sep 2022:</i>
<a href="https://arxiv.org/abs/2209.11477">
<i class="fa-regular fa-note-sticky" style="font-size:20px"></i></a>
Our paper <i>Weakly Supervised Two-Stage Training Scheme for Deep Video Fight Detection Model</i> is accepted by ICTAI 2022.
</li><br>
<li><i>Aug 2022:</i> <img src="static/imgs/alibaba.png" width="36" height="25" style="vertical-align:text-bottom"/> Visit <a href="https://damo.alibaba.com">DAMO Academy</a> at Alibaba.
</li><br>
<li><i>July 2022:</i> <img src="static/imgs/uiuc.png" width="18" height="25" style="vertical-align:text-bottom"/> Become a member of the <a href="https://www.ncsa.illinois.edu/">National Center for Supercomputing Applications (NCSA)</a> at the University of Illinois Urbana-Champaign, working with Professor <a href="https://cs.illinois.edu/about/people/faculty/kindrtnk">Volodymyr (Vlad) Kindratenko</a>.
</li><br>
<li><i>June 2022:</i> Attend CVPR 2022 in New Orleans. Here are my <a href="https://github.com/rese1f/awesome-cvpr2022">notes</a>.
</li><br>
<li><i>June 2022:</i>
<a href="https://www.mdpi.com/2076-3417/12/13/6588">
<i class="fa-regular fa-note-sticky" style="font-size:20px"></i></a>
Our paper <i>Deep Vision Multimodal Learning: Methodology, Benchmark, and Trend</i> is accepted by Applied Sciences.
</li><br>
<li><i>July 2021:</i> Start my research on 3D human pose estimation, advised by Professor <a href="https://person.zju.edu.cn/en/gaoangwang/">Gaoang Wang</a>.
</li><br>
</ul>