Fei Zhang

Hi, I'm Fei Zhang (张菲)

I am now a 3rd-year Ph.D. student at Shanghai Jiao Tong University & Shanghai Innovation Institute, fortunate to be advised by Jiangchao Yao, Tianfei Zhou (BIT), Pengfei Liu, and Ya Zhang.

Before that, I obtained my master's degree from Shanghai Jiao Tong University, where I was fortunate to be advised by Chaochen Gu, Xinping Guan, and Yuchao Dai (NWPU). Prior to that, I obtained my bachelor's degree in Automation from Northwestern Polytechnical University, where I had a marvelous time in Xi'an!

My research primarily focuses on multi-modal representation learning and data-efficient learning, generally covering visual fine-grained recognition, multi-modal alignment, and visual generation. More recently, I have also begun exploring unified models and world models. I am always open to research discussions and collaborations; please feel free to contact me via email (ferenas AT sjtu.edu.cn).

Experience

Meta
2025.12-2026.4

Meta Research Scientist Intern, working on video generation, specifically RGB-Alpha generation. I developed a VAE-training-free method that enables an off-the-shelf video generative model to acquire alpha-channel representation ability. Meanwhile, I also participated in the development of our team's unified model.

Qwen
2024.07-2025.12

Qwen Research Scientist Intern, working on developing Qwen3-VL. I specifically worked on improving visual fine-grained recognition ability and exploring effective multi-modal fusion mechanisms. I also helped improve Qwen3-VL's multi-image recognition ability.

Selected Publications

TransText: Alpha-as-RGB Representation for Transparent Text Animation

Fei Zhang, Zijian Zhou, Bohao Tang, Sen He, Hang Li, Zhe Wang, Soubhik Sanyal, Pengfei Liu, Viktar Atliha, Tao Xiang, Frost Xu, Semih Gunel

A VAE-training-free I2V framework for RGB-Alpha video generation, enabling fine-grained alpha-channel generation with simple spatial concatenation.

Qwen3-VL Technical Report

Shuai Bai, ... Fei Zhang (Core Contributor), ... et al.

The most powerful fully open-source vision-language model.

ConText: Driving In-context Learning for Text Removal and Segmentation (ICML'25)

Fei Zhang, Pei Zhang, Baosong Yang, Fei Huang, Yanfeng Wang, Ya Zhang

The first exploration of a visual in-context learning paradigm for fine-grained text recognition tasks.

Decouple before Align: Visual Disentanglement Enhances Prompt Tuning (T-PAMI'25)

Fei Zhang, Tianfei Zhou, Jiangchao Yao, Ya Zhang, Ivor W. Tsang, Yanfeng Wang

Aligning visual and textual representations in a vision-decoupled manner, yielding fine-grained recognition improvements.

Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation (NeurIPS'23)

Fei Zhang, Tianfei Zhou, Boyang Li, Hao He, Chaofan Ma, Tianjiao Zhang, Jiangchao Yao, Ya Zhang, Yanfeng Wang

Introducing prototypical knowledge to better align visual and textual representations, yielding improved dense visual recognition.

Exploiting Class Activation Value for Partial-Label Learning (ICLR'22)

Fei Zhang, Lei Feng, Bo Han, Tongliang Liu, Gang Niu, Tao Qin, Masashi Sugiyama

Introducing class activation values to address weakly supervised learning, implicitly forming accurate clean labels through visual knowledge.

Complementary Patch for Weakly Supervised Semantic Segmentation (ICCV'21)

Fei Zhang, Chaochen Gu, Chenyue Zhang, Yuchao Dai

Introducing complementary visual representations to enhance the activation of dense visual knowledge, explicitly addressing weakly supervised semantic segmentation.