A year spent in artificial intelligence is enough to make one believe in God.

-- Alan Perlis

Welcome to my personal website!

My name is Zhiwei Jia (贾志伟). I am a fifth-year PhD student in Computer Science at UC San Diego (previously an undergrad here). I am fortunate to be advised by Prof. Hao Su and have also been working with Prof. Zhuowen Tu.

I am interested in research and applications at the intersection of vision, language, and robotics, especially multimodal understanding (e.g., visual perception and reasoning) and robot learning (e.g., high-level planning and low-level object manipulation). I am also a strong believer in large foundation models (LLMs, VLMs, etc.).

Publications & Preprints

Chain-of-Thought Predictive Control

Z. Jia, F. Liu, V. Thumuluri, L. Chen, Z. Huang, H. Su
ICLR 2023 (RRL Workshop) [code] [webpage]
A powerful imitation learning algorithm that solves hard low-level control tasks (e.g., contact-rich object manipulation) by empowering sequence models (e.g., GPT) with chain-of-thought predictions.

KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of VLMs

Z. Jia, P. Narayana, A. Akula, G. Pruthi, H. Su, S. Basu, V. Jampani
ACL 2023 [arXiv]
The first empirical study of image ad understanding through the lens of large vision-language models, where we augment these models with real-world knowledge.

MetaCLUE: Towards Comprehensive Visual Metaphors Research

A. Akula, B. Driscoll, P. Narayana, S. Changpinyo, Z. Jia, S. Damle, G. Pruthi, S. Basu, L. Guibas, W. Freeman, Y. Li, V. Jampani
CVPR 2023 [arXiv] [webpage]
The first work to take a concrete step towards AI systems with human-like creative capabilities, via a benchmark on visual metaphor understanding and generation.

Improving Policy Optimization with Generalist-Specialist Learning

Z. Jia, X. Li, Z. Ling, S. Liu, Y. Wu, H. Su
ICML 2022 [arXiv] [code] [webpage]
We tackle large-scale policy optimization with a novel framework, GSL, that combines joint training and distributed specialist training to ease policy learning. We consider GSL a step towards generalizable policies.

Learning to Act with Affordance-Aware Multimodal Neural SLAM

Z. Jia, K. Lin, Y. Zhao, Q. Gao, G. Thattai, G. Sukhatme
IROS 2022 [arXiv] [code] [webpage]
We designed a framework that employs multimodal exploration to acquire an affordance-aware semantic representation for solving complex long-horizon indoor tasks, achieving highly competitive results in the ALFRED Challenge.

LUMINOUS: Indoor Scene Generation for Embodied AI Challenges

Y. Zhao, K. Lin, Z. Jia, Q. Gao, G. Thattai, J. Thomason, G. Sukhatme
NeurIPS 2022 (CtrlGen Workshop) [arXiv] [code]
We developed a framework that synthesizes large-scale simulated scenes via randomization for training and evaluating agents in Embodied AI challenges.

ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale Demonstrations

T. Mu, Z. Ling, F. Xiang, D. Yang, X. Li, S. Tao, Z. Huang, Z. Jia, H. Su
NeurIPS 2021 [arXiv] [code] [webpage]
We proposed a benchmark for generalizable physical object manipulation from 3D visual inputs. It features large intra-class topological and geometric variations, carefully designed tasks, and a large number of demonstrations.

Semantically Robust Unpaired Image Translation for Data with Unmatched Semantics Statistics

Z. Jia, B. Yuan, K. Wang, H. Wu, D. Clifford, Z. Yuan, H. Su
ICCV 2021 [arXiv] [code]
We proposed a novel multi-scale "semantic robustness" loss for GAN-based image translation models to reduce the semantics flipping that is common in unpaired image-to-image translation tasks.

Refactoring Policy for Compositional Generalizability using Self-Supervised Object Proposals

T. Mu, J. Gu, Z. Jia, H. Tang, H. Su
NeurIPS 2020 [arXiv] [code]
We proposed a two-stage framework that achieves compositional generalization in RL tasks by refactoring a teacher policy into a much more generalizable student policy with the help of a strong inductive bias.

One-pixel Signature: Characterizing CNN Classifiers for Backdoor Detection

S. Huang, W. Peng, Z. Jia, Z. Tu
ECCV 2020 [arXiv]
We proposed a model-agnostic metric, the One-pixel Signature, that effectively detects backdoored CNNs. Our method achieves a substantial improvement (~30% in absolute detection accuracy) over current state-of-the-art approaches.

Information-Theoretic Local Minima Characterization and Regularization

Z. Jia, H. Su
ICML 2020 [arXiv] [code]
We proposed a metric for local minima of neural networks that is strongly indicative of their generalizability and can be effectively applied as a practical regularizer, with both theoretical and empirical justification.

Work Experience

Research Intern @ Google AI (06/2022 ~ 09/2022)

Studied knowledge-augmented adaptation of foundation vision-language models for image ad understanding (information retrieval and visual reasoning).

Research Intern @ Amazon Alexa AI (06/2021 ~ 09/2021)

Proposed a new multi-modal neural SLAM-based method that achieved state-of-the-art performance for ALFRED (an indoor navigation & interaction challenge).

Research Intern @ Google [x] (06/2020 ~ 09/2020)

Proposed a novel multi-scale "semantic robustness" loss for GAN-based image translation models to reduce semantics flipping that is common in unpaired image-to-image translation tasks.

Software Engineer Intern @ Quora (06/2018 ~ 09/2018 & 06/2019 ~ 09/2019)

Developed a novel text embedding method by combining BERT and a deep tree classifier to handle extreme-scale multi-label text classification.

Software Engineer Intern @ Google (06/2017 ~ 09/2017)

Worked on an applied machine learning project. 


Education

Ph.D. in Computer Science @ UC San Diego

09/2018 ~ present

B.S. in Computer Science and in Applied Math @ UC San Diego

09/2014 ~ 12/2017
cGPA: 3.85/4.00


Email: zjia [at] eng [dot] ucsd [dot] edu
sean [dot] jia [dot] z [dot] w [at] gmail [dot] com