A year spent in artificial intelligence is enough to make one believe in God.
-- Alan Perlis
Welcome to my personal website!
My name is Zhiwei Jia (贾志伟). I am a fifth-year PhD student in Computer Science at UC San Diego, where I also completed my undergraduate studies. I am fortunate to be advised by Prof. Hao Su, and I have also worked with Prof. Zhuowen Tu.
I am interested in developing generalizable deep learning models at the intersection of vision, language, and robotics. My focus is on multimodal understanding (e.g., visual perception and reasoning) and robot learning (e.g., high-level planning and low-level object manipulation).
Publications & Preprints
Improving Policy Optimization with Generalist-Specialist Learning
Zhiwei Jia, Xuanlin Li, Zhan Ling, Shuang Liu, Yiran Wu, Hao Su
ICML 2022 [arXiv] [code] [webpage]
We tackle large-scale policy optimization with a novel framework called Generalist-Specialist Learning (GSL). GSL uses both joint (generalist) training and distributed specialist training to ease policy learning. We view GSL as a step toward generalizable policies.
Learning to Act with Affordance-Aware Multimodal Neural SLAM
Zhiwei Jia, Kaixiang Lin, Yizhou Zhao, Qiaozi Gao, Govind Thattai, Gaurav Sukhatme
IROS 2022 [arXiv] [code] [webpage]
We designed a framework that employs multimodal exploration to acquire an affordance-aware semantic representation for solving complex long-horizon indoor tasks. We achieved very competitive results in the ALFRED Challenge.
LUMINOUS: Indoor Scene Generation for Embodied AI Challenges
Yizhou Zhao, Kaixiang Lin, Zhiwei Jia, Qiaozi Gao, Govind Thattai, Jesse Thomason, Gaurav Sukhatme
NeurIPS 2022 (CtrlGen Workshop) [arXiv] [code]
We developed a framework that synthesizes large-scale simulated indoor scenes via randomization, for training and evaluating agents on Embodied AI challenges.
ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale Demonstrations
Tongzhou Mu, Zhan Ling, Fanbo Xiang, Derek Yang, Xuanlin Li, Stone Tao, Zhiao Huang, Zhiwei Jia, Hao Su
NeurIPS 2021 (Dataset Track) [arXiv] [code] [webpage]
We proposed a benchmark for generalizable physical object manipulation from 3D visual inputs. It features large intra-class topological and geometric variations, carefully designed tasks, and a large number of demonstrations.
Semantically Robust Unpaired Image Translation for Data with Unmatched Semantics Statistics
Zhiwei Jia, Bodi Yuan, Kangkang Wang, Hong Wu, David Clifford, Zhiqiang Yuan, Hao Su
ICCV 2021 [arXiv] [code]
We proposed a novel multi-scale "semantic robustness" loss for GAN-based image translation models. The loss reduces the semantic flipping that is common in unpaired image-to-image translation tasks.
Refactoring Policy for Compositional Generalizability using Self-Supervised Object Proposals
Tongzhou Mu, Jiayuan Gu, Zhiwei Jia, Hao Tang, Hao Su
NeurIPS 2020 [arXiv] [code]
We proposed a two-stage framework that achieves compositional generalization in RL tasks. With the help of a strong inductive bias, it refactors a teacher policy into a much more generalizable student policy.
One-pixel Signature: Characterizing CNN Classifiers for Backdoor Detection
Shanjiaoyang Huang, Weiqi Peng, Zhiwei Jia, Zhuowen Tu
ECCV 2020 [arXiv]
We proposed a model-agnostic metric, One-pixel Signature, that can effectively detect backdoored CNNs. Our method improves absolute detection accuracy by roughly 30% over the prior state-of-the-art approaches.
Information-Theoretic Local Minima Characterization and Regularization
Zhiwei Jia, Hao Su
ICML 2020 [arXiv] [code]
We proposed a metric for characterizing neural network local minima that is strongly indicative of generalization. It can also be applied effectively as a practical regularizer, with both theoretical and empirical justification.
Work Experience
Research Intern @ Google AI (06/2022 ~ 09/2022)
Studied knowledge-augmented adaptation of foundational vision-language models for image ad understanding (information retrieval and visual reasoning).
Research Intern @ Amazon Alexa AI (06/2021 ~ 09/2021)
Proposed a new multimodal neural SLAM-based method that achieved state-of-the-art performance on ALFRED (an indoor navigation and interaction challenge).
Research Intern @ Google [x] (06/2020 ~ 09/2020)
Proposed a novel multi-scale "semantic robustness" loss for GAN-based image translation models to reduce the semantic flipping that is common in unpaired image-to-image translation tasks.
Software Engineer Intern @ Quora (06/2018 ~ 09/2018 & 06/2019 ~ 09/2019)
Developed a novel text embedding method that combines BERT with a deep tree classifier to handle extreme-scale multi-label text classification.
Software Engineer Intern @ Google (06/2017 ~ 09/2017)
Worked on an applied machine learning project.
Education
Ph.D. in Computer Science @ UC San Diego
09/2018 ~ present
B.S. in Computer Science and in Applied Math @ UC San Diego
09/2014 ~ 12/2017
cGPA: 3.85/4.00