Call for Papers

Topics of interest include, but are not limited to, the following:
  • AI models and algorithms for digital human modeling; explicit and implicit representations; AI-empowered rendering techniques such as neural rendering; learning strategies that are more effective, efficient, and resource-friendly
  • Downstream tasks of digital humans; large foundation models for digital human generation; machine learning for face, body, hair, and clothing reconstruction; face/body animation driven by audio or text
  • The social impact of AI-generated characters, for example the potential to transform industries such as healthcare and education, as well as the risks of creating fake media, invading people’s privacy, and displacing human workers in certain industries
  • Other relevant applications and methods, e.g., digital humans in VR and the metaverse
This will be a one-day workshop. In the morning session, we will have three invited speakers and a panel to discuss the important challenges in the field of AI for digital humans. In the afternoon session, we will have an oral session in which authors of accepted submissions present their work. If we receive a sufficient number of strong submissions, we will also organize a poster session. Additionally, we will organize a competition alongside the workshop, and the winners will be announced during the meeting. We expect around 50 attendees, and the workshop is open to all AAAI-24 participants.

Submission Format:
  • Technical Papers: Full-length research papers of up to 7 pages (excluding references and appendices)
  • Short Papers: Position or short papers of up to 4 pages (excluding references and appendices)
All papers must be submitted in PDF format, using the AAAI-24 author kit. All submissions must be made electronically via CMT.
Submission Site: https://cmt3.research.microsoft.com/AI4DH2024
Submission Due: November 24th, 2023 (11:59 PM PST)

Workshop Schedule

Best Paper Award: Block-wise LoRA: Revisiting Fine-grained LoRA for Effective Personalization and Stylization in Text-to-Image Generation

Detailed Schedule
9:00 AM-9:45 AM Keynote Speech:
Digital Humans: Science and Simulation in the Time of AI
My goal is to develop realistic computational models of the human body: models that can make useful predictions of how we move and how our bodies interact with products such as clothing. Despite a long history of scientific research, current models have many surprising shortcomings. I will first describe some fundamental problems with classical models of the complex human biomechanical system, and show how we can model it better. I will present examples of modeling human hands, eyes, muscles, and skin. I will describe breakthroughs in my lab in building personalized models of an individual's body using a complete pipeline for measurement, modeling, parameter estimation, and data-driven simulation using the finite element method. Our methods can be used to create personalized digital avatars of individuals or of a population. There are many potential applications, ranging from virtual prototyping for product design to virtual garment try-on for e-commerce. Next, I will describe challenges and opportunities for machine learning in simulating the behavior of living humans. I will argue that access to high-quality data remains the crucial bottleneck for these problems. How can we acquire such data, ethically and at scale? I will also explore some fundamental technical limits to sensing and accurately predicting the actual behaviors of a real system, and not just plausible behaviors.
Keynote Speaker:
Dinesh K. Pai, University of British Columbia
Dinesh K. Pai is a Professor of Computer Science at the University of British Columbia, and founder of Vital Mechanics Research - a startup providing high-fidelity soft avatars for apparel fit testing. His current research is focused on data-driven digital human models, frictional contact between soft objects, machine learning for design, and technology for efficient measurement of material properties. His research is multidisciplinary, spanning computer graphics, scientific computing, robotics, biomechanics, neuroscience, and artificial intelligence. He has received many prestigious recognitions, including a Tier 1 Canada Research Chair, the 2020 CHCCS Achievement Award for Computer Graphics, UBC's Killam Research Prize, three NSERC Discovery Accelerator/Supplement Awards, and an international Human Frontier Science Program grant. Dr. Pai has been a Professor at Rutgers University and has held visiting professorships at Carnegie Mellon University's Robotics Institute, New York University's Center for Neural Science, the University of Siena (Santa Chiara Chair in Cognitive Science), and the Collège de France, Paris (Professeur Invité). He received his Ph.D. from Cornell University, Ithaca, NY, and his B.Tech. degree from the Indian Institute of Technology, Madras. See //sensorimotor.cs.ubc.ca/pai/ for more information.
9:45 AM-10:30 AM Invited Talk:
Deep Albedo: Real-time biophysically-based facial map modifications
Altering skin parameters and colors is inherently complex and requires sophisticated physically-based models to describe. We leverage autoencoders and simulated biophysically-based skin parameters to enable an efficient spatially-varying mapping between the biophysical parameters and their resulting skin color. The mapping enables pixel-wise description of age- and emotion-related skin color variations. (An illustrative sketch of this kind of mapping appears after the speaker listing below.)
Invited Speakers:
Wei Sen Loi, Joel Johnson, Huawei Technologies Canada
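The talk abstract above describes learning a spatially-varying mapping from simulated biophysical skin parameters to skin color. Purely for orientation, the snippet below sketches such a per-pixel mapping as a small decoder network; the parameter names (melanin, hemoglobin, thickness), layer sizes, and training loss are assumptions for illustration, not the speakers' actual Deep Albedo model.

    import torch
    import torch.nn as nn

    # Illustrative sketch only: a tiny per-pixel decoder from assumed
    # biophysical parameters (e.g., melanin fraction, hemoglobin fraction,
    # epidermal thickness) to RGB skin color. Layer sizes and inputs are
    # placeholders, not the presenters' actual architecture.
    decoder = nn.Sequential(
        nn.Linear(3, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 3), nn.Sigmoid(),   # RGB in [0, 1]
    )

    # A batch of per-pixel parameter vectors (e.g., flattened from a texture map).
    params = torch.rand(1024, 3)
    rgb = decoder(params)                 # shape: (1024, 3)

    # Training against simulated skin data would minimize a reconstruction loss:
    target_rgb = torch.rand(1024, 3)      # stand-in for simulated skin colors
    loss = nn.functional.mse_loss(rgb, target_rgb)

Once such a mapping is trained, it can be evaluated independently at every pixel of a facial map, which is what makes real-time, spatially-varying edits such as age- or emotion-related color changes practical.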
10:30 AM-11:00 AM Break
11:00 AM-12:30 PM——Oral Session 1
11:00 AM-11:20 AM FuRPE: Learning Full-body Reconstruction from Part Experts
Authors: Zhaoxin Fan, Yuqing Pan, Hao Xu, Zhenbo Song, Zhicheng Wang, Kejian
11:20 AM-11:40 AM ProbSIP: Probabilistic Modeling for Ambiguity-Reduced Sparse Inertial Poser
Authors: Shanyan Guan, Yunbo Wang, Xintao Lv, Yanhao Ge, Xiaokang Yang
11:40 AM-12:00 PM Deep Learning based Dialogue System for Legal Consultancy in Smart Law
Authors: Xukang Wang, Ying Cheng Wu, Xuhesheng Chen, Hongpeng Fu, Jiaqi Tan, Mengjie Zhou
12:00 PM-12:20 PM Block-wise LoRA: Revisiting Fine-grained LoRA for Effective Personalization and Stylization in Text-to-Image Generation
Authors: Likun Li, Haoqi Zeng, Changpeng Yang, Haozhe Jia, Di Xu
12:20 PM-12:30 PM Announcement of Competition Awards
12:30 PM-2:00 PM——Lunch
2:00 PM-4:00 PM——Oral Session 2
2:00 PM-2:20 PM Structural Learning in the design of Perspective-Aware AI Systems using Knowledge Graphs
Authors: Marjan Alirezaie, Hossein Rahnama, Alex Pentland
2:20 PM-2:40 PM The Role of Facial and Speech Features in Emotion Classification
Authors: Loïc Houmard, Ard Kastrati, Dushan Vasilevski, Roger Wattenhofer
2:40 PM-3:00 PM Understanding Consumers' Attitude Toward Digital Humans In Influencer Marketing
Authors: Smitha Muthya Sudheendra, Maral Abdollahi, Jisu Huh, Jaideep Srivastava
3:00 PM-3:20 PM Realistic Human Generation with Controllable Poses Using 3D Priors
Authors: Ruifeng Bai, Xiaohang Liu, Haozhe Jia, Wei Zhang, Changpeng Yang, Di Xu
3:20 PM-3:40 PM Prompt-Propose-Verify: A Reliable Hand-Object-Interaction Data Generation Framework using Foundational Models
Authors: Gurusha Juneja, Sukrit Kumar
3:40 PM-4:00 PM Latents2Semantics: Leveraging the Latent Space of Generative Models for Localized Style Manipulation of Face Images
Authors: Snehal Singh Tomar, A.N. Rajagopalan
4:00 PM-5:00 PM——Poster Session
Host: Benjamin MacAdam
Date: Monday, Feb 26, 2024

Digital Human Challenge

The challenge submission deadline is January 1st, 2024.
Note that prize-winning teams must submit their code and a brief summary at the end of the competition.
Task 1: Self-Supervised Face Geometry Reconstruction Competition
  • The objective of this task is to introduce self-supervised learning for face appearance reconstruction. Traditional photometric methods require capturing multi-view images under different lighting conditions to obtain high-quality facial appearance assets, which often means using large devices such as light stages. In this competition, we encourage participants to compute detailed digital face appearance from limited lighting and a small number of photos, without relying on high-end equipment. Participants need to design a self-supervised learning framework that decomposes detailed normal maps and a displacement map (an illustrative sketch of one possible self-supervised setup follows this list).
  • Link: https://www.codabench.org/competitions/1601/
  • Organizers: Yuhao Cheng, Xingyu Ren, Xuanchen Li (Shanghai Jiao Tong University), Huang Xu (Huawei Cloud)
  • Contact: chengyuhao@sjtu.edu.cn
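For orientation only, the sketch below illustrates the kind of self-supervised setup the task describes for the normal-map part: predicted per-pixel normals re-render the input photo under a simple Lambertian shading model, so no ground-truth geometry is required. It is not a reference solution; the tiny network, the assumption of a known albedo, and the single directional light are all placeholders, and the same idea would need to be extended to displacement maps.

    import torch
    import torch.nn as nn

    # Minimal sketch (not a reference solution): a self-supervised photometric
    # loss in which predicted per-pixel normals re-render the input photo under
    # an assumed single directional light, so no ground-truth normals are needed.
    class NormalPredictor(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 3, 3, padding=1),
            )

        def forward(self, image):
            n = self.net(image)                       # (B, 3, H, W)
            return nn.functional.normalize(n, dim=1)  # unit normals

    def lambertian_render(normals, albedo, light_dir):
        # Simple diffuse shading: albedo * max(0, n . l)
        l = nn.functional.normalize(light_dir, dim=0).view(1, 3, 1, 1)
        shading = (normals * l).sum(dim=1, keepdim=True).clamp(min=0.0)
        return albedo * shading

    model = NormalPredictor()
    image = torch.rand(1, 3, 256, 256)             # input photo (placeholder)
    albedo = torch.rand(1, 3, 256, 256)            # assumed known or pre-estimated albedo
    light = torch.tensor([0.0, 0.0, 1.0])          # assumed light direction

    normals = model(image)
    rendered = lambertian_render(normals, albedo, light)
    loss = nn.functional.l1_loss(rendered, image)  # self-supervised photometric loss
    loss.backward()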
Final Ranking (Team, Institution):
  1. iFLYTEK-CV, iFLYTEK Research & University of Science and Technology of China
  2. Excit AI, Shanghai Excit AI Technology
  3. USTC-IAT-United, University of Science and Technology of China & Unisound AI Technology Co., Ltd
Task 2: Semi-supervised 3D Skull Reconstruction
  • In this task, we expect the cranial reconstruction algorithm to reconstruct the anatomical geometry of the skull based on the information provided by MRI, including but not limited to the precise geometry of the external surface of the face, the internal soft-tissue structures of the face, and the characteristic points of the skull. This competition encourages participants to develop a semi-supervised learning-based cranial segmentation algorithm for MRI images using machine learning and deep learning techniques. The segmentation algorithm should fully exploit the information in unlabeled training samples, starting from a small number of labeled samples, while improving the performance and generalization of the segmentation model. Ultimately, the high-quality segmentation results obtained on MRI will serve as the basis for 3D skull reconstruction (an illustrative sketch of one possible semi-supervised training step follows this list).
  • Link: https://www.kaggle.com/competitions/skull-reconstruction
  • Organizers: Hengfei Cui, Yong Xia, Fan Zheng, Yifan Wang (Northwestern Polytechnical University)
  • Contact: zhengfan@mail.nwpu.edu.cn
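As a rough illustration of the semi-supervised idea described above (not an official baseline), the sketch below shows a single training step that combines a supervised loss on the few labeled MRI volumes with a pseudo-label loss on unlabeled volumes. The stand-in network, tensor shapes, confidence threshold, and loss weighting are all assumptions.

    import torch
    import torch.nn as nn

    # Minimal sketch of pseudo-label-based semi-supervised segmentation.
    # A real submission would replace the stand-in layer with a 3D U-Net or similar.
    model = nn.Conv3d(1, 2, kernel_size=3, padding=1)     # stand-in segmentation net
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    labeled_mri = torch.rand(2, 1, 32, 32, 32)            # small labeled batch
    labels = torch.randint(0, 2, (2, 32, 32, 32))         # voxel-wise skull mask
    unlabeled_mri = torch.rand(2, 1, 32, 32, 32)          # unlabeled batch

    # Supervised loss on the few labeled samples.
    sup_loss = nn.functional.cross_entropy(model(labeled_mri), labels)

    # Pseudo-labels: keep only confident predictions on unlabeled data.
    with torch.no_grad():
        probs = model(unlabeled_mri).softmax(dim=1)
        conf, pseudo = probs.max(dim=1)
    mask = conf > 0.9                                     # confidence threshold (assumed)
    unsup_loss = (nn.functional.cross_entropy(
        model(unlabeled_mri), pseudo, reduction="none") * mask).mean()

    loss = sup_loss + 0.5 * unsup_loss                    # weighting is an assumption
    optimizer.zero_grad(); loss.backward(); optimizer.step()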
Final Ranking (Team, Institution):
  1. PPW, Northwestern Polytechnical University & Southwest Jiaotong University & Hubei University of Technology
  2. USTC_IAT_United, University of Science and Technology of China & Unisound AI Technology Co., Ltd
  3. zhanggod, Huazhong University of Science and Technology
Task 3: Multi-modal Learning for Audio-driven Talking Head Generation
  • This task focuses on generating talking heads with realistic facial expressions and natural head poses that match the accompanying audio, by learning a multi-modal model from both the audio and visual data. We further design the following two settings to motivate participants to tackle this task with 2D and 3D methods, respectively (an illustrative sketch of one possible audio-to-motion model follows this list).
    Single image setting: Participants are allowed to train their model with external data, and we will provide a single image for their models to animate.
    Video setting: Participants are not allowed to train their model with external data; instead, a two-minute training video is provided for training a personalized talking head model.
    The final output of both settings should be a talking head model that can be driven by any input audio. For the image setting, the synthesized talking head videos are expected to be lip-synchronized and of high fidelity. For the video setting, the speakers should additionally exhibit varied natural poses while talking and preserve their identity.
  • Link: https://www.kaggle.com/competitions/audio-driven-talking-head-generation/
  • Organizers: Jingnan Gao (Shanghai Jiao Tong University), Changpeng Yang, Yuan Gao, Li Li (Huawei Cloud)
  • Contact: gjn0310@sjtu.edu.cn
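Purely for illustration (not a provided baseline), the sketch below shows one common way to structure such a multi-modal model: an audio encoder and a reference-image encoder are fused to predict facial motion parameters, which a separate renderer would then turn into video frames. All dimensions and module choices are assumptions.

    import torch
    import torch.nn as nn

    # Rough illustration: fuse an audio window with a reference face image to
    # predict facial motion parameters (e.g., expression and head-pose
    # coefficients). Dimensions and modules are placeholders.
    class AudioToMotion(nn.Module):
        def __init__(self, audio_dim=80, motion_dim=70):
            super().__init__()
            self.audio_enc = nn.GRU(audio_dim, 128, batch_first=True)
            self.image_enc = nn.Sequential(
                nn.Conv2d(3, 16, 4, stride=4), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 128),
            )
            self.head = nn.Linear(128 + 128, motion_dim)

        def forward(self, mel, ref_image):
            _, h = self.audio_enc(mel)            # h: (1, B, 128)
            audio_feat = h[-1]                    # (B, 128)
            id_feat = self.image_enc(ref_image)   # (B, 128)
            return self.head(torch.cat([audio_feat, id_feat], dim=1))

    model = AudioToMotion()
    mel = torch.rand(4, 20, 80)                   # 20 audio frames of mel features
    ref = torch.rand(4, 3, 128, 128)              # reference image (single-image setting)
    motion = model(mel, ref)                      # (4, 70) motion parameters per clip

In the video setting, the renderer driven by such parameters could be trained only on the provided two-minute video of the target speaker, yielding a personalized talking head model.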
Final Ranking (Team, Institution):
  1. BDIV Lab, Xidian University
  2. USTC-IAT-United, University of Science and Technology of China & Ping An Technology
  3. Excit AI, Shanghai Excit AI Technology
Task 4: Audio-Driven Co-Speech Gesture Video Generation
  • In this task, participants are required to synthesize co-speech gesture videos of a target person based on any given audio. The final output should be a rendered video rather than motion sequences. The synthesized gesture motions are expected to be natural, difficult to distinguish from captured videos, and consistent with the audio in terms of rhythm, semantics, and style. We encourage participants to propose novel ideas for synthesizing high-fidelity co-speech gestures (an illustrative sketch of one possible audio-to-gesture model follows this list).
  • Link: https://www.kaggle.com/competitions/audio-driven-co-speech-gesture-video-generation
  • Organizers: Minglei Li, Haoqi Zeng (Huawei Cloud), Zhensong Zhang, Xiaofei Wu, Yiren Zhou (Huawei Noah’s Ark Lab)
  • Contact: zenghaoqi@huawei.com
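For illustration only (assumptions throughout, not an official baseline), the sketch below maps audio features to a sequence of upper-body keypoints; a separate person-specific renderer, not shown, would convert the keypoint sequence into the rendered video the task requires.

    import torch
    import torch.nn as nn

    # Minimal sketch: a sequence model from audio features to 2D upper-body
    # keypoints per frame. The feature dimensions and joint count are assumed.
    class AudioToGesture(nn.Module):
        def __init__(self, audio_dim=80, n_joints=15):
            super().__init__()
            self.encoder = nn.LSTM(audio_dim, 256, num_layers=2, batch_first=True)
            self.decoder = nn.Linear(256, n_joints * 2)    # 2D keypoints per frame

        def forward(self, audio_feats):
            hidden, _ = self.encoder(audio_feats)          # (B, T, 256)
            return self.decoder(hidden)                    # (B, T, n_joints * 2)

    model = AudioToGesture()
    audio_feats = torch.rand(2, 100, 80)      # 100 frames of mel features (placeholder)
    poses = model(audio_feats)                # (2, 100, 30) keypoint sequence

    # Training could combine a pose reconstruction loss on captured videos with
    # rhythm and style consistency terms before rendering the final frames.
    target = torch.rand_like(poses)
    loss = nn.functional.mse_loss(poses, target)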
Final Ranking (Team, Institution):
  1. HaiweiXue_DragonBooOOM, Tsinghua University
  2. USTC_IAT_United, University of Science and Technology of China
  3. XDU_VIPSLab, Xidian University