First authors of NeurIPS 2021 papers invite you to join a video call

Source: SenseTime Academic (商汤学术). "AI in every word, substance in every study. SenseTime Academic accompanies you in exploring the frontiers of AI!"

The NeurIPS 2021 群星闪耀云际会 (Shining Stars Cloud Meetup), co-hosted by SenseTime, the Global University AI Academic Alliance, and the 将门-TechBeat AI community, is here!

Want a deeper look at the high-quality papers at a top conference?
Want one-on-one exchanges with the first authors?
Tuesday, November 30, 14:00-17:00
Come to the TechBeat live stream room for the
NeurIPS 2021 Shining Stars Cloud Meetup!
NeurIPS (the Conference on Neural Information Processing Systems) is a top international conference in artificial intelligence, and in past years tickets were often hard to come by. This cloud meetup requires no time-zone juggling, and our "class representatives" have prepared Chinese-language walkthroughs of 9 papers in advance: a freebie you won't want to miss!

Agenda

14:05-14:15

Opening Remarks


Chen Change Loy

Nanyang Technological University


Bio

Adjunct Associate Professor at The Chinese University of Hong Kong and Associate Director of S-Lab, the SenseTime-NTU joint laboratory. Research interests include image/video restoration, image generation, and representation learning. Has published over 120 papers in top international journals and conferences, cited more than 35,000 times, and has supervised research teams to multiple championships in international computer vision competitions such as NTIRE, MSCOCO, and DAVIS. The team's SRCNN is a landmark work in image super-resolution that has strongly influenced subsequent research. Serves as an area chair for top conferences including ICCV, CVPR, and ECCV, and on the editorial boards of the top journals IJCV and TPAMI. Received the Nanyang Scholar award in 2019 and was ranked among the top 100 scholars on the "AI 2000 Most Influential Scholars" list in both 2019 and 2020.


Homepage:

http://personal.ie.cuhk.edu.hk/~ccloy/


14:15-14:27

Paper Presentation


Generative Occupancy Fields 

for 3D Surface-Aware Image Synthesis

Xudong Xu

The Chinese University of Hong Kong


Bio

Fourth-year PhD student at the Multimedia Lab of The Chinese University of Hong Kong, advised by Prof. Dahua Lin. Honors include the National Scholarship, Outstanding Student and Outstanding Graduate of Nanjing University, and Jiangsu Province Merit Student. Current research focuses on neural rendering and joint audio-visual learning, with multiple papers published at conferences such as ICCV, CVPR, ECCV, and NeurIPS, and reviewing duties for several top AI conferences and journals.


Homepage:

https://sheldontsui.github.io/


Abstract


The advent of generative radiance fields has significantly promoted the development of 3D-aware image synthesis. The cumulative rendering process in radiance fields makes training these generative models much easier since gradients are distributed over the entire volume, but it leads to diffused object surfaces. Meanwhile, compared to radiance fields, occupancy representations can inherently ensure deterministic surfaces. However, if we directly apply occupancy representations to generative models, they will only receive sparse gradients located on object surfaces during training and will eventually suffer from convergence problems. In this paper, we propose Generative Occupancy Fields (GOF), a novel model based on generative radiance fields that can learn compact object surfaces without impeding its training convergence. The key insight of GOF is a dedicated transition from the cumulative rendering in radiance fields to rendering with only the surface points as the learned surface becomes more and more accurate. In this way, GOF combines the merits of the two representations in a unified framework. In practice, this training-time transition from radiance fields to occupancy representations is achieved in GOF by gradually shrinking the sampling region in its rendering process from the entire volume to a minimal neighboring region around the surface. Through comprehensive experiments on multiple datasets, we demonstrate that GOF can synthesize high-quality images with 3D consistency and simultaneously learn compact and smooth object surfaces.
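
The shrinking-sampling-region idea is compact enough to sketch. Below is a minimal illustration (not the authors' implementation; `shrink`, `surface_depth`, and all shapes are hypothetical) of a per-ray sampling interval that interpolates between the full volume and a narrow band around the current surface estimate:

```python
import torch

def sample_depths(surface_depth, near, far, shrink, n_samples=32):
    """Sample depths along each ray inside an interval that interpolates
    between the full [near, far] volume (shrink=1.0) and a tight
    neighborhood around the estimated surface depth (shrink -> 0)."""
    half = 0.5 * (far - near) * shrink                # half-width of the band
    lo = torch.clamp(surface_depth - half, min=near)
    hi = torch.clamp(surface_depth + half, max=far)
    t = torch.linspace(0.0, 1.0, n_samples)           # uniform in [0, 1]
    return lo[..., None] + (hi - lo)[..., None] * t   # [..., n_samples]

# Early in training, shrink=1.0 recovers cumulative radiance-field rendering
# over the whole volume; annealing toward e.g. shrink=0.1 concentrates all
# samples near the learned surface, approaching occupancy-style rendering.
surface = torch.full((4,), 2.0)  # hypothetical per-ray surface depths
print(sample_depths(surface, near=1.0, far=3.0, shrink=0.1).shape)
```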




14:27-14:39

Paper Presentation


A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis

Xingang Pan

Max Planck Institute for Informatics


Bio

Postdoctoral researcher at the Max Planck Institute for Informatics; received a PhD from The Chinese University of Hong Kong in 2021. Has published more than ten papers at top conferences and in top journals including CVPR, ICCV, ECCV, NeurIPS, ICLR, and TPAMI, and won first place in the TuSimple 2017 lane detection challenge and the WAD 2018 drivable area segmentation challenge. Current research interests include neural rendering, 3D scene generation, and unsupervised 3D learning.


Homepage:

https://xingangpan.github.io/


Abstract


The advancement of generative radiance fields has pushed the boundary of 3D-aware image synthesis. Motivated by the observation that a 3D object should look realistic from multiple viewpoints, these methods introduce a multi-view constraint as regularization to learn valid 3D radiance fields from 2D images. Despite the progress, they often fall short of capturing accurate 3D shapes due to the shape-color ambiguity, limiting their applicability in downstream tasks. In this work, we address this ambiguity by proposing a novel shading-guided generative implicit model that is able to learn a starkly improved shape representation. Our key insight is that an accurate 3D shape should also yield a realistic rendering under different lighting conditions. This multi-lighting constraint is realized by modeling illumination explicitly and performing shading with various lighting conditions. Gradients are derived by feeding the synthesized images to a discriminator. To compensate for the additional computational burden of calculating surface normals, we further devise an efficient volume rendering strategy via surface tracking, reducing the training and inference time by 24% and 48%, respectively. Our experiments on multiple datasets show that the proposed approach achieves photorealistic 3D-aware image synthesis while capturing accurate underlying 3D shapes. We demonstrate improved performance of our approach on 3D shape reconstruction against existing methods, and show its applicability on image relighting.


Our code is available at:


https://github.com/XingangPan/ShadeGAN.
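
As a rough illustration of the multi-lighting constraint, the sketch below shades an albedo map with surface normals under a randomly sampled light direction using a plain Lambertian model; the function, the ambient/diffuse split, and all shapes are assumptions rather than the paper's code:

```python
import torch
import torch.nn.functional as F

def lambertian_shade(albedo, normals, light_dir, ambient=0.4, diffuse=0.6):
    """albedo, normals: [B, 3, H, W]; light_dir: [B, 3] unit vectors.
    Returns shaded images [B, 3, H, W]."""
    n = F.normalize(normals, dim=1)
    l = F.normalize(light_dir, dim=1)[:, :, None, None]
    diff = (n * l).sum(dim=1, keepdim=True).clamp(min=0.0)  # n·l, clamped
    return albedo * (ambient + diffuse * diff)

# Sampling a fresh light per training example realizes the multi-lighting
# constraint: only an accurate shape shades realistically under all of them.
B, H, W = 2, 8, 8
albedo = torch.rand(B, 3, H, W)
normals = F.normalize(torch.randn(B, 3, H, W), dim=1)
light = F.normalize(torch.randn(B, 3), dim=1)
print(lambertian_shade(albedo, normals, light).shape)  # [2, 3, 8, 8]
```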





14:39-14:51

Paper Presentation


3D Pose Transfer with Correspondence Learning and Mesh Refinement


Chaoyue Song

Nanyang Technological University


Bio

Member of S-Lab at Nanyang Technological University; named an Outstanding Graduate of Shanghai, and won a national first prize in the China National Collegiate IoT Design Competition as an undergraduate. Current research focuses on 3D vision and generative models, with one first-author NeurIPS paper published and another under review.


Homepage:

https://scholar.google.com/citations?user=4Yiz6gIAAAAJ&hl


Abstract


3D pose transfer is one of the most challenging 3D generation tasks. It aims to transfer the pose of a source mesh to a target mesh while keeping the identity (e.g., body shape) of the target mesh. Some previous works require key point annotations to build reliable correspondence between the source and target meshes, while other methods do not consider any shape correspondence between sources and targets, which leads to limited generation quality. In this work, we propose a correspondence-refinement network to help with 3D pose transfer for both human and animal meshes. The correspondence between source and target meshes is first established by solving an optimal transport problem. Then, we warp the source mesh according to the dense correspondence and obtain a coarse warped mesh. The warped mesh is then refined with our proposed Elastic Instance Normalization, a conditional normalization layer that helps to generate high-quality meshes. Extensive experimental results show that the proposed architecture can effectively transfer poses from source to target meshes and produce results with more satisfying visual quality than state-of-the-art methods.
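
The correspondence step can be sketched with a few Sinkhorn iterations. The snippet below is an illustration under simplifying assumptions (entropic regularization, uniform marginals, and a barycentric warp), not the authors' released code:

```python
import torch

def sinkhorn_correspondence(src, tgt, eps=0.1, iters=50):
    """src: [N, 3], tgt: [M, 3]. Returns a soft matching matrix T of
    shape [N, M] approximating an optimal transport plan."""
    cost = torch.cdist(src, tgt)                  # pairwise Euclidean costs
    K = torch.exp(-cost / eps)                    # Gibbs kernel
    u = torch.ones(src.shape[0]) / src.shape[0]   # uniform source marginal
    v = torch.ones(tgt.shape[0]) / tgt.shape[0]   # uniform target marginal
    a, b = u.clone(), v.clone()
    for _ in range(iters):                        # alternate marginal scaling
        a = u / (K @ b + 1e-9)
        b = v / (K.T @ a + 1e-9)
    return a[:, None] * K * b[None, :]

src, tgt = torch.randn(100, 3), torch.randn(120, 3)
T = sinkhorn_correspondence(src, tgt)
# Barycentric warp: each source vertex moves to a weighted average of targets.
warped = (T / T.sum(dim=1, keepdim=True)) @ tgt   # coarse warped mesh, [100, 3]
```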




14:51-15:03

Paper Presentation


Garment4D: Garment Reconstruction 

from Point Cloud Sequences


Fangzhou Hong

Nanyang Technological University


Bio

PhD student at Nanyang Technological University, advised by Prof. Ziwei Liu; holds a bachelor's degree from the School of Software, Tsinghua University, and received a Google PhD Fellowship in 2021. Current research focuses on 3D computer vision, with multiple papers published at venues such as CVPR and NeurIPS.


Homepage:

https://hongfz16.github.io/


Abstract


Learning to reconstruct 3D garments is important for dressing 3D human bodies of different shapes in different poses. Previous works typically rely on 2D images as input, which, however, suffer from scale and pose ambiguities. To circumvent the problems caused by 2D images, we propose a principled framework, Garment4D, that uses 3D point cloud sequences of dressed humans for garment reconstruction. Garment4D has three dedicated steps: sequential garment registration, canonical garment estimation, and posed garment reconstruction. The main challenges are two-fold: 1) effective 3D feature learning for fine details, and 2) capturing garment dynamics caused by the interaction between garments and the human body, especially for loose garments like skirts. To address these problems, we introduce a novel Proposal-Guided Hierarchical Feature Network and an Iterative Graph Convolution Network, which integrate both high-level semantic features and low-level geometric features for fine-detail reconstruction. Furthermore, we propose a Temporal Transformer for smooth garment motion capture. Unlike non-parametric methods, the garment meshes reconstructed by our method are separable from the human body and have strong interpretability, which is desirable for downstream tasks. As the first attempt at this task, high-quality reconstruction results are qualitatively and quantitatively illustrated through extensive experiments.
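
As a rough sketch of the Temporal Transformer idea only, the snippet below applies self-attention over per-frame garment features extracted from a point cloud sequence; the shapes and module choices are hypothetical, not taken from the released implementation:

```python
import torch
import torch.nn as nn

frames, batch, dim = 8, 2, 128
layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4)
temporal_transformer = nn.TransformerEncoder(layer, num_layers=2)

# [T, B, C]: one feature vector per frame, e.g. pooled garment geometry.
per_frame_feats = torch.randn(frames, batch, dim)
fused = temporal_transformer(per_frame_feats)  # temporally mixed features
print(fused.shape)  # torch.Size([8, 2, 128])
```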




15:08-15:20

Paper Presentation


Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data

Liming Jiang

Nanyang Technological University


Bio

PhD student at MMLab@NTU and S-Lab, School of Computer Science and Engineering, Nanyang Technological University, advised by Prof. Chen Change Loy. Current research covers computer vision, deep learning, and generative models; within the first two years of the PhD, published 4 first-author papers at top venues including NeurIPS, CVPR, ICCV, and ECCV, with representative works including DeeperForensics-1.0, TSIT, and Focal Frequency Loss. Awards include an ACM-ICPC Asia Regional gold medal, an MCM/ICM first prize, the National Scholarship, the SenseTime Scholarship, and CCF Outstanding Undergraduate; also one of the main contributors to the MMEditing codebase.


Homepage:

https://liming-jiang.com/


Abstract


Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images. Recent studies have shown that training GANs with limited data remains formidable due to discriminator overfitting, the underlying cause that impedes the generator's convergence. This paper introduces a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage healthy competition between the generator and the discriminator. As an alternative to existing approaches that rely on standard data augmentations or model regularization, APA alleviates overfitting by employing the generator itself to augment the real data distribution with generated images, which deceives the discriminator adaptively. Extensive experiments demonstrate the effectiveness of APA in improving synthesis quality in the low-data regime. We provide a theoretical analysis to examine the convergence and rationality of our new training strategy. APA is simple and effective. It can be added seamlessly to powerful contemporary GANs, such as StyleGAN2, with negligible computational cost.
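
The mechanism is easy to prototype. The sketch below is a simplified reading of the abstract (the probability-update heuristic shown is an ADA-style assumption, not necessarily the paper's exact rule): with probability p, generated images are presented to the discriminator as "real", and p is adapted from a discriminator-overfitting signal:

```python
import torch

def apa_real_batch(real, fake_for_aug, p):
    """Per sample: keep the real image with prob 1-p, or swap in a
    generated image (treated as real) with prob p."""
    mask = (torch.rand(real.shape[0]) < p).view(-1, 1, 1, 1).float()
    return mask * fake_for_aug + (1.0 - mask) * real

def update_p(p, d_real_logits, target=0.6, step=0.01):
    """Raise p when D looks overconfident on reals (overfitting), lower it
    otherwise; the sign-based indicator is one plausible choice."""
    overfit = torch.sign(d_real_logits).mean().item()  # in [-1, 1]
    return min(max(p + step * (1 if overfit > target else -1), 0.0), 1.0)

p = 0.0
real = torch.randn(4, 3, 32, 32)
fake = torch.randn(4, 3, 32, 32)          # stand-in for G(z) samples
mixed = apa_real_batch(real, fake.detach(), p)
p = update_p(p, d_real_logits=torch.randn(4))
```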




15:20-15:32

Paper Presentation


Unsupervised Object-Level Representation Learning 

from Scene Images


Jiahao Xie

Nanyang Technological University


Bio

PhD student at S-Lab, Nanyang Technological University. Current research focuses on self-supervised learning, especially self-supervised representation learning. Won all four tracks of the Facebook Self-Supervised Learning Challenge and is one of the core developers of OpenSelfSup, the first open-source toolbox for self-supervised learning. Has published two first-author papers at top international conferences including NeurIPS and CVPR.


Homepage:

https://scholar.google.com/citations?user=yA9qseUAAAAJ&hl=en


Abstract


Contrastive self-supervised learning has largely narrowed the gap to supervised pre-training on ImageNet. However, its success highly relies on the object-centric priors of ImageNet, i.e., different augmented views of the same image correspond to the same object. Such a heavily curated constraint becomes immediately infeasible when pre-trained on more complex scene images with many objects. To overcome this limitation, we introduce Object-level Representation Learning (ORL), a new self-supervised learning framework for scene images. Our key insight is to leverage image-level self-supervised pre-training as the prior to discover object-level semantic correspondence, thus realizing object-level representation learning from scene images. Extensive experiments on COCO show that ORL significantly improves the performance of self-supervised learning on scene images, even surpassing supervised ImageNet pre-training on several downstream tasks. Furthermore, ORL improves the downstream performance when more unlabeled scene images are available, demonstrating its great potential for harnessing unlabeled data in the wild. We hope our approach can motivate future research on more general-purpose unsupervised representation learning from scene data. Project page:

https://www.mmlab-ntu.com/project/orl/.
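
The "image-level pre-training as a prior for object-level correspondence" insight reduces to a retrieval step. The snippet below, with hypothetical RoI features and names, matches regions of two scene images by cosine similarity of embeddings from a frozen image-level pre-trained encoder:

```python
import torch
import torch.nn.functional as F

def match_regions(feats_a, feats_b):
    """feats_a: [Na, D], feats_b: [Nb, D] region features from a frozen
    pre-trained encoder. For each region in image A, return the index and
    score of its most similar region in image B."""
    sim = F.normalize(feats_a, dim=1) @ F.normalize(feats_b, dim=1).T
    score, idx = sim.max(dim=1)
    return idx, score  # correspondences form object-level positive pairs

idx, score = match_regions(torch.randn(5, 256), torch.randn(7, 256))
print(idx.shape, score.shape)  # torch.Size([5]) torch.Size([5])
```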





15:32-15:44

Paper Presentation


K-Net: 

Towards Unified Image Segmentation

Wenwei Zhang

Nanyang Technological University


Bio

PhD student at Nanyang Technological University, working on computer vision. Has published six papers at top conferences; won first place in the authoritative COCO object detection challenge in 2019 and the best PKL award in the authoritative nuScenes 3D object detection challenge in 2020. Leads the design and development of codebases such as MMCV, MMDetection, and MMDetection3D within OpenMMLab, the open computer vision algorithm ecosystem.


Homepage:

http://zhangwenwei.cn/


Abstract


Semantic, instance, and panoptic segmentations have been addressed using different and specialized frameworks despite their underlying connections. This paper presents a unified, simple, and effective framework for these essentially similar tasks. The framework, named K-Net, segments both instances and semantic categories consistently by a group of learnable kernels, where each kernel is responsible for generating a mask for either a potential instance or a stuff class. To remedy the difficulties of distinguishing various instances, we propose a kernel update strategy that makes each kernel dynamic and conditional on its meaningful group in the input image. K-Net can be trained in an end-to-end manner with bipartite matching, and its training and inference are naturally NMS-free and box-free. Without bells and whistles, K-Net surpasses all previously published state-of-the-art single-model results on panoptic segmentation on the MS COCO test-dev split and semantic segmentation on the ADE20K val split, with 55.2% PQ and 54.3% mIoU, respectively. Its instance segmentation performance is also on par with Cascade Mask R-CNN on MS COCO with 60%-90% faster inference speed. Code and models will be released at: https://github.com/zwwwayne/k-net.
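
K-Net's core operation is compact enough to sketch directly. Below, N kernels each produce one mask via a dot product with the feature map, and each kernel is then updated from the features gathered under its own mask; the plain additive update stands in for the paper's learned adaptive update, and all shapes are illustrative:

```python
import torch

B, C, H, W, N = 2, 64, 32, 32, 100
feats = torch.randn(B, C, H, W)     # backbone feature map
kernels = torch.randn(B, N, C)      # one learnable kernel per potential mask

masks = torch.einsum('bnc,bchw->bnhw', kernels, feats).sigmoid()

# Kernel update: pool the features under each mask and fold them back into
# the kernel, making it dynamic and conditional on its group in the input.
group_feats = torch.einsum('bnhw,bchw->bnc', masks, feats) / (H * W)
kernels = kernels + group_feats

refined = torch.einsum('bnc,bchw->bnhw', kernels, feats).sigmoid()
print(refined.shape)  # torch.Size([2, 100, 32, 32])
```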




15:44-15:56

Paper Presentation


Few-Shot Object Detection 

via Association and Discrimination


Yuhang Cao

The Chinese University of Hong Kong


Bio

Second-year PhD student at MMLab, The Chinese University of Hong Kong. Current research focuses on general object detection and few-shot object detection. One of the core developers of the well-known detection framework MMDetection, with 4 papers (two as first author) published at top international conferences including ICCV, ECCV, CVPR, and NeurIPS.


Homepage:

https://scholar.google.com/citations?user=sJkqsqkAAAAJ&hl=zh-CN


Abstract


Object detection has achieved substantial progress in the last decade. However, detecting novel classes with only few samples remains challenging, since deep learning under a low data regime usually leads to a degraded feature space. Existing works employ a holistic fine-tuning paradigm to tackle this problem, where the model is first pre-trained on all base classes with abundant samples, and then it is used to carve the novel class feature space. Nonetheless, this paradigm is still imperfect. During fine-tuning, a novel class may implicitly leverage the knowledge of multiple base classes to construct its feature space, which induces a scattered feature space and hence violates inter-class separability. To overcome these obstacles, we propose a two-step fine-tuning framework, Few-shot object detection via Association and DIscrimination (FADI), which builds up a discriminative feature space for each novel class with two integral steps. 1) In the association step, in contrast to implicitly leveraging multiple base classes, we construct a compact novel class feature space via explicitly imitating a specific base class feature space. Specifically, we associate each novel class with a base class according to their semantic similarity. After that, the feature space of a novel class can readily imitate the well-trained feature space of the associated base class. 2) In the discrimination step, to ensure the separability between the novel classes and associated base classes, we disentangle the classification branches for base and novel classes. To further enlarge the inter-class separability between all classes, a set-specialized margin loss is imposed. Extensive experiments on standard Pascal VOC and MS-COCO datasets demonstrate that FADI achieves new state-of-the-art performance, significantly improving the baseline in any shot/split by +18.7. Notably, the advantage of FADI is most pronounced in extremely few-shot scenarios (e.g., 1- and 3-shot).
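
Both steps can be caricatured in a few lines. The sketch below (hypothetical class-name embeddings, names, and margin form; not the released code) associates each novel class with its most similar base class and applies a margin-style loss that enlarges inter-class separability:

```python
import torch
import torch.nn.functional as F

def associate(novel_emb, base_emb):
    """novel_emb: [Nn, D], base_emb: [Nb, D] class-name embeddings.
    Returns, per novel class, the base class whose well-trained feature
    space it will imitate."""
    sim = F.normalize(novel_emb, dim=1) @ F.normalize(base_emb, dim=1).T
    return sim.argmax(dim=1)

def margin_loss(logits, labels, margin=0.2, scale=20.0):
    """Margin-style loss: subtract a margin from the true-class logit
    before cross-entropy, pushing classes further apart."""
    one_hot = F.one_hot(labels, num_classes=logits.shape[1]).float()
    return F.cross_entropy(scale * (logits - margin * one_hot), labels)

assoc = associate(torch.randn(5, 300), torch.randn(20, 300))
loss = margin_loss(torch.randn(8, 25), torch.randint(0, 25, (8,)))
print(assoc.shape, loss.item())
```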




15:56-16:08

Paper Presentation


Density-aware Chamfer Distance 

as a Comprehensive Metric 

for Point Cloud Completion


Tong Wu

The Chinese University of Hong Kong

Bio

PhD student at MMLab, The Chinese University of Hong Kong, advised by Prof. Dahua Lin; holds a bachelor's degree from the Department of Electronic Engineering, Tsinghua University. Honors include the Hong Kong Government Scholarship, Outstanding Graduate of Beijing, and the National Scholarship. Research interests include, but are not limited to, long-tailed recognition, adversarial robustness, and 3D vision, with multiple papers published at top international conferences such as ECCV, CVPR, and NeurIPS.


Homepage:

https://wutong16.github.io/


Abstract


Chamfer Distance (CD) and Earth Mover's Distance (EMD) are two broadly adopted metrics for measuring the similarity between two point sets. However, CD is usually insensitive to mismatched local density, and EMD is usually dominated by global distribution while overlooking the fidelity of detailed structures. Besides, their unbounded value range induces a heavy influence from outliers. These defects prevent them from providing a consistent evaluation. To tackle these problems, we propose a new similarity measure named Density-aware Chamfer Distance (DCD). It is derived from CD and benefits from several desirable properties: 1) it can detect disparity of density distributions and is thus a more intensive measure of similarity compared to CD; 2) it is stricter with detailed structures and significantly more computationally efficient than EMD; 3) the bounded value range encourages a more stable and reasonable evaluation over the whole test set. We adopt DCD to evaluate the point cloud completion task, where experimental results show that DCD pays attention to both the overall structure and local geometric details and provides a more reliable evaluation even when CD and EMD contradict each other. We can also use DCD as the training loss, which outperforms the same model trained with CD loss on all three metrics. In addition, we propose a novel point discriminator module that estimates the priority for another guided down-sampling step, and it achieves noticeable improvements under DCD together with competitive results for both CD and EMD. We hope our work can pave the way for a more comprehensive and practical point cloud similarity evaluation. Our code will be available at https://github.com/wutong16/Density_aware_Chamfer_Distance.
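
For readers who want the gist in code, here is a minimal sketch of a density-aware Chamfer-style distance following the form described in the paper; the hyper-parameter value and implementation details are assumptions rather than the official code:

```python
import torch

def dcd(s1, s2, alpha=40.0):
    """s1: [N, 3], s2: [M, 3]. Each nearest-neighbor term is mapped through
    a bounded exponential and down-weighted by how many points share the
    same nearest neighbor, making the measure bounded and density-aware."""
    d = torch.cdist(s1, s2)
    d12, nn12 = d.min(dim=1)      # nearest neighbor of each s1 point in s2
    d21, nn21 = d.min(dim=0)      # nearest neighbor of each s2 point in s1
    cnt2 = torch.bincount(nn12, minlength=s2.shape[0]).clamp(min=1).float()
    cnt1 = torch.bincount(nn21, minlength=s1.shape[0]).clamp(min=1).float()
    term1 = (1 - torch.exp(-alpha * d12) / cnt2[nn12]).mean()
    term2 = (1 - torch.exp(-alpha * d21) / cnt1[nn21]).mean()
    return 0.5 * (term1 + term2)

print(dcd(torch.randn(128, 3), torch.randn(128, 3)))  # bounded in [0, 1]
```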




16:13-16:43

Panel Discussion


Ziwei Liu

Nanyang Technological University


Bio

Assistant Professor at Nanyang Technological University and a Nanyang Scholar. Research interests include visual perception and understanding in machine learning. Has published over 80 papers at top computer vision conferences and in top journals, with more than 12,000 citations and over 30 patents. Has led the building of several internationally known computer vision benchmarks and open-source projects, such as CelebA, DeepFashion, MMDetection, and MMFashion. Was a Distinguished Research Fellow at The Chinese University of Hong Kong (2018-2020) and a postdoctoral researcher at UC Berkeley (2017-2018), and received a PhD from the Multimedia Lab of The Chinese University of Hong Kong (2013-2017). Awards include the Hong Kong PhD Fellowship, the ICCV Young Researcher Award, the HKSTP Best Paper Award, and the Microsoft Young Fellowship.


Homepage:

https://liuziwei7.github.io/


Fangzhou Hong
Nanyang Technological University

Liming Jiang
Nanyang Technological University

Wenwei Zhang
Nanyang Technological University



Live Stream Details
Tuesday, November 30, 14:00-17:00

Tune in at:

将门-TechBeat AI Community


-the end-

Too many high-quality events to keep track of? We've sorted them out for you!

Upcoming Events


11.22 (Mon)
Talk #360: Ang Li, PhD student at Duke University

11.23 (Tue)
"AI Security and Privacy" series, episode 9

11.23 (Tue)
MMAI Talk series #1: Zhen Yu, PhD student at Monash University

11.24 (Wed)
Taobao Tech live stream #1: Toward the Metaverse

11.25 (Thu)
Taobao Tech live stream #2: Algorithmic Black Tech

11.28 (Sun)
「料见」: Charles Ruizhongtai Qi, Senior Research Scientist at Waymo

11.30 (Tue)
NeurIPS 2021 Shining Stars Cloud Meetup: paper presentations

11.30 (Tue)
Taobao Tech live stream #3: New Content E-commerce


If you'd like to become a speaker yourself:

Self-nomination / Recommendation

Solo Talk | team session | recorded or live | closed-door exchange
Multiple formats to choose from!
Successfully recommending a speaker earns a reward, too~



About the TechBeat AI Community

TechBeat (www.techbeat.net), run by the venture firm 将门创投, is a growth community that brings together top Chinese AI talent from around the world.


We hope to offer AI talent more professional services and experiences, accelerating and accompanying their learning and growth.


We hope this becomes your high ground for learning cutting-edge AI, fertile soil for sharing your latest work, and your home base for leveling up on the road through AI!


More details >> TechBeat, a learning and growth community that brings together top Chinese AI talent worldwide