Hang Zhang (张航) is an Applied Scientist working with Dr. Mu Li and Dr. Alex Smola at Amazon AI. Prior to joining Amazon, he pursued his PhD under the supervision of Prof. Kristin Dana at Rutgers University (2013-2017). Before coming to Rutgers, he received his bachelor's degree from Southeast University (Nanjing, China) in 2013.
• New! [Jun, 2019] AutoGluon is out; check out the automatic deep learning toolkit at autogluon.mxnet.io.
• New! [Jun, 2019] I will co-organize the "Everything You Need to Know to Reproduce SOTA Deep Learning Models" tutorial at ICCV 2019.
• New! [Feb, 2019] Two papers are accepted to CVPR 2019.
• New! [Jun, 2018] We have released the source code of EncNet with pretrained models.
• New! [Mar, 2018] We have released Synchronized Cross-GPU Batch Normalization using MXNet Gluon and PyTorch.
• New! [Feb, 2018] Two papers are accepted to CVPR 2018 (1 oral + 1 poster).
• New! [Sep, 2017] I have defended my PhD thesis and will be working with Amazon AI.
• New! [Apr, 2017] We have released PyTorch versions of [MSG-Net & Neural Style baseline] and [Deep Encoding].
• New! [Apr, 2017] I was selected to attend the CVPR 2017 Doctoral Consortium in Hawaii.
• New! [Mar, 2017] We have released the demo video and the code for MSG-Net.
• New! [Feb, 2017] Two papers are accepted to CVPR 2017.
• New! [Dec, 2016] We have released the code for Deep Encoding.

## Publications

Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources
Haibin Lin, Hang Zhang, Yifei Ma, Tong He, Zhi Zhang, Sheng Zha, Mu Li
arXiv, 2019

With an increasing demand for training power for deep learning algorithms and the rapid growth of computation resources in data centers, it is desirable to dynamically schedule different distributed deep learning tasks to maximize resource utilization and reduce cost. In this process, different tasks may receive varying numbers of machines at different times, a setting we call elastic distributed training. Despite the recent successes in large mini-batch distributed training, these methods are rarely tested in elastic distributed training environments; in our experiments they suffer degraded performance when the learning rate is immediately rescaled linearly with the batch size. One difficulty we observe is that the noise in the stochastic momentum estimation accumulates over time and has delayed effects when the batch size changes. We therefore propose to smoothly adjust the learning rate over time to alleviate the influence of the noisy momentum estimation. Our experiments on image classification, object detection and semantic segmentation demonstrate that the proposed Dynamic SGD method achieves stabilized performance when varying the number of GPUs from 8 to 128. We also provide theoretical understanding of the optimality of linear learning rate scheduling and the effects of stochastic momentum.
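The core idea above, easing the learning rate toward its new linearly scaled value instead of jumping to it when the batch size changes, can be sketched as follows. The interpolation schedule and parameter names here are illustrative assumptions, not the paper's exact formulation:

```python
def smoothed_lr(base_lr, base_batch, old_batch, new_batch, step, warm_steps=100):
    """Illustrative sketch: linearly interpolate from the old linearly-scaled
    learning rate to the new one over `warm_steps` updates after the effective
    batch size changes, rather than rescaling immediately."""
    old_lr = base_lr * old_batch / base_batch   # linear scaling rule (old size)
    new_lr = base_lr * new_batch / base_batch   # linear scaling rule (new size)
    t = min(step, warm_steps) / warm_steps      # progress in [0, 1] since the change
    return old_lr + t * (new_lr - old_lr)
```

For example, doubling the batch size from 256 to 512 with base_lr = 0.1 moves the rate gradually from 0.1 to 0.2 over the warm-up window instead of doubling it in one step.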
@article{lin2019dynamic,
  title={Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources},
  author={Lin, Haibin and Zhang, Hang and Ma, Yifei and He, Tong and Zhang, Zhi and Zha, Sheng and Li, Mu},
  journal={arXiv preprint arXiv:1904.12043},
  year={2019}
}

Co-occurrent Features in Semantic Segmentation
Hang Zhang, Han Zhang, Chenguang Wang, Junyuan Xie
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019

Recent work has achieved great success in utilizing global contextual information for semantic segmentation, including increasing the receptive field and aggregating pyramid feature representations. In this paper, we go beyond global context and explore the fine-grained representation using co-occurrent features by introducing a Co-occurrent Feature Model, which predicts the distribution of co-occurrent features for a given target. To leverage the semantic context in the co-occurrent features, we build an Aggregated Co-occurrent Feature (ACF) Module by aggregating the probability of the co-occurrent feature within the co-occurrent context. The ACF Module learns a fine-grained, spatially invariant representation to capture co-occurrent context information across the scene. Our approach significantly improves the segmentation results over FCN, achieving 54.0% mIoU on PASCAL Context, 87.2% mIoU on PASCAL VOC 2012 and 44.89% mIoU on ADE20K with a ResNet-101 base network.
@InProceedings{Zhang_2019_CVPR,
  author = {Hang Zhang and Han Zhang and Chenguang Wang and Junyuan Xie},
  title = {Co-occurrent Features in Semantic Segmentation},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2019}
}

Bag of Tricks for Image Classification with Convolutional Neural Networks
Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019

Much of the recent progress made in image classification research can be credited to training procedure refinements, such as changes in data augmentation and optimization methods. In the literature, however, most refinements are either briefly mentioned as implementation details or only visible in source code. In this paper, we examine a collection of such refinements and empirically evaluate their impact on final model accuracy through ablation studies. We show that, by combining these refinements, we are able to improve various CNN models significantly. For example, we raise ResNet-50's top-1 validation accuracy from 75.3% to 79.29% on ImageNet. We also demonstrate that improvements in image classification accuracy lead to better transfer learning performance in other application domains such as object detection and semantic segmentation.
@InProceedings{Xie2018bags,
  title={Bag of Tricks for Image Classification with Convolutional Neural Networks},
  author={Tong He and Zhi Zhang and Hang Zhang and Zhongyue Zhang and Junyuan Xie and Mu Li},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019}
}

Context Encoding for Semantic Segmentation
Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, Oral (70/3309 = 2.1%)

Recent work has made significant progress in improving spatial resolution for pixelwise labeling within the Fully Convolutional Network (FCN) framework by employing dilated/atrous convolution, utilizing multi-scale features and refining boundaries. In this paper, we explore the impact of global contextual information in semantic segmentation by introducing the Context Encoding Module, which captures the semantic context of scenes and selectively highlights class-dependent feature maps. The proposed Context Encoding Module significantly improves semantic segmentation results with only marginal extra computation cost over FCN. Our approach achieves new state-of-the-art results: 51.7% mIoU on PASCAL Context and 85.9% mIoU on PASCAL VOC 2012. Our single model achieves a final score of 0.5567 on the ADE20K test set, which surpasses the winning entry of the COCO-Place Challenge 2017. In addition, we also explore how the Context Encoding Module can improve the feature representation of relatively shallow networks for image classification on the CIFAR-10 dataset. Our 14-layer network achieves an error rate of 3.45%, comparable with state-of-the-art approaches that use over 10 times more layers.
@InProceedings{Zhang_2018_CVPR,
  author = {Zhang, Hang and Dana, Kristin and Shi, Jianping and Zhang, Zhongyue and Wang, Xiaogang and Tyagi, Ambrish and Agrawal, Amit},
  title = {Context Encoding for Semantic Segmentation},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2018}
}

Deep Texture Manifold for Ground Terrain Recognition
Jia Xue, Hang Zhang, Kristin Dana
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018

We develop a texture network called the Deep Encoding Pooling Network (DEP) for the task of ground terrain recognition. Recognition of ground terrain is an important task in establishing robot or vehicular control parameters, as well as for localization within an outdoor environment. The architecture of DEP integrates local texture details and spatial information in a novel way, and its performance surpasses state-of-the-art methods for this task. The GTOS database (comprised of over 30,000 images of 40 classes of ground terrain) enables supervised recognition. For evaluation under realistic conditions, we use test images that are not from the existing GTOS (Ground Terrain in Outdoor Scenes) dataset, but are instead from hand-held mobile phone videos of similar terrain. This new evaluation dataset, GTOS-mobile, consists of 81 videos of 31 classes of ground terrain such as grass, gravel, asphalt and sand. Leveraging the discriminative features learned by this network, we build a new texture manifold called the DEP-manifold. We learn a parametric distribution in feature space in a fully supervised manner, which provides the distance relationship among classes and a means to implicitly represent ambiguous class boundaries.
@InProceedings{Xue_2018_CVPR,
  author = {Xue, Jia and Zhang, Hang and Dana, Kristin},
  title = {Deep Texture Manifold for Ground Terrain Recognition},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2018}
}

Multi-style Generative Network for Real-time Transfer
Hang Zhang, Kristin Dana
European Conference on Computer Vision Workshops (ECCVW), 2018; arXiv, 03/2017

Recent work in style transfer learns a feed-forward generative network to approximate the prior optimization-based approaches, resulting in real-time performance. However, these methods require training separate networks for different target styles, which greatly limits their scalability. We introduce a Multi-style Generative Network (MSG-Net) with a novel Inspiration Layer, which retains the functionality of optimization-based approaches while running at the speed of feed-forward networks. The proposed Inspiration Layer explicitly matches the feature statistics with the target styles at run time, which dramatically improves the versatility of existing generative networks, so that multiple styles can be realized within one network. The proposed MSG-Net matches image styles at multiple scales and puts the computational burden into training. The learned generator is a compact feed-forward network that runs in real time after training. Compared to previous work, the proposed network achieves fast style transfer with at least comparable quality using a single network. The experimental results cover (but are not limited to) simultaneous training of twenty different styles in a single network. The complete software system and pre-trained models are publicly available.
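The "feature statistics" matched by style-transfer methods like the one above are commonly second-order statistics of CNN feature maps, expressed as Gram matrices. A minimal sketch of that statistic (this illustrates the standard Gram computation, not the Inspiration Layer itself):

```python
import numpy as np

def gram(feat):
    """feat: (C, H, W) CNN feature map -> (C, C) Gram matrix of channel
    correlations, normalized by the number of spatial positions."""
    C, H, W = feat.shape
    F = feat.reshape(C, H * W)        # flatten spatial dimensions
    return F @ F.T / (H * W)          # channel-by-channel inner products

G = gram(np.random.randn(64, 32, 32))   # (64, 64) style statistic
```

A style target is summarized by such matrices at several layers; matching them at run time is what lets a single generator reproduce multiple styles.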
@article{zhang2017multistyle,
  title={Multi-style Generative Network for Real-time Transfer},
  author={Zhang, Hang and Dana, Kristin},
  journal={arXiv preprint arXiv:1703.06953},
  year={2017}
}

Deep TEN: Texture Encoding Network
Hang Zhang, Jia Xue, Kristin Dana
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
Talks: Rutgers (01/31/2017), Georgia Tech (04/06/2017), CMU (04/14/2017), UPenn (05/06/2017); oral presentation at MACV 2017.

We propose a Deep Texture Encoding Network (Deep TEN) with a novel Encoding Layer integrated on top of convolutional layers, which ports the entire dictionary learning and encoding pipeline into a single model. Current methods build from distinct components, using standard encoders with separate off-the-shelf features such as SIFT descriptors or pre-trained CNN features for material recognition. Our new approach provides an end-to-end learning framework, where the inherent visual vocabularies are learned directly from the loss function. That is, the features, dictionaries and the encoding representation for the classifier are all learned simultaneously. The representation is orderless and therefore particularly useful for material and texture recognition. The Encoding Layer generalizes robust residual encoders such as VLAD and Fisher Vectors, and has the property of discarding domain-specific information, which makes the learned convolutional features easier to transfer. Additionally, joint training using multiple datasets of varied sizes and class labels is supported, resulting in increased recognition performance. The experimental results show superior performance compared to state-of-the-art methods on gold-standard databases such as MINC-2500, the Flickr Material Database, KTH-TIPS-2b, and a new ground terrain multiview database. The source code for the complete system is publicly available.
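The Encoding Layer described above aggregates residuals between input features and learned codewords using soft-assignment weights. A minimal NumPy sketch of such a forward pass; in the real model the codewords and smoothing factors are learned parameters, and the names here are illustrative:

```python
import numpy as np

def encoding_forward(X, C, s):
    """X: (N, D) features, C: (K, D) codewords, s: (K,) smoothing factors.
    Returns (K, D) aggregated residual encodings, one per codeword."""
    R = X[:, None, :] - C[None, :, :]           # residuals r_ik = x_i - c_k, (N, K, D)
    logits = -s[None, :] * (R ** 2).sum(-1)     # scaled negative squared distances, (N, K)
    W = np.exp(logits - logits.max(1, keepdims=True))
    W /= W.sum(1, keepdims=True)                # softmax: soft assignment over codewords
    return (W[:, :, None] * R).sum(0)           # e_k = sum_i w_ik * r_ik

# 8 features of dimension 4 encoded against 3 codewords -> (3, 4) encoding
E = encoding_forward(np.random.randn(8, 4), np.random.randn(3, 4), np.ones(3))
```

Because the per-feature contributions are summed out, the resulting representation is orderless, which is what makes it suited to texture and material recognition.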
@InProceedings{Zhang_2017_CVPR,
  author = {Zhang, Hang and Xue, Jia and Dana, Kristin},
  title = {Deep TEN: Texture Encoding Network},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {July},
  year = {2017}
}

Differential Angular Imaging for Material Recognition
Jia Xue, Hang Zhang, Kristin Dana, Ko Nishino
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017

Material recognition for real-world outdoor surfaces has become increasingly important for computer vision to support its operation "in the wild." Reflecting these needs, computational surface modeling that underlies material recognition has transitioned from reflectance modeling using in-lab controlled radiometric measurements to image-based representations based on internet-mined images of materials captured in the scene. We propose to take a middle-ground approach for material recognition that takes advantage of both rich radiometric cues and flexible image capture. We realize this by developing a framework for differential angular imaging, where small angular variations in image capture provide an enhanced appearance representation and significant recognition improvement. We build a large-scale material database, the Ground Terrain in Outdoor Scenes (GTOS) database, geared towards real use for autonomous agents. The database consists of over 30,000 images covering 40 classes of outdoor ground terrain under varying weather and lighting conditions. We develop a novel approach for material recognition called the Differential Angular Imaging Network (DAIN) to fully leverage this large dataset. With this novel network architecture, we extract characteristics of materials encoded in the angular and spatial gradients of their appearance. Our results show that DAIN achieves recognition performance that surpasses single-view or coarsely quantized multiview images.
These results demonstrate the effectiveness of differential angular imaging as a means for flexible, in-place material recognition.

@InProceedings{Xue_2017_CVPR,
  author = {Xue, Jia and Zhang, Hang and Dana, Kristin and Nishino, Ko},
  title = {Differential Angular Imaging for Material Recognition},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {July},
  year = {2017}
}

Friction from Reflectance: Deep Reflectance Codes for Predicting Physical Surface Properties from One-Shot In-Field Reflectance
Hang Zhang, Kristin Dana, Ko Nishino
European Conference on Computer Vision (ECCV), 2016

Images are the standard input for vision algorithms, but one-shot in-field reflectance measurements are creating new opportunities for recognition and scene understanding. In this work, we address the question of what reflectance can reveal about materials in an efficient manner. We go beyond recognition and labeling and ask: what intrinsic physical properties of a surface can be estimated using reflectance? We introduce a framework that enables prediction of actual friction values for surfaces using one-shot reflectance measurements, a first-of-its-kind vision-based friction estimation. We develop a novel representation for reflectance disks that capture partial BRDF measurements instantaneously. Our method of deep reflectance codes combines CNN features and Fisher Vector pooling with optimal binary embedding to create codes that have sufficient discriminatory power and important properties of illumination and spatial invariance. The experimental results demonstrate that reflectance can play a new role in deciphering the underlying physical properties of real-world scenes.
@inproceedings{zhang2016friction,
  title={Friction from Reflectance: Deep Reflectance Codes for Predicting Physical Surface Properties from One-Shot In-Field Reflectance},
  author={Zhang, Hang and Dana, Kristin and Nishino, Ko},
  booktitle={European Conference on Computer Vision},
  pages={808--824},
  year={2016},
  organization={Springer}
}

Reflectance Hashing for Material Recognition
Hang Zhang, Kristin Dana, Ko Nishino
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015

We introduce a novel method for using reflectance to identify materials. Reflectance offers a unique signature of a material but is challenging to measure and use for recognition due to its high dimensionality. In this work, one-shot reflectance of a material surface, which we refer to as a reflectance disk, is captured using a unique optical camera. The pixel coordinates of these reflectance disks correspond to the surface viewing angles. The reflectance has class-specific structure, and angular gradients computed in this reflectance space reveal the material class. These reflectance disks encode discriminative information for efficient and accurate material recognition. We introduce a framework called reflectance hashing that models the reflectance disks with dictionary learning and binary hashing. We demonstrate the effectiveness of reflectance hashing for material recognition with a number of real-world materials.

@InProceedings{zhang2015reflectance,
  title = {Reflectance Hashing for Material Recognition},
  author = {Zhang, Hang and Dana, Kristin and Nishino, Ko},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages = {3071--3080},
  year = {2015}
}
