Computer Vision Reading Group

July 18, 2017

A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection

paper

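I. Main idea of the paper

The goal is to train an object detector that is robust to occlusion and deformation. The dominant approach today is to collect more image data covering different conditions, but such examples are sometimes very rare. The authors instead propose using an adversary to generate occluded or deformed examples that are hard for the detector to recognize; training on these hard positives makes the detector more robust. Two adversarial networks generate the two kinds of features: ASDN (Adversarial Spatial Dropout Network) for occlusion and ASTN (Adversarial Spatial Transformer Network) for deformation.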

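1. ASDN

  • The convolutional features of each object proposal after the RoI-pooling layer of Fast-RCNN are the input to the adversarial network. Given an object's features, ASDN tries to generate a mask indicating which parts of the features to drop out, so that the detector can no longer recognize the object.
  • ASDN initialization: given a feature map X of size d×d, slide a d/3×d/3 window over it and map each window position back to the image. The feature values covered by the window are zeroed out to produce a new feature vector, which is passed to the classification layer to compute the loss. The window with the highest loss is selected and used to build a binary mask M (1 inside the window, 0 elsewhere). From n object proposals this yields n training pairs for the adversarial network, (x1, M1), ..., (xn, Mn), and ASDN is trained with a binary cross-entropy loss: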

[Equation image: binary cross-entropy loss used to train ASDN]
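The sliding-window initialization described above can be sketched in NumPy roughly as follows. This is a minimal illustration, not the authors' code: `hardest_occlusion_mask`, `binary_cross_entropy`, and the toy loss are hypothetical names, and the loss function stands in for the detector's classification loss, which is not reproduced here.

```python
import numpy as np

def hardest_occlusion_mask(feat, loss_fn):
    """Slide a (d/3 x d/3) window over a (C, d, d) feature map and return the
    d x d binary mask (1 = dropped) of the window with the highest loss."""
    _, d, _ = feat.shape
    win = max(d // 3, 1)
    best_loss, best_mask = -np.inf, None
    for top in range(d - win + 1):
        for left in range(d - win + 1):
            mask = np.zeros((d, d), dtype=np.float64)
            mask[top:top + win, left:left + win] = 1.0
            occluded = feat * (1.0 - mask)  # zero the window in every channel
            loss = loss_fn(occluded)
            if loss > best_loss:
                best_loss, best_mask = loss, mask
    return best_mask

def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy between a predicted mask in (0, 1) and a 0/1 target."""
    pred = np.clip(np.asarray(pred, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))

# Toy stand-in for the classification loss (an assumption for this sketch):
# the less activation survives, the higher the loss.
def toy_loss(occluded_feat):
    return -float(occluded_feat.sum())

feat = np.zeros((2, 6, 6))
feat[:, 2:4, 4:6] = 1.0                       # the only informative region
target_mask = hardest_occlusion_mask(feat, toy_loss)
print(target_mask)
```

Each proposal then contributes one (feature, mask) pair, and the mask predicted by the adversary would be scored against `target_mask` with `binary_cross_entropy`.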

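  • In the forward pass, ASDN first produces a feature mask after the RoI-pooling layer. Importance sampling then turns it into a binary mask, and that mask is used to zero out the corresponding feature values. The modified features continue forward to compute the loss. This process produces hard features that are used to train the detector. The training flow is shown in the following figure: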

[Figure: ASDN training pipeline]
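A rough sketch of the forward-pass step, turning the adversary's heatmap into a binary mask and dropping features, might look like this. The sampling rule here (keep the highest-scoring fraction of positions, then randomly drop a fraction of those) is a simplified stand-in for the importance sampling mentioned above; the fractions `top_frac` and `drop_frac` and both function names are assumptions made for illustration.

```python
import numpy as np

def sample_drop_mask(heatmap, top_frac=0.5, drop_frac=1.0 / 3.0, rng=None):
    """Turn a d x d heatmap into a binary mask (1 = drop this position).

    Candidates are the top `top_frac` positions by score; a random
    `drop_frac` of those candidates is actually dropped.
    """
    rng = np.random.default_rng() if rng is None else rng
    flat = heatmap.ravel()
    n_top = max(int(round(flat.size * top_frac)), 1)
    top_idx = np.argsort(flat)[-n_top:]            # highest-scoring positions
    n_drop = max(int(round(n_top * drop_frac)), 1)
    drop_idx = rng.choice(top_idx, size=n_drop, replace=False)
    mask = np.zeros(flat.size, dtype=np.float64)
    mask[drop_idx] = 1.0
    return mask.reshape(heatmap.shape)

def apply_drop_mask(feat, mask):
    """Zero the masked spatial positions in every channel of a (C, d, d) feature map."""
    return feat * (1.0 - mask)

heatmap = np.random.default_rng(0).random((6, 6))  # stand-in for the ASDN output
feat = np.ones((4, 6, 6))                           # stand-in for RoI-pooled features
mask = sample_drop_mask(heatmap, rng=np.random.default_rng(1))
hard_feat = apply_drop_mask(feat, mask)
print(int(mask.sum()), "positions dropped")
```

The masked features `hard_feat` are what would continue through the detector's remaining layers.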

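2. ASTN

  • An STN has three parts: a localisation network, a grid generator, and a sampler. The localisation network estimates the deformation parameters (rotation angle, translation distance, and scaling factor). These parameters are the input to the other two parts, whose output is the deformed feature map. The paper focuses on learning the localisation network that predicts these parameters.
  • ASTN: the focus is on feature rotation. The localisation network consists of three fully connected layers, the first two being fc6 and fc7 pre-trained on ImageNet. Training is similar to ASDN: ASTN deforms the features so that the detector misclassifies the positive sample as a negative. The feature map is split into 4 blocks and a rotation is estimated for each block, which increases the complexity of the task.
  • The two adversarial networks can be combined to make the detector more robust: the features extracted by the RoI-pooling layer are first passed to ASDN, which drops some activations, and ASTN then deforms the result, as shown in the figure below:

[Figure: network architecture combining ASDN and ASTN. Occlusion masks are created first, then the channels are rotated to produce hard examples for training.]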
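To make the STN pieces concrete, here is a small NumPy sketch of a grid generator and bilinear sampler that rotates each block of channels by its own angle. It is an illustration under stated assumptions only: all function names are hypothetical, the angles are hard-coded here whereas in ASTN they would be predicted by the localisation network, and splitting along the channel axis is this sketch's reading of "4 blocks".

```python
import numpy as np

def rotation_theta(angle_deg):
    """2x3 affine matrix for a pure rotation (no translation or scaling)."""
    a = np.deg2rad(angle_deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0]])

def affine_grid(theta, size):
    """Grid generator: for every output pixel, the source (x, y) in [-1, 1]."""
    lin = np.linspace(-1.0, 1.0, size)
    ys, xs = np.meshgrid(lin, lin, indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(size * size)])  # 3 x N
    src = theta @ coords                                               # 2 x N
    return src[0].reshape(size, size), src[1].reshape(size, size)

def bilinear_sample(channel, src_x, src_y):
    """Sampler: bilinear interpolation, zero outside the feature map."""
    size = channel.shape[0]
    px = (src_x + 1.0) * (size - 1) / 2.0   # normalized -> pixel coordinates
    py = (src_y + 1.0) * (size - 1) / 2.0
    x0 = np.floor(px).astype(int)
    y0 = np.floor(py).astype(int)
    out = np.zeros(channel.shape, dtype=np.float64)
    for dy in (0, 1):
        for dx in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            weight = (1.0 - np.abs(px - xi)) * (1.0 - np.abs(py - yi))
            valid = (xi >= 0) & (xi < size) & (yi >= 0) & (yi < size)
            values = channel[np.clip(yi, 0, size - 1), np.clip(xi, 0, size - 1)]
            out += np.where(valid, weight * values, 0.0)
    return out

def rotate_channel_blocks(feat, angles_deg):
    """Split a (C, d, d) feature map into len(angles_deg) channel blocks and
    rotate each block by its own angle."""
    blocks = np.array_split(feat, len(angles_deg), axis=0)
    rotated = []
    for block, angle in zip(blocks, angles_deg):
        src_x, src_y = affine_grid(rotation_theta(angle), feat.shape[-1])
        rotated.append(np.stack([bilinear_sample(c, src_x, src_y) for c in block]))
    return np.concatenate(rotated, axis=0)

feat = np.random.default_rng(0).random((8, 7, 7))        # stand-in for RoI features
deformed = rotate_channel_blocks(feat, [-10.0, -5.0, 5.0, 10.0])  # made-up angles
print(deformed.shape)
```

In the combined setup described above, the input to `rotate_channel_blocks` would be the features after the ASDN mask has been applied.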

II. Training

Stage 1: train a standard Fast-RCNN

./experiments/scripts/fast_rcnn_std.sh  [GPU_ID]  VGG16 pascal_voc

Stage 2: pre-train the adversarial network

./experiments/scripts/fast_rcnn_adv_pretrain.sh  [GPU_ID]  VGG16 pascal_voc

Stage 3: copy the weights of the above two models to initialize the joint model

./copy_model.h

Stage 4: jointly train the detector and the adversarial network

./experiments/scripts/fast_rcnn_adv.sh  [GPU_ID]  VGG16 pascal_voc

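III. Code walkthrough

1. Sigmoid cross-entropy

  • Simplifying the cross-entropy:
    [Equation images: simplification of the cross-entropy]
  • This can be simplified further to:
    [Equation images: further simplified cross-entropy]
  • The corresponding code in the paper's repository is in adversarial-frcnn/lib/roi_data_layer/layer.py:
    [Image: code listing 1]
  • Code notes: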
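1. Note the neat use of the absolute value:

lZ = np.log(1+np.exp(-np.abs(f))) * mask

lZ corresponds to the second term of the simplified formula. The exponent of e is non-positive in both cases, which the code captures as np.exp(-np.abs(f)).

2. Note the neat use of the comparison:

((f>0)-t)*f * mask

This corresponds to the first term of the simplified formula. The corresponding Caffe source is:
[Image: Caffe source listing]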
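Since the equation and code images are not reproduced here, the following sketch restates the simplification in runnable form. It assumes the usual definition of the sigmoid cross-entropy for a logit f and target t, L = -[t·log σ(f) + (1-t)·log(1-σ(f))], which rearranges to ((f>0) - t)·f + log(1 + e^(-|f|)). The function names are hypothetical, and `np.log1p` is used in place of `np.log(1 + ...)`; this is not the repository's exact code.

```python
import numpy as np

def sigmoid_cross_entropy(f, t, mask=None):
    """Element-wise sigmoid cross-entropy in the numerically stable form
    ((f > 0) - t) * f + log(1 + exp(-|f|)), optionally multiplied by a mask."""
    f = np.asarray(f, dtype=np.float64)
    t = np.asarray(t, dtype=np.float64)
    if mask is None:
        mask = np.ones_like(f)
    first = ((f > 0) - t) * f                  # first term of the simplified formula
    second = np.log1p(np.exp(-np.abs(f)))      # second term; exponent is never positive
    return (first + second) * mask

def naive_sigmoid_cross_entropy(f, t):
    """Direct definition, for comparison; overflows for large |f|."""
    p = 1.0 / (1.0 + np.exp(-f))
    return -(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

f = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
t = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
print(sigmoid_cross_entropy(f, t))
print(naive_sigmoid_cross_entropy(f, t))
```

For moderate logits the two functions agree; the stable form stays finite for very large |f| because `np.exp` only ever sees a non-positive argument.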

IV. Reference links

  • Caffe network visualization tool: http://ethereon.github.io/netscope/#/editor
  • Cross-entropy formula derivation: http://caffecn.cn/?/question/25
  • Cross-entropy formula explanation: http://blog.csdn.net/u014114990/article/details/47975739
  • Paper code: https://github.com/xiaolonw/adversarial-frcnn

About the author: Edited by fangfang xiuhong


July 16, 2017

Object detection in CVPR2017

cvpr2017

detector

  1. Accurate Single Stage Detector Using Recurrent Rolling Convolution paper
  2. Training Object Class Detectors With Click Supervision paper
  3. Self-Learning Scene-Specific Pedestrian Detectors Using a Progressive Latent Model paper
  4. EAST: An Efficient and Accurate Scene Text Detector paper
  5. Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors
  6. What Is and What Is Not a Salient Object? Learning Salient Object Detector by Ensembling Linear Exemplar Regressors
  7. Expecting the Unexpected: Training Detectors for Unusual Pedestrians With Adversarial Imposters
  8. Learning Discriminative and Transformation Covariant Local Feature Detectors

detection

  1. SRN: Side-output Residual Network for Object Symmetry Detection in the Wild
  2. Amodal Detection of 3D Objects: Inferring 3D Bounding Boxes From 2D Ones in RGB-Depth Images
  3. Transition Forests: Learning Discriminative Temporal Transitions for Action Recognition and Detection
  4. Deep Level Sets for Salient Object Detection
  5. Spatially-Varying Blur Detection Based on Multiscale Fused and Sorted Transform Coefficients of Gradient Magnitudes
  6. Object Detection in Videos With Tubelet Proposal Networks
  7. Feature Pyramid Networks for Object Detection
  8. Fast Boosting Based Detection Using Scale Invariant Multimodal Multiresolution Filtered Features
  9. Temporal Convolutional Networks for Action Segmentation and Detection
  10. Discriminative Bimodal Networks for Visual Localization and Detection With Natural Language Queries
  11. Interspecies Knowledge Transfer for Facial Keypoint Detection
  12. Deep Joint Rain Detection and Removal From a Single Image
  13. CASENet: Deep Category-Aware Semantic Edge Detection
  14. Image Splicing Detection via Camera Response Function Analysis
  15. Scale-Aware Face Detection
  16. Perceptual Generative Adversarial Networks for Small Object Detection
  17. Predictive-Corrective Networks for Action Detection (project, abstract, PDF)
  18. Unified Embedding and Metric Learning for Zero-Exemplar Event Detection
  19. A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection (PDF)
  20. Multiple Instance Detection Network With Online Instance Classifier Refinement
  21. Visual Translation Embedding Network for Visual Relation Detection
  22. SCC: Semantic Context Cascade for Efficient Action Detection
  23. End-To-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering
  24. Joint Detection and Identification Feature Learning for Person Search
  25. Deep Matching Prior Network: Toward Tighter Multi-Oriented Text Detection
  26. Visual-Inertial-Semantic Scene Representation for 3D Object Detection
  27. A Deep Regression Architecture With Two-Stage Re-Initialization for High Performance Facial Landmark Detection
  28. Quad-Networks: Unsupervised Learning to Rank for Interest Point Detection
  29. Polyhedral Conic Classifiers for Visual Object Detection and Classification
  30. Incremental Kernel Null Space Discriminant Analysis for Novelty Detection
  31. Straight to Shapes: Real-Time Detection of Encoded Shapes
  32. Learning Cross-Modal Deep Representations for Robust Pedestrian Detection
  33. Spatio-Temporal Self-Organizing Map Deep Network for Dynamic Object Detection From Videos
  34. Provable Self-Representation Based Outlier Detection in a Union of Subspaces
  35. Deep Variation-Structured Reinforcement Learning for Visual Relationship and Attribute Detection
  36. CityPersons: A Diverse Dataset for Pedestrian Detection
  37. Hand Keypoint Detection in Single Images Using Multiview Bootstrapping
  38. Minimum Delay Moving Object Detection
  39. Weakly Supervised Affordance Detection
  40. RON: Reverse Connection With Objectness Prior Networks for Object Detection
  41. Deeply Supervised Salient Object Detection With Short Connections
  42. Simultaneous Facial Landmark Detection, Pose and Deformation Estimation Under Facial Occlusion
  43. Joint Gap Detection and Inpainting of Line Drawings
  44. MCMLSD: A Dynamic Programming Approach to Line Segment Detection
  45. Richer Convolutional Features for Edge Detection
  46. What Can Help Pedestrian Detection?
  47. UntrimmedNets for Weakly Supervised Action Recognition and Detection
  48. Multi-View 3D Object Detection Network for Autonomous Driving
  49. Non-Local Deep Features for Salient Object Detection
  50. Unsupervised Vanishing Point Detection and Camera Calibration From a Single Manhattan Image With Radial Distortion
  51. Action Unit Detection With Region Adaptation, Multi-Labeling Learning and Optimal Temporal Fusing
  52. Mimicking Very Efficient Network for Object Detection
  53. Learning Detection With Diverse Proposals
  54. YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video

July 01, 2017

People in object detection

Kaiming He

arxiv paper list ✅

  1. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition paper code
  2. Convolutional Neural Networks at Constrained Time Cost paper
  3. Efficient and Accurate Approximations of Nonlinear Convolutional Networks paper
  4. ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation paper
  5. Instance-aware Semantic Segmentation via Multi-task Network Cascades paper
  6. Deep Residual Learning for Image Recognition paper
  7. Identity Mappings in Deep Residual Networks paper
  8. Instance-sensitive Fully Convolutional Networks paper
  9. Is Faster R-CNN Doing Well for Pedestrian Detection? paper
  10. R-FCN: Object Detection via Region-based Fully Convolutional Networks paper
  11. Aggregated Residual Transformations for Deep Neural Networks paper
  12. Feature Pyramid Networks for Object Detection paper
  13. Mask R-CNN paper
  14. Detecting and Recognizing Human-Object Interactions paper

Ross Girshick

arxiv paper list ✅

Trevor Darrell

arxiv paper list ✅

Rogerio Feris

  1. S3Pool: Pooling with Stochastic Spatial Sampling paper code
  2. Deep Domain Adaptation for Describing People Based on Fine-Grained Clothing Attributes paper
  3. A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection paper caffe
  4. Shape Classification Through Structured Learning of Matching Measures paper
  5. Learning Detectors from Large Datasets for Object Retrieval in Video Surveillance paper
  6. Boosting Object Detection Performance in Crowded Surveillance Videos paper
  7. Efficient Maximum Appearance Search for Large-Scale Object Detection paper
  8. Fast Face Detector Training Using Tailored Views paper
  9. Attribute-based People Search: Lessons Learnt from a Practical Surveillance System paper
  10. Fully-adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification paper
  11. [BOOK] Visual Attributes link introduce

Piotr Dollar

blog

  1. Supervised Learning of Edges and Object Boundaries paper
  2. Multiple Component Learning for Object Detection paper
  3. Fast Feature Pyramids for Object Detection paper
  4. Detecting Objects using Deformation Dictionaries paper
  5. Edge Boxes: Locating Object Proposals from Edges paper
  6. What makes for effective detection proposals? paper
  7. Learning to Segment Object Candidates paper
  8. Semantic Amodal Segmentation paper
  9. Unsupervised Learning of Edges paper
  10. A MultiPath Network for Object Detection paper
  11. Learning to Refine Object Segments paper

Xiaoyu Wang

  1. Regionlets for Generic Object Detection ICCV 2013 T-PAMI 2015
  2. Generic Object Detection with Dense Neural Patterns and Regionlets paper
  3. Accurate Object Detection with Location Relaxation and Regionlets Relocalization paper
  4. Deep Reinforcement Learning-based Image Captioning with Embedding Reward paper
  5. SEP-Nets: Small and Effective Pattern Networks paper

Rodrigo Benenson

  1. Traffic Sign Recognition – How far are we from the solution paper
  2. Seeking the strongest rigid detector paper
  3. How good are detection proposals, really? paper
  4. Ten Years of Pedestrian Detection, What Have We Learned? paper
  5. Taking a Deeper Look at Pedestrians paper
  6. Filtered Channel Features for Pedestrian Detection paper
  7. What makes for effective detection proposals? paper
  8. What is Holding Back Convnets for Detection? paper
  9. Weakly Supervised Object Boundaries paper
  10. How Far are We from Solving Pedestrian Detection? paper
  11. The Cityscapes Dataset paper
  12. Detecting Surgical Tools by Modelling Local Appearance and Global Shape paper

Jan Hosang

  1. A convnet for non-maximum suppression paper
  2. Simple does it: Weakly supervised instance and semantic segmentation paper
  3. Learning non-maximum suppression paper

Workshop

  1. ICCV 2015 Tutorial on Tools for Efficient Object Detection

April 14, 2017

Reading List

Object detection

  1. Rich feature hierarchies for accurate object detection and semantic segmentation paper
  2. Fast R-CNN paper
  3. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks paper
  4. [read] R-FCN: Object Detection via Region-based Fully Convolutional Networks paper
  5. [read] Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks paper
  6. Feature Pyramid Networks for Object Detection paper
  7. [read] A-Fast-RCNN: Hard positive generation via adversary for object detection paper github
  8. [read] Generative Adversarial Networks paper
  9. [read] Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization paper caffe
  10. [read] Spatial Memory for Context Reasoning in Object Detection paper
  11. Accurate Single Stage Detector Using Recurrent Rolling Convolution paper
  12. ME R-CNN: Multi-Expert Region-based CNN for Object Detection paper
  13. [read] Beyond Skip Connections: Top-Down Modulation for Object Detection paper
  14. Improving Object Detection With One Line of Code paper
  15. S-OHEM: Stratified Online Hard Example Mining for Object Detection paper
  16. Adaptive Object Detection Using Adjacency and Zoom Prediction paper caffe
  17. You Only Look Once: Unified, Real-Time Object Detection paper
  18. YOLO9000: Better, Faster, Stronger paper
  19. Deformable Convolutional Networks paper mxnet
  20. Learning Detection with Diverse Proposals paper caffe
  21. [read] RON: Reverse Connection with Objectness Prior Networks for Object Detection paper

Text detection

  1. Detecting Text in Natural Image with Connectionist Text Proposal Network paper
  2. EAST: An Efficient and Accurate Scene Text Detector paper

Semantic Image Segmentation

  1. Fully Convolutional Networks for Semantic Segmentation paper caffe
  2. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs paper
  3. Conditional Random Fields as Recurrent Neural Networks paper caffe
  4. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs paper pytorch
  5. [read] Fully Convolutional Instance-aware Semantic Segmentation paper mxnet
  6. Loss Max-Pooling for Semantic Image Segmentation paper
  7. [read] Mask R-CNN paper tf

Recognition and Detection in 3D

  1. 3D ShapeNets: A Deep Representation for Volumetric Shapes paper matlab
  2. VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition paper Lasagne

Visual Reasoning

  1. Inferring and Executing Programs for Visual Reasoning paper pytorch

Human motion

  1. Unsupervised Learning of Depth and Ego-Motion from Video paper github

CNN and its properties

  1. Group Invariant Scattering paper
  2. Invariant Scattering Convolution Networks paper
  3. Structured Receptive Fields in CNNs paper
  4. Dynamic Filter Networks paper
  5. Multiscale Hierarchical Convolutional Networks paper
  6. Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors paper

Image classification

  1. Deep Residual Learning for Image Recognition paper

Lightweight CNN

  1. Towards lightweight convolutional neural networks for object detection paper