Computer Vision Reading Group

July 18, 2017

A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection

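I. Main idea of the paper

To train an object detector that is robust to occlusion and deformation, the prevailing approach is to collect more image data covering different scenarios, but such examples are often very rare. The authors propose to generate occluded or deformed examples adversarially. These examples are hard for the detector to recognize, and training on such hard positives makes the detector more robust. Two adversarial networks generate the two kinds of features: ASDN (Adversarial Spatial Dropout Network) for occlusion and ASTN (Adversarial Spatial Transformer Network) for deformation.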

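1. ASDN

The convolutional features of each object proposal after the RoI-pooling layer of Fast-RCNN are the input to the adversarial network. Given the features of an object, ASDN tries to generate a mask that marks which parts of the feature to drop out, so that the detector can no longer recognize the object.
ASDN network initialization: given a feature map X of size d×d, slide a window of size d/3×d/3 over it. Each window position is mapped back to the image, the feature values at the corresponding positions are zeroed out to produce a new feature vector, and this vector is passed to the classification layer to compute the loss. The window with the highest loss is selected and used to build a binary mask M (1 inside the window, 0 elsewhere). From n object proposals this yields n training pairs (x1, M1), ..., (xn, Mn) for the adversarial network, and ASDN is trained with a binary cross-entropy loss: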

[Formula image: binary cross-entropy loss]
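The sliding-window search used to initialize ASDN can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `hardest_occlusion_mask` and `loss_fn` are hypothetical names, the window stride of 1 is an assumption, and `loss_fn` stands in for the detector's classification loss.

```python
import numpy as np

def hardest_occlusion_mask(feat, loss_fn, win_frac=3):
    """Find the d x d binary mask whose window, once zeroed, maximizes loss_fn.

    feat:    array of shape (C, d, d), RoI-pooled features of one proposal.
    loss_fn: callable mapping an occluded (C, d, d) array to a scalar loss
             (hypothetical stand-in for the detector's classification loss).
    """
    _, d, _ = feat.shape
    w = max(1, d // win_frac)                # window side, roughly d/3
    best_loss, best_mask = -np.inf, None
    for top in range(d - w + 1):             # stride 1 is an assumption
        for left in range(d - w + 1):
            mask = np.zeros((d, d), dtype=np.float32)
            mask[top:top + w, left:left + w] = 1.0   # 1 inside the window
            occluded = feat * (1.0 - mask)           # zero the window in all channels
            loss = loss_fn(occluded)
            if loss > best_loss:
                best_loss, best_mask = loss, mask
    return best_mask, best_loss
```

Each returned mask, paired with the proposal's features, would give one (x, M) training pair for the adversarial network.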

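In the forward pass, ASDN first generates a feature mask after the RoI-pooling layer. Importance sampling then turns it into a binary mask, which is used to zero out the corresponding feature values, and the modified features continue through the forward pass to compute the loss. This process produces hard features for training the detector. The training flow is shown in the figure below: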

[Figure: ASDN training flow]
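The mask sampling and feature dropping in the forward pass might look like the sketch below. It is a simple stand-in for the sampling step, not the repository's code: the function names are hypothetical, and the fractions `top_frac` and `drop_frac` are illustrative assumptions rather than values stated in this post.

```python
import numpy as np

def sample_dropout_mask(prob, top_frac=0.5, drop_frac=1.0 / 3.0, rng=None):
    """Sample a binary drop mask (1 = drop) from a (d, d) probability map.

    Candidates are the highest-probability positions; a random subset of
    them is dropped. Both fractions are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    flat = prob.ravel()
    n_top = max(1, int(round(top_frac * flat.size)))
    top_idx = np.argsort(flat)[::-1][:n_top]        # most likely occluded positions
    n_drop = max(1, int(round(drop_frac * n_top)))
    drop_idx = rng.choice(top_idx, size=n_drop, replace=False)
    mask = np.zeros(flat.size, dtype=np.float32)
    mask[drop_idx] = 1.0
    return mask.reshape(prob.shape)

def apply_mask(feat, mask):
    """Zero the masked positions of a (C, d, d) feature map in every channel."""
    return feat * (1.0 - mask)
```

The masked features returned by `apply_mask` would then be passed on to the remaining detector layers to compute the loss.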

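2. ASTN

An STN has three parts: a localization network, a grid generator, and a sampler. The localization network estimates the deformation parameters (rotation angle, translation distance, and scaling factor). These three parameters are fed to the other two parts, whose output is the deformed feature map. The paper focuses on learning these three parameters of the localization network.
ASTN: the focus is on feature rotation. The localization network has three fully connected layers, the first two being fc6 and fc7 pre-trained on ImageNet. Training is similar to ASDN: ASTN deforms the features so that the detector misclassifies the positive example as a negative. The feature map is divided into 4 blocks, each with its own estimated rotation (four rotations in total), which increases the complexity of the task.
The two adversarial networks can be combined to make the detector more robust: the features extracted by the RoI-pooling layer are first passed to ASDN, which drops some activations, and are then deformed by ASTN, as shown in the figure below:
Figure: architecture combining the ASDN and ASTN networks. Occlusion masks are created first, and the channels are then rotated to produce hard examples for training.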
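To make the grid generator and sampler concrete, here is a small NumPy sketch that rotates a feature map with bilinear sampling and applies a separate angle to each block. It is only an illustration under assumptions: the blocks are taken along the channel dimension, the angles are supplied by hand instead of by a localization network, and `rotate_feature` and `rotate_blocks` are hypothetical names.

```python
import numpy as np

def rotate_feature(feat, angle_deg):
    """Rotate a (C, H, W) feature map about its center with bilinear sampling."""
    _, H, W = feat.shape
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    # Grid generator: normalized target coordinates in [-1, 1].
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
    # Map every target location to the source location it samples from.
    src_x = cos_t * xs - sin_t * ys
    src_y = sin_t * xs + cos_t * ys
    px = (src_x + 1) * (W - 1) / 2           # back to pixel coordinates
    py = (src_y + 1) * (H - 1) / 2
    x0 = np.floor(px).astype(int)
    y0 = np.floor(py).astype(int)
    wx, wy = px - x0, py - y0

    def gather(yy, xx):
        # Samples that fall outside the map contribute zero.
        valid = (yy >= 0) & (yy < H) & (xx >= 0) & (xx < W)
        vals = feat[:, np.clip(yy, 0, H - 1), np.clip(xx, 0, W - 1)]
        return vals * valid

    # Sampler: bilinear interpolation over the four neighbors.
    return (gather(y0, x0) * (1 - wx) * (1 - wy)
            + gather(y0, x0 + 1) * wx * (1 - wy)
            + gather(y0 + 1, x0) * (1 - wx) * wy
            + gather(y0 + 1, x0 + 1) * wx * wy)

def rotate_blocks(feat, angles_deg):
    """Split channels into len(angles_deg) blocks and rotate each by its own angle."""
    blocks = np.array_split(feat, len(angles_deg), axis=0)
    rotated = [rotate_feature(b, a) for b, a in zip(blocks, angles_deg)]
    return np.concatenate(rotated, axis=0)
```

Calling `rotate_blocks(feat, [10, -10, 5, -5])` would, for example, rotate four channel blocks by four different small angles; in ASTN the angles would come from the learned localization network.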

II. Training

Stage 1: train a standard Fast-RCNN

./experiments/scripts/fast_rcnn_std.sh  [GPU_ID]  VGG16 pascal_voc

Stage 2: pre-train the adversarial network

./experiments/scripts/fast_rcnn_adv_pretrain.sh  [GPU_ID]  VGG16 pascal_voc

Stage 3: copy the weights of the two models above to initialize the joint model

./copy_model.h

Stage 4: jointly train the detector and the adversarial network

./experiments/scripts/fast_rcnn_adv.sh  [GPU_ID]  VGG16 pascal_voc

III. Code walkthrough

1. Sigmoid cross-entropy

Simplifying the cross-entropy:
It can be simplified further to:
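The two simplification steps can be written out as below. This is the standard derivation for the sigmoid cross-entropy; the symbols are chosen here for illustration, with x the pre-sigmoid score (f in the code) and t the 0/1 target.

```latex
\begin{aligned}
L &= -\bigl[\, t \log \sigma(x) + (1 - t)\log\bigl(1 - \sigma(x)\bigr) \bigr],
     \qquad \sigma(x) = \frac{1}{1 + e^{-x}} \\
  &= -t\,x + \log\bigl(1 + e^{x}\bigr) \\
  &= \bigl(\mathbb{1}[x > 0] - t\bigr)\,x + \log\bigl(1 + e^{-|x|}\bigr)
\end{aligned}
```

The second line uses log σ(x) = x − log(1 + e^x) and log(1 − σ(x)) = −log(1 + e^x). The third line uses log(1 + e^x) = x + log(1 + e^{−x}) for x > 0, so the exponent is never positive and the exponential cannot overflow.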
The corresponding code in this work is in: adversarial-frcnn/lib/roi_data_layer/layer.py
Code notes:

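1. Note the clever use of the absolute value:

lZ = np.log(1+np.exp(-np.abs(f))) * mask
lZ corresponds to the second term of the simplified formula. The exponent of e is non-positive in both cases (x > 0 and x ≤ 0), which the code expresses compactly as np.exp(-np.abs(f)).

2. Note the clever use of the comparison:

((f>0)-t)*f * mask
This term corresponds to the first term of the simplified formula. The corresponding Caffe source is:
[Image: code 2]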
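Putting the two terms together, a self-contained version of the stable loss can be checked against the textbook formula. This is a sketch that mirrors the expressions quoted above, not a copy of layer.py: the function names are hypothetical, and np.log1p is used in place of np.log(1 + ...), which is mathematically equivalent.

```python
import numpy as np

def sigmoid_cross_entropy(f, t, mask=None):
    """Per-element sigmoid cross-entropy from scores f and 0/1 targets t."""
    f = np.asarray(f, dtype=np.float64)
    t = np.asarray(t, dtype=np.float64)
    mask = np.ones_like(f) if mask is None else np.asarray(mask, dtype=np.float64)
    first = ((f > 0) - t) * f * mask               # x * ([x > 0] - t)
    second = np.log1p(np.exp(-np.abs(f))) * mask   # log(1 + exp(-|x|)), exponent <= 0
    return first + second

def naive_sigmoid_cross_entropy(f, t):
    """Textbook form; overflows for scores of large magnitude."""
    p = 1.0 / (1.0 + np.exp(-f))
    return -(t * np.log(p) + (1 - t) * np.log(1 - p))
```

For moderate scores the two functions agree, while only the first stays finite for a score such as 1000.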

IV. Reference links

Caffe network visualization tool: http://ethereon.github.io/netscope/#/editor
Cross-entropy formula derivation: http://caffecn.cn/?/question/25
Cross-entropy formula explanation: http://blog.csdn.net/u014114990/article/details/47975739
Paper code: https://github.com/xiaolonw/adversarial-frcnn

About the authors: edited by fangfang xiuhong

posted at 19:54 · Paper · detection adversarial


July 16, 2017

Object detection in CVPR2017

cvpr2017

detector

Accurate Single Stage Detector Using Recurrent Rolling Convolution paper
Training Object Class Detectors With Click Supervision paper
Self-Learning Scene-Specific Pedestrian Detectors Using a Progressive Latent Model paper
EAST: An Efficient and Accurate Scene Text Detector paper
Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors
What Is and What Is Not a Salient Object? Learning Salient Object Detector by Ensembling Linear Exemplar Regressors
Expecting the Unexpected: Training Detectors for Unusual Pedestrians With Adversarial Imposters
Learning Discriminative and Transformation Covariant Local Feature Detectors

detection

SRN: Side-output Residual Network for Object Symmetry Detection in the Wild
Amodal Detection of 3D Objects: Inferring 3D Bounding Boxes From 2D Ones in RGB-Depth Images
Transition Forests: Learning Discriminative Temporal Transitions for Action Recognition and Detection
Deep Level Sets for Salient Object Detection
Spatially-Varying Blur Detection Based on Multiscale Fused and Sorted Transform Coefficients of Gradient Magnitudes
Object Detection in Videos With Tubelet Proposal Networks
Feature Pyramid Networks for Object Detection
Fast Boosting Based Detection Using Scale Invariant Multimodal Multiresolution Filtered Features
Temporal Convolutional Networks for Action Segmentation and Detection
Discriminative Bimodal Networks for Visual Localization and Detection With Natural Language Queries
Interspecies Knowledge Transfer for Facial Keypoint Detection
Deep Joint Rain Detection and Removal From a Single Image
CASENet: Deep Category-Aware Semantic Edge Detection
Image Splicing Detection via Camera Response Function Analysis
Scale-Aware Face Detection
Perceptual Generative Adversarial Networks for Small Object Detection
Predictive-Corrective Networks for Action Detection (project, abstract, PDF)
Unified Embedding and Metric Learning for Zero-Exemplar Event Detection
A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection (PDF)
Multiple Instance Detection Network With Online Instance Classifier Refinement
Visual Translation Embedding Network for Visual Relation Detection
SCC: Semantic Context Cascade for Efficient Action Detection
End-To-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering
Joint Detection and Identification Feature Learning for Person Search
Deep Matching Prior Network: Toward Tighter Multi-Oriented Text Detection
Visual-Inertial-Semantic Scene Representation for 3D Object Detection
A Deep Regression Architecture With Two-Stage Re-Initialization for High Performance Facial Landmark Detection
Quad-Networks: Unsupervised Learning to Rank for Interest Point Detection
Polyhedral Conic Classifiers for Visual Object Detection and Classification
Incremental Kernel Null Space Discriminant Analysis for Novelty Detection
Straight to Shapes: Real-Time Detection of Encoded Shapes
Learning Cross-Modal Deep Representations for Robust Pedestrian Detection
Spatio-Temporal Self-Organizing Map Deep Network for Dynamic Object Detection From Videos
Provable Self-Representation Based Outlier Detection in a Union of Subspaces
Deep Variation-Structured Reinforcement Learning for Visual Relationship and Attribute Detection
CityPersons: A Diverse Dataset for Pedestrian Detection
Hand Keypoint Detection in Single Images Using Multiview Bootstrapping
Minimum Delay Moving Object Detection
Weakly Supervised Affordance Detection
RON: Reverse Connection With Objectness Prior Networks for Object Detection
Deeply Supervised Salient Object Detection With Short Connections
Simultaneous Facial Landmark Detection, Pose and Deformation Estimation Under Facial Occlusion
Joint Gap Detection and Inpainting of Line Drawings
MCMLSD: A Dynamic Programming Approach to Line Segment Detection
Richer Convolutional Features for Edge Detection
What Can Help Pedestrian Detection?
UntrimmedNets for Weakly Supervised Action Recognition and Detection
Multi-View 3D Object Detection Network for Autonomous Driving
Non-Local Deep Features for Salient Object Detection
Unsupervised Vanishing Point Detection and Camera Calibration From a Single Manhattan Image With Radial Distortion
Action Unit Detection With Region Adaptation, Multi-Labeling Learning and Optimal Temporal Fusing
Mimicking Very Efficient Network for Object Detection
Learning Detection With Diverse Proposals
YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video

posted at 07:44 · List · detection cvpr


July 01, 2017

People in object detection

Kaiming He

arXiv paper list ✓

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition paper code
Convolutional Neural Networks at Constrained Time Cost paper
Efficient and Accurate Approximations of Nonlinear Convolutional Networks paper
ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation paper
Instance-aware Semantic Segmentation via Multi-task Network Cascades paper
Deep Residual Learning for Image Recognition paper
Identity Mappings in Deep Residual Networks paper
Instance-sensitive Fully Convolutional Networks paper
Is Faster R-CNN Doing Well for Pedestrian Detection? paper
R-FCN: Object Detection via Region-based Fully Convolutional Networks paper
Aggregated Residual Transformations for Deep Neural Networks paper
Feature Pyramid Networks for Object Detection paper
Mask R-CNN paper
Detecting and Recognizing Human-Object Interactions paper

Ross Girshick

arXiv paper list ✓

Trevor Darrell

arXiv paper list ✓

Rogerio Feris

S3Pool: Pooling with Stochastic Spatial Sampling paper code
Deep Domain Adaptation for Describing People Based on Fine-Grained Clothing Attributes paper
A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection paper caffe
Shape Classification Through Structured Learning of Matching Measures paper
Learning Detectors from Large Datasets for Object Retrieval in Video Surveillance paper
Boosting Object Detection Performance in Crowded Surveillance Videos paper
Efficient Maximum Appearance Search for Large-Scale Object Detection paper
Fast Face Detector Training Using Tailored Views paper
Attribute-based People Search: Lessons Learnt from a Practical Surveillance System paper
Fully-adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification paper
[BOOK] Visual Attributes link introduce

Piotr Dollar

blog

Supervised Learning of Edges and Object Boundaries paper
Multiple Component Learning for Object Detection paper
Fast Feature Pyramids for Object Detection paper
Detecting Objects using Deformation Dictionaries paper
Edge Boxes: Locating Object Proposals from Edges paper
What makes for effective detection proposals? paper
Learning to Segment Object Candidates paper
Semantic Amodal Segmentation paper
Unsupervised Learning of Edges paper
A MultiPath Network for Object Detection paper
Learning to Refine Object Segments paper

Xiaoyu Wang

Regionlets for Generic Object Detection ICCV 2013 T-PAMI 2015
Generic Object Detection with Dense Neural Patterns and Regionlets paper
Accurate Object Detection with Location Relaxation and Regionlets Relocalization paper
Deep Reinforcement Learning-based Image Captioning with Embedding Reward paper
SEP-Nets: Small and Effective Pattern Networks paper

Rodrigo Benenson

Traffic Sign Recognition – How far are we from the solution paper
Seeking the strongest rigid detector paper
How good are detection proposals, really? paper
Ten Years of Pedestrian Detection, What Have We Learned? paper
Taking a Deeper Look at Pedestrians paper
Filtered Channel Features for Pedestrian Detection paper
What makes for effective detection proposals? paper
What is Holding Back Convnets for Detection? paper
Weakly Supervised Object Boundaries paper
How Far are We from Solving Pedestrian Detection? paper
The Cityscapes Dataset paper
Detecting Surgical Tools by Modelling Local Appearance and Global Shape paper

Jan Hosang

A convnet for non-maximum suppression paper
Simple does it: Weakly supervised instance and semantic segmentation paper
Learning non-maximum suppression paper

Workshop

ICCV 2015 Tutorial on Tools for Efficient Object Detection

posted at 14:40 · list · detection people


April 14, 2017

Reading List

Object detection

Rich feature hierarchies for accurate object detection and semantic segmentation paper
Fast R-CNN paper
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks paper
[read] R-FCN: Object Detection via Region-based Fully Convolutional Networks paper
[read] Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks paper
Feature Pyramid Networks for Object Detection paper
[read] A-Fast-RCNN: Hard positive generation via adversary for object detection paper github
[read] Generative Adversarial Networks paper
[read] Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization paper caffe
[read] Spatial Memory for Context Reasoning in Object Detection paper
Accurate Single Stage Detector Using Recurrent Rolling Convolution paper
ME R-CNN: Multi-Expert Region-based CNN for Object Detection paper
[read] Beyond Skip Connections: Top-Down Modulation for Object Detection paper
Improving Object Detection With One Line of Code paper
S-OHEM: Stratified Online Hard Example Mining for Object Detection paper
Adaptive Object Detection Using Adjacency and Zoom Prediction paper caffe
You Only Look Once: Unified, Real-Time Object Detection paper
YOLO9000: Better, Faster, Stronger paper
Deformable Convolutional Networks paper mxnet
Learning Detection with Diverse Proposals paper caffe
[read] RON: Reverse Connection with Objectness Prior Networks for Object Detection paper

Text detection

Detecting Text in Natural Image with Connectionist Text Proposal Network paper
EAST: An Efficient and Accurate Scene Text Detector paper

Semantic Image Segmentation

Fully Convolutional Networks for Semantic Segmentation paper caffe
Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs paper
Conditional Random Fields as Recurrent Neural Networks paper caffe
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs paper pytorch
[read] Fully Convolutional Instance-aware Semantic Segmentation paper mxnet
Loss Max-Pooling for Semantic Image Segmentation paper
[read] Mask R-CNN paper tf

Recognition and Detection in 3D

3D ShapeNets: A Deep Representation for Volumetric Shapes paper matlab
VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition paper Lasagne

Visual Reasoning

Inferring and Executing Programs for Visual Reasoning paper pytorch

Human motion

Unsupervised Learning of Depth and Ego-Motion from Video paper github

CNN and its property

Group Invariant Scattering paper
Invariant Scattering Convolution Networks paper
Structured Receptive Fields in CNNs paper
Dynamic Filter Networks paper
Multiscale Hierarchical Convolutional Networks paper
Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors paper

Image classification

Deep Residual Learning for Image Recognition paper

Lightweight CNN

Towards lightweight convolutional neural networks for object detection paper

posted at 14:40 · List · detection segmentation lightweight
