Paper: “Aggregating local descriptors into a compact image representation,” Hervé Jégou et al., Proc. IEEE CVPR’10.
Note:
This paper proposes a system for very-large-scale image search that jointly addresses search accuracy, efficiency, and memory usage.
The two main contributions are:
1. A method to represent an image with excellent accuracy at a reasonable vector dimensionality.
2. A joint optimization of the trade-off between dimensionality reduction and the indexing (quantization) algorithm.
After extracting SIFT descriptors, instead of building a dictionary of millions of visual words as in Bag of Words (BoW), the method uses only a small codebook (e.g., k = 256 centroids, or even fewer).
Each descriptor is still assigned to its nearest centroid. After assignment, however, we do not count how many descriptors fall into each cell as BoW does; instead, we accumulate the residuals, i.e., the differences between each descriptor and the centroid it is assigned to. Each image is then represented by a k × 128-dimensional vector (k centroids times the 128 SIFT dimensions), which stays compact because k is small. The paper calls this representation VLAD (Vector of Locally Aggregated Descriptors).
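The residual-aggregation step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation; the function name `vlad` and the toy dimensions are my own, and the final L2 normalization follows the paper's description:

```python
import numpy as np

def vlad(descriptors, centroids):
    """Aggregate local descriptors into a VLAD vector.

    descriptors: (n, d) array of local features (e.g. 128-dim SIFT).
    centroids:   (k, d) codebook learned beforehand by k-means.
    Returns a flattened (k * d,) VLAD vector.
    """
    # Assign each descriptor to its nearest centroid.
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    assign = np.argmin(dists, axis=1)

    k, d = centroids.shape
    v = np.zeros((k, d))
    # Accumulate signed residuals (descriptor - centroid) per cell,
    # instead of counting occurrences as BoW would.
    for i in range(k):
        members = descriptors[assign == i]
        if len(members):
            v[i] = (members - centroids[i]).sum(axis=0)

    v = v.ravel()
    # L2-normalize the aggregated vector, as the paper does.
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

With k = 256 and d = 128 this yields a 32,768-dimensional vector, far smaller than a BoW histogram over millions of visual words.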
After generating these vectors, the paper proposes a method to compress them. To keep the quantization error low, the number of quantization cells k should be very large; however, training and storing a single quantizer with that many centroids is intractable.
The work therefore uses a product quantization method. First, divide the vector into m equal sub-vectors. Then train one quantizer per sub-vector. The good news is that each sub-quantizer now needs only ks = k^(1/m) centroids, because the m independent choices implicitly define k = ks^m combined cells, enough to reach the expected accuracy.
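A hedged sketch of the encode/decode side of product quantization, assuming the m sub-codebooks have already been trained by k-means (the training loop is omitted, and the function names are mine, not from the paper):

```python
import numpy as np

def pq_encode(x, codebooks):
    """Product-quantize a vector x.

    x:         (D,) vector to compress.
    codebooks: list of m arrays, each (ks, D // m) -- one sub-codebook
               per sub-vector, trained separately by k-means.
    Returns m centroid indices (the compact code).
    """
    m = len(codebooks)
    subs = np.split(x, m)  # divide the vector into m equal parts
    code = []
    for sub, cb in zip(subs, codebooks):
        # Nearest sub-centroid index for this part.
        idx = int(np.argmin(np.linalg.norm(cb - sub, axis=1)))
        code.append(idx)
    return np.array(code)

def pq_decode(code, codebooks):
    """Reconstruct an approximation of x from its PQ code."""
    return np.concatenate([cb[i] for i, cb in zip(code, codebooks)])
```

Storing m small indices instead of one huge index is what makes the effective codebook size ks^m reachable in practice.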
In addition to quantization, it is better to reduce the dimensionality first. The natural choice is standard PCA; however, two problems arise:
1. The variance of each dimension differs after PCA, so the first (highest-variance) principal components dominate the quantization error and become the bottleneck.
2. There is a trade-off between projection error and quantization error: keeping more dimensions lowers the projection error, but makes each dimension harder to quantize with a fixed code budget.
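Problem 1 is easy to see numerically. The snippet below is a toy demonstration on synthetic data (entirely my own, not from the paper): after projecting onto the principal components, the per-dimension variances are very unequal, so a quantizer that allocates precision uniformly across dimensions is limited by the high-variance leading components.

```python
import numpy as np

# Synthetic data with unequal per-axis scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8)) * np.array([5, 3, 2, 1, 1, 0.5, 0.3, 0.1])

# PCA via SVD of the centered data matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt.T  # data projected onto principal components

var = Z.var(axis=0)
print(var)  # variances are sorted decreasing: the first few dominate
```

The leading components carry most of the energy, which is exactly why the paper treats the first principal component as the bottleneck of the quantization error.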
Comments:
1. The way the feature is quantized is as a 'product' over sub-vectors. I think it is an impressive way to do this, but it is a little complicated.
2. Using the signed value of the residual rather than its absolute value is another surprising choice. Presumably the authors want to preserve the direction of the residual vectors along each dimension.