Tuesday, June 11, 2013

[ammai] week7 Support Vector Learning for Ordinal Regression

Given a set of samples S = {(xi, yi)}, where xi is the feature vector of a sample and yi is its label, most machine learning problems can be classified into one of three categories, depending on the properties of the label:
  1. Classification: yi belongs to a finite set of discrete values, and the values are unordered.
  2. Regression: yi lies in a continuous metric space, and the value of yi carries relevance information about the sample.
  3. Ordinal regression: yi belongs to a finite set of discrete values as in classification, but an ordering relationship exists among the values as in regression. It can be interpreted as a ranking problem.
This paper gives a formal formulation and solution of the ordinal regression problem.

Given a sample set S = {(xi, yi)}, the regression problem aims to find the hypothesis that minimizes the risk functional R(h). The expected loss E[L(·)] is used here as the risk.

Since only the order matters, the loss function L(y1, y2, y1', y2') gives 1 when the predicted order is incorrect and 0 otherwise. We can then redefine the sample set S' using pairs of the original variables: S' = {((x1, x2), sign(y1 - y2))}.


With this expression, the output value is in the set {-1, 0, +1}, which transforms the ordinal regression problem into a classification problem on pairs.
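The pairwise transformation above can be sketched in a few lines. This is a minimal illustration (the helper name `make_pairs` and the toy data are my own, not from the paper); it keeps only pairs with different labels, since ties (sign = 0) carry no ordering information:

```python
import numpy as np

def make_pairs(X, y):
    """Build the pairwise sample set S' = {(x_i - x_j, sign(y_i - y_j))}.

    Pairs with equal labels are dropped: their sign is 0 and they
    contribute no ordering constraint to the classifier.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    diffs, signs = [], []
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            if y[i] != y[j]:
                diffs.append(X[i] - X[j])          # feature difference
                signs.append(np.sign(y[i] - y[j]))  # +1 or -1 order label
    return np.array(diffs), np.array(signs, dtype=int)

# toy ordinal data: a 1-D feature with ranks 1..3
X = [[0.1], [0.4], [0.9]]
y = [1, 2, 3]
D, s = make_pairs(X, y)   # 3 pairs, all with order label -1
```

Representing each pair by the difference vector x_i - x_j is what makes a standard binary classifier applicable in the next step.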


On the other hand, one can also express the ordinal hypothesis with a real-valued utility function. For each hypothesis h, we can use a function U together with a set of ordered thresholds θ, such that h(x) = r_i if and only if U(x) falls in the interval (θ(r_{i-1}), θ(r_i)]; in the linear case, U(x) = w·x.
The function U and the thresholds θ can then be solved for by maximizing the margin on the training data. The procedure is the same as in the famous Support Vector Machine, which requires solving a standard QP problem.
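As a sketch of the training step, one can hand the pairwise difference vectors to an off-the-shelf linear SVM, which solves the same kind of QP; scikit-learn's `LinearSVC` is used here as a stand-in for the paper's solver, and the data is synthetic (the hidden utility direction [2, 1] is my own choice, not from the paper):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.uniform(size=(20, 2))
scores = X @ np.array([2.0, 1.0])      # hidden utility used to generate ranks
y = np.digitize(scores, [1.0, 2.0])    # ordinal labels 0, 1, 2

# difference vectors for all pairs with different ranks
idx_i, idx_j = np.triu_indices(len(y), k=1)
mask = y[idx_i] != y[idx_j]
D = X[idx_i][mask] - X[idx_j][mask]
s = np.sign(y[idx_i][mask] - y[idx_j][mask])

# a linear SVM on (D, s) recovers the utility direction w;
# no intercept is needed since D is built from differences
svm = LinearSVC(fit_intercept=False, C=10.0).fit(D, s)
w = svm.coef_.ravel()
```

The learned `w` defines the utility U(x) = w·x; the thresholds θ separating adjacent ranks can afterwards be placed between the projected scores of consecutive rank groups.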



For any list of unseen data, we compute the pairwise feature differences and multiply them by w; the sign of w·(x1 - x2) then gives the pairwise relationships of the data.
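The prediction step reduces to a dot product. A minimal sketch, assuming a learned weight vector `w` (the value below is a placeholder, not a trained one): comparing two points uses sign(w·(x1 - x2)), and ranking a whole list can skip the pairwise loop entirely by sorting on U(x) = w·x, which is consistent with all pairwise comparisons.

```python
import numpy as np

# stand-in for the weight vector learned by the pairwise SVM
w = np.array([2.0, 1.0])

def pairwise_order(x1, x2, w):
    """Return +1 if x1 ranks above x2, -1 if below, 0 on a tie."""
    return int(np.sign(w @ (np.asarray(x1) - np.asarray(x2))))

unseen = np.array([[0.2, 0.1],
                   [0.9, 0.5],
                   [0.5, 0.3]])
ranking = np.argsort(-(unseen @ w))   # indices from highest to lowest rank
```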

Although the algorithm is simple and easy to understand, the pairwise computation is expensive: the number of pairs grows quadratically with the number of samples. Scalability is therefore the major concern for future work.
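The scalability concern is easy to quantify: n samples yield n(n-1)/2 candidate pairs, so the training set for the pairwise classifier grows quadratically.

```python
# number of candidate training pairs for n samples: n * (n - 1) / 2
def pair_count(n):
    return n * (n - 1) // 2

for n in (100, 1_000, 10_000):
    print(n, pair_count(n))   # 10,000 samples already give ~50 million pairs
```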
