RANSAC

Outliers (noise) in the data can pull a regression model toward reducing prediction errors on the outliers rather than on the majority of genuine data points. RANdom SAmple Consensus (RANSAC) is a method for fitting a model robustly in the presence of outliers.

RANSAC does the following:

  • randomly sample a subset of the data that is just large enough to fit the model;
  • fit a model to this subset;
  • classify every point in the whole data set as an inlier or an outlier by comparing its residual (prediction error) to a threshold. The set of inliers is called the consensus set;
  • repeat the steps above for a number of iterations, then retrain the final model on the largest consensus set (since inliers should be the majority).
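
The four steps above can be sketched for a simple line fit. This is a minimal illustration, not a reference implementation: the helper names `fit_line` and `ransac_line`, the least-squares line model, and the default values of `s`, `t`, and `T` are all assumptions chosen for the example.

```python
import random

def fit_line(points):
    """Ordinary least squares for y = a*x + b on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are equal; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def ransac_line(points, s=2, t=1.0, T=50, seed=0):
    """Illustrative RANSAC loop (hypothetical helper, assumed defaults)."""
    rng = random.Random(seed)
    best = []  # largest consensus set found so far
    for _ in range(T):
        # Step 1: randomly sample s points.
        subset = rng.sample(points, s)
        if len({x for x, _ in subset}) < 2:
            continue  # degenerate sample, cannot fit a line
        # Step 2: fit a model to the subset.
        a, b = fit_line(subset)
        # Step 3: points whose residual is below t form the consensus set.
        consensus = [(x, y) for x, y in points if abs(y - (a * x + b)) < t]
        # Step 4: remember the largest consensus set.
        if len(consensus) > len(best):
            best = consensus
    if len(best) < 2:
        raise ValueError("no usable consensus set found")
    # Retrain the final model on the largest consensus set.
    return fit_line(best), best
```

For example, with points on the line y = 2x + 1 where every fifth point is shifted upward by 30, calling `ransac_line(points, s=2, t=0.5)` should recover a slope near 2 and an intercept near 1, because the shifted points never enter the largest consensus set.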

Parameters of RANSAC include:

  • \(s\): number of points to fit the model;

  • \(t\): threshold of the residual;

  • \(e\): proportion of outliers in the data;

  • \(\delta\): desired probability of success (at least one iteration draws a subset with no outliers);

  • \(T\): number of iterations to be determined.

Then,

  • \(p\text{(training subset has no outliers)} = (1 - e)^s\)
  • \(p\text{(training subset has at least one outlier)} = 1 - (1 - e)^s\)
  • \(p\text{(all T subsets have outliers)} = (1 - (1 - e)^s)^T\)

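We want the probability that all \(T\) subsets contain outliers to fall below \(1 - \delta\):

\[ \begin{gather} (1 - (1 - e)^s)^T < 1 - \delta \\ T > \frac{\log(1 - \delta)}{\log(1 - (1 - e)^s)} \end{gather} \]

The inequality flips when dividing because \(\log(1 - (1 - e)^s)\) is negative.

A common default for the threshold \(t\) is the median absolute deviation (MAD) of the target values \(y\).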
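
As a quick numeric check, taking logarithms of \((1 - (1 - e)^s)^T < 1 - \delta\) and dividing by the negative quantity \(\log(1 - (1 - e)^s)\) gives \(T > \log(1 - \delta) / \log(1 - (1 - e)^s)\). The sketch below evaluates that bound and also computes a MAD-based threshold. The function names `required_iterations` and `median_absolute_deviation` and the example values are assumptions made for illustration.

```python
import math
import statistics

def required_iterations(s, e, delta):
    """Smallest integer T with (1 - (1 - e)**s)**T < 1 - delta."""
    if not (0 <= e < 1) or not (0 < delta < 1):
        raise ValueError("need 0 <= e < 1 and 0 < delta < 1")
    p_bad = 1 - (1 - e) ** s  # P(one subset contains at least one outlier)
    if p_bad == 0:
        return 1  # no outliers at all: a single draw already succeeds
    if p_bad >= 1:
        raise ValueError("a clean subset is numerically impossible")
    bound = math.log(1 - delta) / math.log(p_bad)
    return math.floor(bound) + 1  # smallest integer strictly above the bound

def median_absolute_deviation(y):
    """Median of |y_i - median(y)|, one possible choice for the threshold t."""
    center = statistics.median(y)
    return statistics.median(abs(v - center) for v in y)

# Assumed example: s = 2 points per fit, 30% outliers, 99% success probability.
print(required_iterations(s=2, e=0.3, delta=0.99))  # 7
print(median_absolute_deviation([1, 2, 3, 4, 100]))  # 1
```

With these assumed values, \(1 - (1 - e)^s = 0.51\), and \(0.51^7 \approx 0.009 < 0.01\) while \(0.51^6 \approx 0.018\), so seven iterations are the minimum that satisfies the bound.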

External Material

Random sample consensus (RANSAC) algorithm - 桂。 - 博客园 (cnblogs.com)
