RANSAC

Outliers (noise) in the data can pull a regression model toward reducing the prediction errors on the outliers rather than on the majority of genuine data points. RANdom SAmple Consensus (RANSAC) is a method for fitting a model robustly in the presence of outliers.

RANSAC does the following:

randomly sample a subset of the data that is large enough to fit the model;
fit a model to this subset;
classify every point in the whole data set as an inlier or an outlier by comparing its residual (prediction error) to a threshold; the set of inliers is called the consensus set;
repeat the steps above for a number of iterations, then refit the final model on the largest consensus set (since inliers should be the majority).
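
The steps above can be sketched for the simple case of fitting a line \(y = ax + b\). This is a minimal illustration, not a reference implementation: the function names `fit_line` and `ransac_line`, the use of ordinary least squares as the base model, and the fixed random seed are all assumptions made for the example.

```python
import random
import statistics


def fit_line(points):
    """Least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # degenerate sample: every x is identical
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def ransac_line(points, s, t, T, seed=0):
    """Return (model, consensus_set) after T RANSAC iterations.

    s: points sampled per iteration, t: residual threshold,
    T: number of iterations. `seed` is an assumption for repeatability.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(T):
        sample = rng.sample(points, s)           # step 1: random subset
        model = fit_line(sample)                 # step 2: fit to the subset
        if model is None:
            continue
        a, b = model
        # step 3: residual test against the whole data set
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= t]
        if len(inliers) > len(best_inliers):     # keep the largest consensus set
            best_inliers = inliers
    if not best_inliers:
        return None, []
    # step 4: refit the final model on the largest consensus set
    return fit_line(best_inliers), best_inliers


if __name__ == "__main__":
    # Hypothetical data: points on y = 2x + 1, with three shifted far off the line.
    data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
    for i in (3, 9, 15):
        data[i] = (data[i][0], data[i][1] + 40.0)
    model, consensus = ransac_line(data, s=2, t=0.5, T=50)
    print(model, len(consensus))
```

Refitting on the whole consensus set at the end, rather than keeping the model from the winning sample, uses all the inliers and not just the \(s\) sampled points.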

Parameters of RANSAC include:

\(s\): number of points sampled to fit the model;
\(t\): threshold on the residual;
\(e\): proportion of outliers in the data;
\(\delta\): desired probability of success (at least one iteration draws a subset with no outliers);
\(T\): number of iterations, to be determined.

Then, assuming each sampled point is independently an outlier with probability \(e\),

\(p\text{(training subset has no outliers)} = (1 - e)^s\)
\(p\text{(training subset has at least one outlier)} = 1 - (1 - e)^s\)
\(p\text{(all T subsets have outliers)} = (1 - (1 - e)^s)^T\)

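We want the probability that all \(T\) subsets contain outliers to be below \(1 - \delta\): \[ \begin{gather} p\text{(all T subsets have outliers)} = (1 - (1 - e)^s)^T < 1 - \delta \\ T > \frac{\log(1 - \delta)}{\log(1 - (1 - e)^s)} \end{gather} \] The inequality flips when dividing because \(\log(1 - (1 - e)^s)\) is negative.

The threshold \(t\) is often set to the median absolute deviation (MAD) of \(y\).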
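
As a rough numerical check, solving \((1 - (1 - e)^s)^T < 1 - \delta\) for \(T\) by taking logarithms of both sides gives the smallest admissible iteration count. The sketch below computes it, along with the MAD of a list of target values. The helper names `ransac_iterations` and `median_abs_deviation` are hypothetical, and the sample values are made up for illustration.

```python
import math
import statistics


def ransac_iterations(s, e, delta):
    """Smallest integer T with (1 - (1 - e)**s)**T < 1 - delta."""
    if not 0 <= e < 1 or not 0 < delta < 1:
        raise ValueError("need 0 <= e < 1 and 0 < delta < 1")
    p_bad = 1 - (1 - e) ** s      # p(a subset has at least one outlier)
    if p_bad == 0:
        return 1                  # no outliers: a single iteration suffices
    # Both logarithms are negative, so the ratio is positive.
    bound = math.log(1 - delta) / math.log(p_bad)
    return math.floor(bound) + 1  # strict inequality: T > bound


def median_abs_deviation(ys):
    """Median of |y - median(y)|, one common choice for the threshold t."""
    med = statistics.median(ys)
    return statistics.median(abs(y - med) for y in ys)


if __name__ == "__main__":
    # Made-up setting: half the data are outliers, 2 points per sample,
    # and a 99% chance of drawing at least one outlier-free subset.
    print(ransac_iterations(s=2, e=0.5, delta=0.99))
    print(median_abs_deviation([1, 2, 3, 4, 100]))
```

The required \(T\) grows quickly with \(s\) because \((1 - e)^s\) shrinks exponentially, which is one reason to keep the sample size as small as the model allows.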

External Material

Random sample consensus (RANSAC) algorithm - 桂。 - 博客园 (cnblogs.com)

Last updated on Apr 20, 2022