Vignette: Basic Robust Estimators

Outline

In this vignette, we discuss the usage of some basic robust estimators of location (and scale). The estimators are organized by three categories estimating methods, population characteristics and modes.

Estimating methods

Trimming (Chap. 2)
Winsorization (Chap. 3)
Weight reduction (Chap. 4)
M-estimation (Chap. 5)

Population characteristics

mean
total
(standard deviation \(\rightarrow\) see utility functions, Chap. 6)

Modes

bare-bone methods
survey methods

Bare-bone methods are stripped-down versions of the survey methods in terms of functionality and informativeness. These functions may serve users and other package developers as building blocks. In particular, bare-bone functions cannot compute variances. The survey methods are much more capable and depend—for variance estimation—on the R package survey Lumley (2021, 2010).

1 LOS (Length-of-stay) Hospital Data

The losdata are a simple random sample without replacement of \(n = 70\) patients from the (fictive) population of \(N = 2\,479\) patients in inpatient hospital treatment.^{Note 2} First, we load the package and the data

> library(robsurvey, quietly = TRUE)
> data(losdata)
> attach(losdata)

The first 3 rows of the data are

> head(losdata, 3)
  los   weight  fpc
1  10 34.91549 2479
2   7 34.91549 2479
3  21 34.91549 2479

We have data on the following variables

los length-of-stay in hospital (days)
weight sampling weight
fpc finite population correction

We consider estimating average length-of-stay in hospital (LOS, days).

1.1 Survey design object

For the survey methods (not bare-bone methods), we must load the survey package (Lumley, 2010, 2021)

> library(survey)

and specify a survey.design object

> dn <- svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata)

1.2 Exploring the data

The distribution of the variable los is skewed to the right (see boxplot), and we see a couple of rather heavy outliers. On the logarithmic scale, the distribution is slightly skewed to the right. The outliers need not be errors. Following Chambers (1986), we distinguish representative outliers from non-representative outliers.

The outliers visible in the boxplot refer to a few individuals who stayed for a long time in inpatient care. Moreover, we assume that these outliers represent patients in the population that are similar in value (i.e., representative outliers).

Although the outliers are not some kind of error, it is beneficiary (from the perspective of efficiency) not consider estimating the population average of LOS by the weighted mean (Hajek estimator). The influence that the outliers exert on the weighted mean and its variance estimator can lead to inefficiencies. Instead, we shall “treat” the outliers (e.g., downweight) to limit their influence on the estimators. As a result, we may hope to obtain more efficient estimates.

2 Trimming

2.1 Bare-bone methods

The following estimation methods are available:

weighted_mean_trimmed()
weighted_total_trimmed()

We consider estimating the 5 % one-sided trimmed population mean of LOS. The lower end of the distribution is not trimmed (lower bound: LB = 0). The 5% largest observations are trimmed (upper bound: UB = 0.95). The range of values to be considered for estimation is thus defined as \([0, 0.95]\)

> weighted_mean_trimmed(los, weight, LB = 0, UB = 0.95)
[1] 9.323529

We obtain an estimate of (roughly) 9.3 days.

2.2 Survey methods

The following estimation methods are available:

svymean_trimmed()
svytotal_trimmed()

As before, we are interested in computing the 5% one-sided trimmed population mean. In contrast to weighted_mean_trimmed(), the method svymean_trimmed() computes the standard error of the estimate using the functionality of the survey package.

> m <- svymean_trimmed(~los, dn, LB = 0, UB = 0.95)
> m
     mean    SE
los 9.324 1.064

The estimated location, variance, and standard error of the estimator can be extracted from object m with the following commands.

> coef(m)
     los 
9.323529 
> vcov(m)
    Variance
los 1.131988
> SE(m)
[1] 1.063949

The summary() method summarizes the most important facts about the estimate. [In contrast to \(M\)-estimators (see below), the summary is not very interesting here]

> summary(m)
Weighted trimmed estimator (0, 0.95) of the sample mean 

     mean    SE
los 9.324 1.064


Sampling design:
Independent Sampling design
svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata)

Additional utility functions are

residuals() to extract the residuals. [For the weighted trimmed mean, the residuals are not so interesting].
fitted() to extract the fitted values under the model in use. [Relevant for \(M\)-estimators].
robweights() to extract the robustness weights. [Relevant for \(M\)-estimators.]

3 Winsorization

3.1 Bare-bone methods

The following estimation methods are available:

weighted_mean_winsorized()
weighted_mean_k_winsorized()
weighted_total_winsorized()
weighted_total_k_winsorized()

We consider estimating the one-sided winsorized population mean of LOS. The lower end of the distribution is not winsorized (lower bound: LB = 0). The 5% largest observations are winsorized (upper bound: UB = 0.95).

> weighted_mean_winsorized(los, weight, LB = 0, UB = 0.95)
[1] 10.40845

There is another variant of the winsorized mean (and total) available, which is specified in terms of the number of \(k=1, 2, ...\) observations to be winsorized in the right tail of the distribution. It is called the one-sided k-winsorized mean (and total) and is computed as follows

> weighted_mean_k_winsorized(los, weight, k = 1)
[1] 11.40845

3.2 Survey methods

The following estimation methods are available:

svymean_winsorized()
svymean_k_winsorized()
svytotal_winsorized()
svytotal_k_winsorized()

> svymean_winsorized(~los, dn, LB = 0, UB = 0.95)
     mean    SE
los 10.41 1.212

The utility functions coef(), vcov(), SE(), summary(), residuals(), fitted(), and robweights() are available.

Good to know.

For the survey methods with postfix _winsorized, the implementation offers two variance estimation techniques.

simple_var = FALSE: the “standard” variance estimation technique for the winsorized mean (and total). It depends on a kernel-based estimate of the density function, which is evaluated at the winsorization quantiles. Under circumstances, this estimate can be difficult to compute and/ or unreliable.
simple_var = TRUE: a simplified variance estimation technique, which is based on the variance of the weighted trimmed mean (or total).

4 Weight Reduction Methods

Winsorization and trimming act directly on the values of an estimator. Other estimation methods reduce the sampling weight of potential outliers instead. A hybrid method of winsorization and weight reduction to treat influential observations has been proposed by Dalén (1987). An observation \(y_i\) is called influential if its expanded value, \(w_iy_i\), is exceedingly large. Let \(c>0\) denote a winsorization or censoring cutoff value. Dalén’s estimator Z2 and Z3 of the population \(y\)-total are given by \(\sum_{i \in s} [w_i y_i]_{\circ}^c\), where \(\circ\) is a placeholder for 2 or 3 and \[ \begin{align*} [w_i y_i]_2^c = \begin{cases} w_i y_i & \text{if} \quad w_i y_i \leq c, \\ c & \text{otherwise}, \end{cases} &\qquad \text{and} \qquad [w_i y_i]_3^c = \begin{cases} w_i y_i & \text{if} \quad w_i y_ \leq c, \\ c + (y_i - c/w_i) & \text{otherwise}. \end{cases} \end{align*} \]

Estimator Z2 censors the terms \(w_iy_i\) at \(c\). In estimator Z3, observations \(y_i\) such that \(w_iy_i > c\) contribute to the estimated total only with \(c\) plus the excess over the cutoff, \((w_iy_i - c)\). Note that the excess over the threshold has a weight of 1.0 (Lee, 1995). An estimator of the population \(y\)-mean obtains by dividing the estimator of the estimated \(y\)-total by the (estimated) population size.

From a practical point of view, the choice of constant \(c\) in Dalén’s estimators is rather tricky because we cannot only derive \(c\) from a large order statistic, say \(y_{(k)}\), \(k < n\) (like for trimming). Instead, the corresponding weight \(w_{(k)}\) needs to be taken into account.

3.1 Bare-bone methods

The bare-bone functions are:

weighted_mean_dalen()
weighted_total_dalen()

The estimators Z2 and Z3 can be specified by the argument type; by default type = "Z2". The censoring threshold \(c\) is implemented as argument censoring.

> weighted_mean_dalen(los, weight, censoring = 1500)
2 of 71 observations censored
[1] 10.73129

4.2 Survey methods

The following estimation methods are available:

svymean_dalen()
svytotal_dalen()

> svymean_dalen(~los, dn, censoring = 1500)
     mean    SE
los 10.73 1.129

The utility functions coef(), vcov(), SE(), summary(), residuals(), fitted(), and robweights() are available.

5 M-Estimation

5.1 Bare-bone methods

The following estimation methods are available:

weighted_mean_huber()
weighted_total_huber()
weighted_mean_tukey()
weighted_total_tukey()
huber2()

The estimators with postfix _huber and _tukey are based on, respectively, the Huber and Tukey (biweight) \(\psi\)-function.

IMPORTANT

Two types of \(M\)-estimators are available:

type = "rhj": robust weighted mean (robust Hajek estimator)
type = "rht": robust Horvitz-Thompson estimator of Hulliger (1995) \(\rightarrow\) separate vignette

The robust Horvitz-Thompson estimator (type = "rht") is the method of choice for pps designs (i.e., designs without replacement where the sample inclusion probabilities are proportional to some measure of size). For equal-probability designs, the \(M\)-estimator of type = "rhj" tends to be superior.

The losdata is a simple random sample; thus, \(M\)-estimators of type = "rht" are the methods of choice. Here, we compute the Huber-type robust weighted \(M\)-estimator of the mean with robustness tuning constant \(k=8\).

> weighted_mean_huber(los, weight, type = "rhj", k = 8)
[1] 11.17228

Good to know.

In general, the tuning constant k must be chosen larger than (loosely speaking) “we are used to choose it”. More precisely, in the context of an infinite population with a standard Gaussian distribution, the constant \(k = 1.345\) ensures that the Huber \(M\)-estimator of location achieves 95% efficiency compared with the arithmetic mean under the Gaussian model. The efficiency considerations underlying the choice of \(k = 1.345\) do not carry over to distributions other than the Gaussian.

The \(M\)-estimators are computed by iterative methods. If the algorithm fails to converge, the functions return NA. By default, the algorithm uses a maximum of maxit = 50 iterations and a numerical tolerance criterion of tol = 1e-5 as a stopping rule. Other values of maxit and tol can be specified in the function call.

The function huber2() is an implementation of the weighted Huber proposal 2 estimator. It is only available as bare-bone method.^{Note 3}

> huber2(los, weight, k = 8)
[1] 13.02817

5.2 Survey methods

The following estimation methods are available:

svymean_huber()
svytotal_huber()
svymean_tukey()
svytotal_tukey()

The Huber \(M\)-estimator of the mean (and its standard error) can be computed with

> m <- svymean_huber(~los, dn, type = "rhj", k = 8)
> m
     mean    SE
los 11.17 1.328

The summary() method summarizes the most important facts about the \(M\)-estimate

> summary(m)
Huber M-estimator (type = rhj) of the sample mean 

     mean    SE
los 11.17 1.328

Robustness:
  Psi-function: with k = 8 
  mean of robustness weights: 0.9877 

Algorithm performance:
  converged in 4 iterations
  with residual scale (weighted MAD): 5.93 

Sampling design:
Independent Sampling design
svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata)

The estimated scale (weighted MAD) can be extracted with the scale() function. Additional utility functions are coef(), vcov(), SE(), residuals(), fitted(), and robweights(). The following figure shows a plot of the robustness weights against the residuals. We see that large residuals are downweighted.

> plot(residuals(m), robweights(m))

5.3 Adaptive estimation

An adaptive \(M\)-estimator of the total (or mean) is defined by letting the data chose the tuning constant \(k\). Let \(\widehat{T}\) denote the weighted total, and let \(\widehat{T}_k\) be the Huber \(M\)-estimator of the weighted total with robustness tuning constant \(k\). Under quite general regularity conditions, the estimated mean square error (MSE) of \(\widehat{T}_k\) can be approximated by (see e.g., Gwet and Rivest, 1992; Hulliger, 1995)

\[\widehat{\mathrm{mse}}\big(\widehat{T}_{k}\big) \approx \mathrm{var}\big(\widehat{T}_{k}\big) +\big(\widehat{T} - \widehat{T}_{k}\big)^2.\]

The minimum estimated risk (MER) estimator (Hulliger, 1995) selects \(k\) such that \(\widehat{\mathrm{mse}}\big(\widehat{T}_{k}\big)\) is minimal (among all candidate estimators).

Now, suppose that we have been working on the \(M\)-estimator with \(k=8\).

> m <- svymean_huber(~los, dn, type = "rhj", k = 8)

Next, we compute the MER, starting from the current \(M\)-estimate, i.e., object m.

> mer(m)
Search interval: [0.5, 1000]
Minimum found for k =  14.4532 
Rel. efficiency gain: 39%
     mean    SE
los 11.84 1.716

Hence, the MER is 39% more efficient than the classical estimator as an estimator of the population total.

6 Utility Functions

6.1 Weighted quantile and median

The weighted quantile (and median) can be computed by

> weighted_quantile(los, weight, probs = c(0.1, 0.9))
10% 90% 
  3  22 
> weighted_median(los, weight)
50% 
  8

When all weights are equal, the computed quantiles of weighted_quantile() are equal to base::quantile() with argument type = 2.

6.2 Weighted MAD: median absolute deviation

The normalized weighted median absolute deviations about the weighted median can be computed with

> weighted_mad(los, weight)
[1] 5.930408

By default, the normalization constant to make the weighted MAD an unbiased estimator of scale at the Gaussian core model is constant = 1.482602. This constant can be changed if necessary.

6.3 Weighted IQR: interquartile range

The normalized weighted interquartile range can be computed with

> weighted_IQR(los, weight)
[1] 6.6717

By default, the normalization constant to make the weighted IQR an unbiased estimator of scale at the Gaussian core model is constant = 0.7413. This constant can be changed if necessary.

Notes

¹ All bare-bone methods can be called with the argument info = TRUE to return a list with the following entries: characteristic (e.g., mean), estimator (e.g., trimmed estimator), estimate (numerical value), variance (by default: NA), robust (list of arguments that specify robustness), residuals (numerical vector), model (list of data used for estimation), design (by default: NA), call.

² We have constructed the losdata as a showcase; though, the LOS measurements are real data that we have taken from the \(201\) observations in Ruffieux et al. (2000). Our losdata are a sample of size \(n = 70\) from the \(201\) original observations.

³ The function huber2() is is similar to MASS::hubers() (Venables and Ripley, 2002). It differs from the implementation in MASS in that it allows for weights and is initialized by the (normalized) weighted interquartile range (IQR) not the median absolute deviations (MAD).

Bibliographical Notes

The paper of Chambers (1986) is the landmark paper about outliers in finite population sampling. Lee (1995) and Beaumont and Rivest (2009) are a good starting point to learn about robustness in finite population sampling.

Trimming and winsorization are discussed in Lee (1995) and Beaumont and Rivest (2009). The variance estimators are straightforward adaptions of the classical estimators; see Huber (1981) or Serfling (1980). A rigorous treatment in the context of finite population sampling can be found in Shao (1994).

Rao (1971) was among the first to propose weight reduction. Consider a sample of size \(n\), and suppose that the \(i\)th observation is an outlier. He suggested to reduce the outlier’s sampling weight \(w_i\) to one, and redistribute the weight difference \(w_i−1\) among the remaining observations. As a result, observation \(i\) does not represent other values like it. Dalén’s estimator offers a more general notion of weight reducution; see Dalén (1987) and also Chen et al. (2017).

In the context of finite population sampling, M-estimators were first studied by Chambers (1986). He investigated robust methods in the model- or prediction based framework of Royall and Cumberland (1981). Model-assisted estimators were introduced (for ratio estimation) by Gwet and Rivest (1992) and studied by Lee (1995), and Hulliger (1995, 1999, 2005). A recent comprehensive treatment can be found in Beaumont and Rivest (2009).

References

Beaumont, J.F., Rivest, L.P. (2009). Dealing with outliers in survey data, in: D. Pfeffermann, C.R. Rao (eds.), Sample Surveys: Theory, Methods and Inference, volume 29A of Handbook of Statistics, chapter 11, pp. 247–280. Elsevier, Amsterdam.

Chambers, R. (1986). Outlier Robust Finite Population Estimation. Journal of the American Statistical Association 81, pp. 1063–1069.

Dalén, J. (1987). Practical Estimators of a Population Total Which Reduce the Impact of Large Observations, Research Report, Statistics Sweden.

Chen. Q, Elliott, M.R., Haziza, D., Yang, Y., Ghosh, M., Little, R.J.A., Sedransk, J., Thompson, M. (2017). Approaches to Improving Survey-Weighted Estimates. Statistical Science 32, pp. 227–248.

Gwet J.P. and Rivest L.P. (1992). Outlier Resistant Alternatives to the Ratio Estimator. Journal of the American Statistical Association 87, pp. 1174-1182.

Huber, P. (1981). Robust Statistics, John Wiley & Sons, New York.

Hulliger B (2005). Horvitz-Thompson Estimators, Robustified, in: S. Kotz (ed.), Encyclopedia of Statistical Sciences, volume 5, 2nd edition. John Wiley and Sons, Hoboken (NJ).

Hulliger, B (1999). Simple and robust estimators for sampling, in: Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 54–63. American Statistical Association.

Hulliger. B. (1995). Outlier Robust Horvitz–Thompson Estimators. Survey Methodology 21, pp. 79–87.

Lee, H. (1995). Outliers in business surveys, in: B.G. Cox, D.A. Binder, B.N. Chinnappa, A. Christianson, M.J. Colledge, and P.S. Kott (eds.), Business survey methods, chapter 26: 503–526. John Wiley and Sons, New York.

Lumley, T (2021). survey: analysis of complex survey samples. R package version 4.0, URL https://CRAN.R-project.org/package=survey.

Lumley, T. (2010). Complex Surveys: A Guide to Analysis Using R: A Guide to Analysis Using R. John Wiley and Sons, Hoboken, NJ.

Rao, J.N.K. (1971). Some Aspects of Statistical Inference in Problems of Sampling from Finite Populations, in: V.P. Godambe and D.A. Sprott (eds.), Foundations of Statistical Inference, pp. 171-202. Holt, Rinehart, and Winston, Toronto.

Royall, R.M. and Cumberland, W.G. (1981). An Empirical Study of the Ratio Estimator and Estimators of its Variance. Journal of the American Statistical Association 76, pp. 66-82.

Ruffieux, C., F. Paccaud, and A. Marazzi (2000). Comparing rules for truncating hospital length of stay, Casemix Quarterly 2.

Shao., J (1994). L-Statistics in complex survey problems. The Annals of Statistics 22, pp. 946–967.

Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics John Wiley & Sons, New York.

Venables, W.N. and Ripley B.D. (2002). Modern Applied Statistics with S. 4th edition. Springer, New York.

Vignette: Basic Robust Estimators

Beat Hulliger and Tobias Schoch

Outline

Estimating methods

Population characteristics

Modes

1 LOS (Length-of-stay) Hospital Data

1.1 Survey design object

1.2 Exploring the data

2 Trimming

2.1 Bare-bone methods

2.2 Survey methods

3 Winsorization

3.1 Bare-bone methods

3.2 Survey methods

4 Weight Reduction Methods

3.1 Bare-bone methods

4.2 Survey methods

5 M-Estimation

5.1 Bare-bone methods

5.2 Survey methods

5.3 Adaptive estimation

6 Utility Functions

6.1 Weighted quantile and median

6.2 Weighted MAD: median absolute deviation

6.3 Weighted IQR: interquartile range

Notes

Bibliographical Notes

References