Sample statistics deviations

 

Types of errors

The estimates of the mean and median weights of coin series are subject to four types of error:

  1. The data analysed may not be a representative random sample. This may be due to a number of factors. For example, museums and major auction houses tend to select better preserved and therefore potentially heavier specimens. Another example might be a situation where a significant part of the data consists of coins from a single hoard containing coins in a state shortly after leaving the mint, the weight of which is not affected by wear during circulation.
  2. The computed mean and median are sample statistics and are thus subject to sampling error. In other words, the data being analysed are treated as a random sample from a probability distribution, so the calculated means and medians are random variables that may deviate from the theoretical values.
  3. Data are usually rounded to two decimal places. Some publications and catalogues give weights to three decimal places, but this is not usual.
  4. Coin weights are subject to measurement error, which may exceed the rounding error. The existence of this error can be observed on specimens published in multiple sources. For example, for coins that have appeared on the market more than once, it is sometimes possible to find differences in their weights given in auction catalogues.

Numerical illustration

We now leave aside the first type of error and use a simulation to illustrate the effect of the other three types of error for coins of stater weight. As in Section Estimation of the weight standard, we will assume that the current weight of a coin, Wobserved, can be expressed as the difference between its original weight Woriginal when it left the mint and the weight loss L due to wear, physico-chemical processes caused by environmental influences over time, and possibly other factors, such as cutting metal off the edge (so-called clipping), i.e.

Wobserved = Woriginal − L.

We will assume that the quantity Woriginal is normally distributed with the parameters μ = 10.80 g and σ = 0.07 g and that the quantity L is exponentially distributed with the rate parameter λ = 7.00 per gram, so that the mean weight loss is 1/λ ≈ 0.143 g (this choice of parameter values is inspired by the results in Section Estimation of the weight standard). The quantity Wobserved thus has the exponentially modified Gaussian distribution with these parameters μ, σ and λ. Its probability density is shown in Figure 1.

Figure 1: Probability density of the exponentially modified Gaussian distribution

Denote by A and M the mean and median of the quantity Wobserved. For the chosen values of the parameters μ, σ and λ we have

A = 10.657 g,
M = 10.688 g.
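These two values can be checked numerically. The sketch below is not the original computation: it assumes the standard closed form for the distribution function of a normal variable minus an independent exponential variable, and the names norm_cdf, emg_cdf and emg_median are illustrative. The mean follows directly from A = μ − 1/λ; the median is found by bisection.

```python
import math

MU, SIGMA, LAM = 10.80, 0.07, 7.00  # parameters from the text

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def emg_cdf(w, mu=MU, sigma=SIGMA, lam=LAM):
    """P(Woriginal - L <= w) for normal Woriginal and exponential L."""
    z = (w - mu) / sigma
    tail = math.exp(lam * (w - mu) + 0.5 * (lam * sigma) ** 2)
    return norm_cdf(z) + tail * norm_cdf(-z - lam * sigma)

def emg_median(lo=10.0, hi=11.5, tol=1e-9):
    """Solve emg_cdf(w) = 0.5 by bisection (the CDF is increasing)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if emg_cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

mean_A = MU - 1.0 / LAM      # E[Woriginal] - E[L]
median_M = emg_median()
print(f"A = {mean_A:.3f} g, M = {median_M:.3f} g")
```

The median lies above the mean because subtracting the exponential weight loss skews the distribution towards lower weights.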

In addition, we will consider the rounding to two decimal places and the measurement error. Let us denote the measurement error by ε and assume that it is normally distributed with zero mean and standard deviation 0.02 g. Its density is shown in Figure 2. The probability that the absolute value of the measurement error is greater than 0.05 g is equal to 1.24%.
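The quoted tail probability can be verified with the complementary error function. This is a minimal check under the stated assumption of a zero-mean normal error; the function name two_sided_tail is illustrative.

```python
import math

SD_ERR = 0.02     # standard deviation of the measurement error (g)
THRESHOLD = 0.05  # deviation of interest (g), i.e. 2.5 standard deviations

def two_sided_tail(threshold, sd):
    """P(|eps| > threshold) for eps normally distributed with mean 0 and sd."""
    return math.erfc(threshold / (sd * math.sqrt(2.0)))

p_exceed = two_sided_tail(THRESHOLD, SD_ERR)
print(f"P(|eps| > {THRESHOLD} g) = {100 * p_exceed:.2f}%")
```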

Figure 2: Probability density of the measurement error

Denote by W(R)observed the value of Wobserved rounded to two decimal places and by W(E)observed the value of Wobserved + ε rounded to two decimal places. Formally,

W(R)observed = ⌊100 × Wobserved + 1/2⌋ / 100,
W(E)observed = ⌊100 × (Wobserved + ε) + 1/2⌋ / 100,

where ⌊.⌋ denotes the greatest integer less than or equal to the argument (the floor function).
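In code, this rounding rule is a direct transcription of the floor formula. The helper name round_2dp is an assumption made for illustration.

```python
import math

def round_2dp(x):
    """Round x to two decimal places as floor(100*x + 1/2) / 100."""
    return math.floor(100.0 * x + 0.5) / 100.0

print(round_2dp(10.684))  # rounds down to 10.68
print(round_2dp(10.686))  # rounds up to 10.69
```

The floor formula rounds ties upwards, whereas Python's built-in round() resolves exact ties to the nearest even digit, so the two can differ on values that fall exactly halfway between two hundredths.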

We will consider random samples of size n, where n = 10, 50, 100 and 500 coins. For each random sample from the exponentially modified Gaussian distribution specified above, let us denote by w1, … , wn the exact weights of the coins in the sample, by w(R)1, … , w(R)n these weights rounded to two decimal places, and by w(E)1, … , w(E)n the weights perturbed by measurement errors and then rounded to two decimal places. Denote the sample means and medians as follows:

An = (w1 + … + wn)/n,
A(R)n = (w(R)1 + … + w(R)n)/n,
A(E)n = (w(E)1 + … + w(E)n)/n,
Mn = median of w1, … , wn,
M(R)n = median of w(R)1, … , w(R)n,
M(E)n = median of w(E)1, … , w(E)n.

Tables 1 and 2 show the probabilities of deviations of these sample statistics from the mean A and the median M. For example, according to Table 1, the probability that |A10 − A| > 0.01 is 84.2%, where |.| denotes the absolute value of the argument. These probabilities were estimated using the Monte Carlo method, with 10^7 (ten million) simulations for each sample size n.

Sample size   Deviation        Probability that the absolute deviation from the true value (in g) is
                               >0.01    >0.02    >0.03    >0.04    >0.05
10            A10 − A          84.2%    68.9%    54.7%    42.1%    31.3%
              A(R)10 − A       84.1%    68.8%    54.7%    42.0%    31.2%
              A(E)10 − A       84.2%    69.1%    55.0%    42.4%    31.6%
50            A50 − A          65.6%    37.3%    18.1%     7.4%     2.6%
              A(R)50 − A       65.6%    37.3%    18.1%     7.4%     2.6%
              A(E)50 − A       65.9%    37.7%    18.4%     7.7%     2.7%
100           A100 − A         52.9%    20.8%     5.9%     1.2%     0.2%
              A(R)100 − A      52.9%    20.8%     5.9%     1.2%     0.2%
              A(E)100 − A      53.2%    21.2%     6.1%     1.3%     0.2%
500           A500 − A         16.0%     0.5%     0.0%     0.0%     0.0%
              A(R)500 − A      16.0%     0.5%     0.0%     0.0%     0.0%
              A(E)500 − A      16.3%     0.5%     0.0%     0.0%     0.0%

Table 1: Probabilities of deviations of sample means from the mean

Sample size   Deviation        Probability that the absolute deviation from the true value (in g) is
                               >0.01    >0.02    >0.03    >0.04    >0.05
10            M10 − M          83.4%    67.5%    52.9%    40.2%    29.6%
              M(R)10 − M       83.4%    67.5%    53.0%    40.3%    29.7%
              M(E)10 − M       83.6%    67.9%    53.5%    40.9%    30.3%
50            M50 − M          65.5%    37.1%    18.0%     7.6%     2.8%
              M(R)50 − M       65.7%    37.4%    18.2%     7.6%     2.8%
              M(E)50 − M       66.0%    37.9%    18.8%     8.1%     3.0%
100           M100 − M         53.0%    20.9%     6.0%     1.3%     0.2%
              M(R)100 − M      53.5%    21.4%     6.2%     1.3%     0.2%
              M(E)100 − M      53.9%    21.9%     6.5%     1.5%     0.3%
500           M500 − M         16.2%     0.5%     0.0%     0.0%     0.0%
              M(R)500 − M      18.7%     0.7%     0.0%     0.0%     0.0%
              M(E)500 − M      18.1%     0.7%     0.0%     0.0%     0.0%

Table 2: Probabilities of deviations of sample medians from the median
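The simulation behind these tables can be sketched in a few lines. The code below is an illustration under the distributional assumptions stated above, not the original program: it uses only 20,000 simulated samples instead of ten million, a fixed seed, and hypothetical names (round_2dp, exceed_probabilities), so its estimates agree with Table 1 only to within about one percentage point.

```python
import math
import random
import statistics

MU, SIGMA, LAM = 10.80, 0.07, 7.00   # parameters from the text
SD_ERR = 0.02                        # measurement-error standard deviation (g)
TRUE_MEAN = MU - 1.0 / LAM           # A, about 10.657 g

def round_2dp(x):
    """Round to two decimals as floor(100*x + 1/2) / 100."""
    return math.floor(100.0 * x + 0.5) / 100.0

def exceed_probabilities(n, threshold, n_sim=20_000, seed=1):
    """Estimate P(|sample mean - A| > threshold) for the three sample means."""
    rng = random.Random(seed)
    counts = {"A_n": 0, "A_n(R)": 0, "A_n(E)": 0}
    for _ in range(n_sim):
        # Exact weights: normal original weight minus exponential weight loss.
        exact = [rng.gauss(MU, SIGMA) - rng.expovariate(LAM) for _ in range(n)]
        rounded = [round_2dp(w) for w in exact]
        with_err = [round_2dp(w + rng.gauss(0.0, SD_ERR)) for w in exact]
        samples = (("A_n", exact), ("A_n(R)", rounded), ("A_n(E)", with_err))
        for key, sample in samples:
            if abs(statistics.fmean(sample) - TRUE_MEAN) > threshold:
                counts[key] += 1
    return {key: c / n_sim for key, c in counts.items()}

probs = exceed_probabilities(n=10, threshold=0.01)
for key, p in probs.items():
    print(f"{key}: {100 * p:.1f}%")
```

Replacing statistics.fmean with statistics.median and TRUE_MEAN with the median M gives the corresponding sketch for Table 2.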

Not surprisingly, the results of these simulations show that for coins of higher weight (we considered staters), both rounding and measurement errors play a negligible role in estimating the mean and median. The size of the analysed sample of coins, however, is a crucial factor.
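The dominant role of the sample size can also be seen from a rough normal approximation, which is an addition here and not part of the simulation above. The variance of Wobserved is σ² + 1/λ², so the standard error of the sample mean is about 0.159/√n g. Treating the sample mean as approximately normal (an assumption that is only approximate for small n because the distribution is skewed) gives the following sketch; the name approx_exceed is illustrative.

```python
import math

SIGMA, LAM = 0.07, 7.00
# Standard deviation of Wobserved: independent normal and exponential parts.
SD_W = math.sqrt(SIGMA**2 + 1.0 / LAM**2)   # about 0.159 g

def approx_exceed(n, threshold):
    """Normal approximation to P(|A_n - A| > threshold)."""
    std_err = SD_W / math.sqrt(n)
    return math.erfc(threshold / (std_err * math.sqrt(2.0)))

for n in (10, 50, 100, 500):
    print(n, f"{100 * approx_exceed(n, 0.01):.1f}%")
```

The printed values should be close to the first column of Table 1. By comparison, the measurement error adds only 0.02² = 0.0004 to a variance of about 0.0253, which is consistent with its small effect in the tables.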

 

3 April 2024