Optimizing Sampling Numbers for Probability Model Parameter Estimation

Introduction

Two-state availability model

Consider the two-state availability model and the uptime-ratio measure \(U(\lambda, \mu) = \frac{\mu}{\lambda +\mu}\). The parameters \(\lambda\) and \(\mu\) are estimated from data samples. Assume \(n_1\) failure times are observed and used to estimate \(\lambda\), and \(n_2\) repair times are observed and used to estimate \(\mu\). A cost function \(C(n_1, n_2)\) quantifies the total cost of the experiments.

Parameter estimation carries uncertainty, and this uncertainty propagates to the measure \(U\). Given a required uncertainty of \(U\), for example in terms of standard error or confidence-interval width, we wish to minimize the cost \(C(n_1, n_2)\) while still satisfying the uncertainty requirement on \(U\).

There are three problems to solve: 1) how to estimate the uncertainty of the parameters \(\lambda\) and \(\mu\) as functions of \(n_1\) and \(n_2\) without actually performing the experiments; 2) how to propagate the uncertainty of \(\lambda\) and \(\mu\) to \(U\); 3) how to minimize the cost function \(C(n_1, n_2)\) given the constraint on the uncertainty of \(U\).

For problem 1, it is quite difficult to estimate the uncertainty of a parameter without any prior knowledge. However, we will demonstrate in this paper that a very crude point estimate is enough; it can be supplied by an expert or obtained from a small number of initial experiments.

Problem 1: Estimating the uncertainty of parameters

We will use variance to represent uncertainty. Specifically, we want to know how the variance of a parameter changes as the number of samples changes. The variance of a parameter can be computed through either Bayesian or frequentist inference.

Bayesian inference

Suppose the data samples are known; then the distribution of the true parameter can be computed using Bayesian inference, and the variance follows.

In Bayesian inference, the parameter itself is treated as a random variable with its own distribution. Let \(\Lambda\) be the random variable corresponding to \(\lambda\).

Let the observed failure times be: \(x_1, x_2, x_3, ...,x_n\). Let the prior density of \(\Lambda\) be \(f(\lambda)\). Then,

\[ f(\lambda | x_1, x_2, ..., x_n) = \frac{ \lambda^ne^{-\lambda \sum_i x_i}f(\lambda)}{\int_0^{\infty}\lambda^ne^{-\lambda \sum_i x_i}f(\lambda)d\lambda} \]

Let \(s=\sum_i x_i\),

\[ f(\lambda | s) = \frac{ \lambda^ne^{-\lambda s}f(\lambda)}{\int_0^{\infty}\lambda^ne^{-\lambda s}f(\lambda)d\lambda} \]

Choosing \(f(\lambda)\) to be the Jeffreys prior (\(f(\lambda) \propto \frac{1}{\lambda}\)),

\[ f(\lambda | s) = \frac{ \lambda^ne^{-\lambda s}\frac{1}{\lambda}}{\int_0^{\infty}\lambda^ne^{-\lambda s}\frac{1}{\lambda}d\lambda} = \frac{\lambda ^{n-1} s^n e^{-\lambda s}}{\Gamma (n)} \]

That is, \(\Lambda \sim \operatorname{Gamma}(n, s)\) (shape \(n\), rate \(s\)), or equivalently \(\Lambda \sim \operatorname{Gamma}(n, n / \hat\lambda)\), where \(\hat\lambda = n/s\) is the point estimate of the failure rate.

\[ \operatorname{E}[\Lambda] = \frac{n}{n/\hat\lambda} = \hat\lambda \\ \operatorname{Var}[\Lambda] = \frac{n}{(n/\hat\lambda)^2} = \frac{\hat\lambda^2}{n} \]

An identical approach can be applied to the parameter \(\mu\) (with \(M\) the random variable corresponding to \(\mu\)), giving \[ \operatorname{E}[M] = \frac{n}{n/\hat\mu} = \hat\mu \\ \operatorname{Var}[M] = \frac{n}{(n/\hat\mu)^2} = \frac{\hat\mu^2}{n} \]
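As a quick numerical check of the Gamma posterior above, the following Python sketch (assuming NumPy and SciPy are available; the sample values are made up for illustration) builds the posterior from a handful of observed failure times and confirms that its mean and variance match \(\hat\lambda\) and \(\hat\lambda^2/n\).

```python
import numpy as np
from scipy.stats import gamma

# Hypothetical observed failure times (illustrative values only).
failure_times = np.array([12.0, 7.5, 20.1, 9.3, 15.8])
n = len(failure_times)
s = failure_times.sum()
lam_hat = n / s                       # point estimate of the failure rate

# Posterior under the Jeffreys prior: Lambda ~ Gamma(shape=n, rate=s).
# SciPy parameterizes the Gamma distribution with shape a and scale = 1/rate.
posterior = gamma(a=n, scale=1.0 / s)

print("posterior mean    :", posterior.mean())   # equals lam_hat
print("posterior variance:", posterior.var())    # equals lam_hat**2 / n
print("lam_hat**2 / n    :", lam_hat**2 / n)
```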

Assuming that we have all the data samples, we can use either the density function \(f(\lambda | s)\) or the variance \(\operatorname{Var}[\Lambda]\) to represent the uncertainty. However, since we are in the process of planning experiments and no data samples are available yet, the density function cannot be obtained; we can only use the variance. The variance still requires a point estimate, but, as stated earlier, a coarse point estimate is enough.

Frequentist inference

For frequentist inference, an estimator of the parameter needs to be chosen. We choose \(\hat\Lambda = \frac{n}{S}\) as the estimator of the true parameter \(\lambda\), where \(S\) is the sum of all samples and \(S \sim \operatorname{Erlang}(n, \lambda)\). The PDF of \(S\) is \[ f_S(s) = \frac{\lambda^n s^{n-1} e^{-\lambda s}}{(n-1)!} \]

Let’s consider the variance of \(\hat\Lambda\).

\[ F_{\hat\Lambda}(x) = P(\frac{n}{S} < x) = P(S > \frac{n}{x}) = 1-F_S(\frac{n}{x}) \\ \]

\[ f_{\hat\Lambda}(x) = f_S(\frac{n}{x})\frac{n}{x^2} \]

\[ \operatorname{E}[\hat\Lambda] = \int_0^\infty x f_{\hat\Lambda}(x)dx \\ =\int_0^\infty x f_S(\frac{n}{x})\frac{n}{x^2}dx \\ = \frac{n \lambda}{n - 1} \]

\[ \operatorname{E}[\hat\Lambda^2] = \int_0^\infty x^2 f_{\hat\Lambda}(x)dx \\ =\int_0^\infty x^2 f_S(\frac{n}{x})\frac{n}{x^2}dx \\ =\frac{\lambda ^2 n^2}{(n-1)(n-2)} \]

\[ \operatorname{Var}[\hat\Lambda] = \operatorname{E}[\hat\Lambda^2] - \operatorname{E}[\hat\Lambda]^2 = \frac{\lambda ^2 n^2}{(n-2) (n-1)^2} \approx \frac{\lambda^2}{n} \]

This variance requires the true parameter, which is not known; again we will use a coarse point estimate. Note that computing the variance is generally not easy: the estimator may be rather complex, making it difficult to derive its density.
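The closed form above can be sanity-checked by simulation. The sketch below (a Python illustration; the true rate \(\lambda = 0.2\) and \(n = 50\) are arbitrary choices) repeatedly draws \(n\) exponential samples, forms the estimator \(n/S\), and compares its empirical variance with the exact formula and the \(\lambda^2/n\) approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n, trials = 0.2, 50, 200_000   # illustrative true rate, sample size, repetitions

# Each row is one simulated experiment of n exponential failure times.
samples = rng.exponential(scale=1.0 / lam, size=(trials, n))
lam_hat = n / samples.sum(axis=1)   # the estimator n / S, one value per experiment

exact = lam**2 * n**2 / ((n - 2) * (n - 1) ** 2)
print("empirical Var[lam_hat]:", lam_hat.var())
print("exact formula         :", exact)
print("lam**2 / n approx.    :", lam**2 / n)
```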

Problem 2: Uncertainty Propagation

We will use a moment-based method to propagate uncertainty.

First moment:

\[ \operatorname{E}[U(\vec{\Theta})] \approx U(\hat{\vec\theta}) + \frac{1}{2}\sum_{i=1}^n \frac{\partial^2 U}{\partial \theta_i^2}(\hat{\vec\theta})\operatorname{Var}[\Theta_i] \\+ \sum_{i=1}^n\sum_{j=1}^{i-1}\frac{\partial^2 U}{\partial \theta_i \partial\theta_j}(\hat{\vec\theta})\operatorname{Cov}[\Theta_i, \Theta_j] \]

Second moment:

\[ \operatorname{E}[U(\vec{\Theta})^2] \approx U(\hat{\vec\theta})^2 \\ + \sum_{i=1}^n(\frac{\partial U}{\partial \theta_i}(\hat{\vec\theta})^2 + U(\hat{\vec\theta})\frac{\partial^2 U}{\partial \theta_i^2}(\hat{\vec\theta}))\operatorname{Var}[\Theta_i]\\ +2\sum_{i=1}^n\sum_{j=1}^{i-1}(\frac{\partial U}{\partial \theta_i}(\hat{\vec\theta})\frac{\partial U}{\partial \theta_j}(\hat{\vec\theta})+U(\hat{\vec\theta})\frac{\partial^2 U}{\partial \theta_i \partial\theta_j}(\hat{\vec\theta}))\operatorname{Cov}[\Theta_i, \Theta_j] \]

Variance:

\[ \operatorname{Var}[X] = \operatorname{E}[X^2] - \operatorname{E}[X]^2 \]
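In code, these moment formulas need only the measure value, its gradient and Hessian at the point estimate, and the parameter covariance matrix. The following Python sketch is one possible generic formulation (the function name and signature are ours, not from any particular library).

```python
import numpy as np

def propagate_moments(value, grad, hess, cov):
    """Second-order moment propagation of a measure U.

    value : U evaluated at the point estimate
    grad  : gradient of U at the point estimate, shape (p,)
    hess  : Hessian of U at the point estimate, shape (p, p)
    cov   : covariance matrix of the parameter estimates, shape (p, p)
    Returns (E[U], E[U^2], Var[U]) under the second-order approximation.
    """
    # 1/2 * sum_{i,j} H_ij Cov_ij equals the diagonal Var terms plus the i>j cross terms.
    mean = value + 0.5 * np.sum(hess * cov)
    # sum_{i,j} (g_i g_j + U H_ij) Cov_ij covers both the Var and the Cov terms above.
    second = value**2 + np.sum((np.outer(grad, grad) + value * hess) * cov)
    return mean, second, second - mean**2
```

For independent parameter estimates, `cov` is simply a diagonal matrix of the parameter variances.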

For the two-state availability model in particular,

the first-order partial derivatives are:

\[ \frac{\partial U}{\partial \lambda} = -\frac{\mu }{(\lambda +\mu )^2} \\ \frac{\partial U}{\partial \mu} = \frac{1}{\lambda +\mu }-\frac{\mu }{(\lambda +\mu )^2} \]

The second-order partial derivatives are:

\[ \frac{\partial^2 U}{\partial \lambda^2} = \frac{2 \mu }{(\lambda +\mu )^3} \\ \frac{\partial^2 U}{\partial \mu^2} = \frac{2 \mu }{(\lambda +\mu )^3}-\frac{2}{(\lambda +\mu )^2} \\ \frac{\partial^2 U}{\partial \lambda \partial \mu} = \frac{2 \mu }{(\lambda +\mu )^3}-\frac{1}{(\lambda +\mu )^2} \]
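These derivatives are easy to verify symbolically; a minimal SymPy sketch is shown below (the comments give simplified forms that are algebraically equivalent to the expressions above).

```python
import sympy as sp

lam, mu = sp.symbols("lambda mu", positive=True)
U = mu / (lam + mu)

# First-order partials
print(sp.simplify(sp.diff(U, lam)))      # -mu/(lam + mu)**2
print(sp.simplify(sp.diff(U, mu)))       #  lam/(lam + mu)**2
# Second-order partials
print(sp.simplify(sp.diff(U, lam, 2)))   #  2*mu/(lam + mu)**3
print(sp.simplify(sp.diff(U, mu, 2)))    # -2*lam/(lam + mu)**3
print(sp.simplify(sp.diff(U, lam, mu)))  #  (mu - lam)/(lam + mu)**3
```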

Assuming independence of \(\Lambda\) and \(M\) (recall that \(M\), capital \(\mu\), is the random variable corresponding to \(\mu\)),

\[ \operatorname{E}[U(\Lambda, M)] \approx U(\hat\lambda, \hat\mu) + \frac{1}{2}(\frac{\partial^2 U}{\partial \lambda^2}(\hat\lambda, \hat\mu)\operatorname{Var}[\Lambda] + \frac{\partial^2 U}{\partial \mu^2}(\hat\lambda,\hat\mu)\operatorname{Var}[M]) + 0\\ = \frac{\hat\mu}{\hat\lambda+\hat\mu} + \frac{1}{2}(\frac{2\hat\mu}{(\hat\lambda+\hat\mu)^3}\operatorname{Var}[\Lambda] + (\frac{2 \hat\mu }{(\hat\lambda +\hat\mu )^3}-\frac{2}{(\hat\lambda +\hat\mu )^2})\operatorname{Var}[M]) \]

and,

\[ E[U(\Lambda, M)^2] \approx U(\hat\lambda, \hat\mu)^2 + (\frac{\partial U}{\partial \lambda}(\hat\lambda,\hat\mu)^2 + U(\hat\lambda, \hat\mu)\frac{\partial^2 U}{\partial \lambda^2}(\hat\lambda, \hat\mu))\operatorname{Var}[\Lambda] + (\frac{\partial U}{\partial \mu}(\hat\lambda,\hat\mu)^2 + U(\hat\lambda, \hat\mu)\frac{\partial^2 U}{\partial \mu^2}(\hat\lambda, \hat\mu))\operatorname{Var}[M] + 0\\ =(\frac{\hat\mu}{\hat\lambda+\hat\mu})^2 + (\frac{\hat\mu^2 }{(\hat\lambda +\hat\mu)^4}+\frac{\hat\mu}{\hat\lambda+\hat\mu}\frac{2\hat\mu}{(\hat\lambda+\hat\mu)^3})\operatorname{Var}[\Lambda] + ((\frac{1}{\hat\lambda +\hat\mu }-\frac{\hat\mu }{(\hat\lambda +\hat\mu )^2})^2+\frac{\hat\mu}{\hat\lambda+\hat\mu}(\frac{2 \hat\mu }{(\hat\lambda +\hat\mu )^3}-\frac{2}{(\hat\lambda +\hat\mu )^2}))\operatorname{Var}[M] \]

and,

\[ \operatorname{Var}[U(\Lambda, M)] \approx \frac{\operatorname{Var}[\Lambda]\left(2 \hat\lambda \hat\mu^3+\hat\lambda^2 \hat\mu^2+\hat\mu^4\right) + \operatorname{Var}[M]\left(2 \hat\lambda^3 \hat\mu+\hat\lambda^2 \hat\mu^2+\hat\lambda^4\right) - \hat\mu^2 \operatorname{Var}[\Lambda]^2 - \hat\lambda^2 \operatorname{Var}[M]^2 + 2 \hat\lambda \hat\mu \operatorname{Var}[\Lambda]\operatorname{Var}[M]}{(\hat\lambda+\hat\mu)^6} \]

Since \(\operatorname{Var}[\Lambda] = \frac{\hat\lambda^2}{n_1}\) and \(\operatorname{Var}[M] = \frac{\hat\mu^2}{n_2}\), \[ \operatorname{E}[U(\Lambda, M)] \approx \frac{\hat\mu}{\hat\lambda+\hat\mu} + \frac{1}{2}(\frac{2\hat\mu}{(\hat\lambda+\hat\mu)^3}\frac{\hat\lambda^2}{n_1} + (\frac{2 \hat\mu }{(\hat\lambda +\hat\mu )^3}-\frac{2}{(\hat\lambda +\hat\mu )^2})\frac{\hat\mu^2}{n_2})\\ =\frac{\hat{\mu } \left(\hat{\lambda } \hat{\mu } n_1 \left(2 n_2-1\right)+\hat{\lambda }^2 \left(n_1+1\right) n_2+\hat{\mu }^2 n_1 n_2\right)}{n_1 n_2 \left(\hat{\lambda }+\hat{\mu }\right)^3} \]

and,

\[ E[U(\Lambda, M)^2] \approx (\frac{\hat\mu}{\hat\lambda+\hat\mu})^2 + (\frac{\hat\mu^2 }{(\hat\lambda +\hat\mu)^4}+\frac{\hat\mu}{\hat\lambda+\hat\mu}\frac{2\hat\mu}{(\hat\lambda+\hat\mu)^3})\frac{\hat\lambda^2}{n_1} + ((\frac{1}{\hat\lambda +\hat\mu }-\frac{\hat\mu }{(\hat\lambda +\hat\mu )^2})^2+\frac{\hat\mu}{\hat\lambda+\hat\mu}(\frac{2 \hat\mu }{(\hat\lambda +\hat\mu )^3}-\frac{2}{(\hat\lambda +\hat\mu )^2}))\frac{\hat\mu^2}{n_2}\\ =\frac{\hat{\mu }^2 \left(2 \hat{\lambda } \hat{\mu } n_1 \left(n_2-1\right)+\hat{\lambda }^2 \left(3 n_2+n_1 \left(n_2+1\right)\right)+\hat{\mu }^2 n_1 n_2\right)}{n_1 n_2 \left(\hat{\lambda }+\hat{\mu }\right)^4} \]

and,

\[ \operatorname{Var}[U(\Lambda, M)] \approx \operatorname{E}[U(\Lambda, M)^2] - \operatorname{E}[U(\Lambda, M)]^2 \\ = \frac{1}{4}\Big(4\operatorname{Var}[\Lambda]\, U^{(1,0)}(\hat\lambda,\hat\mu)^2 + 4\operatorname{Var}[M]\, U^{(0,1)}(\hat\lambda,\hat\mu)^2 - \operatorname{Var}[\Lambda]^2\, U^{(2,0)}(\hat\lambda,\hat\mu)^2 \\ \quad - \operatorname{Var}[M]^2\, U^{(0,2)}(\hat\lambda,\hat\mu)^2 - 2\operatorname{Var}[\Lambda]\operatorname{Var}[M]\, U^{(2,0)}(\hat\lambda,\hat\mu)\, U^{(0,2)}(\hat\lambda,\hat\mu)\Big) \\ = \frac{\hat\lambda^2 \hat\mu^2 \left(2 \hat\lambda \hat\mu\, n_1 n_2 (n_1+n_2+1)+\hat\lambda^2 n_2 \left(n_1^2+n_1 n_2-n_2\right)+\hat\mu^2 n_1 \left(n_2^2+n_1 (n_2-1)\right)\right)}{n_1^2 n_2^2 (\hat\lambda+\hat\mu)^6} \]

where \(U^{(i,j)}\) denotes \(\frac{\partial^{i+j} U}{\partial \lambda^i \partial \mu^j}\). The process may look complicated, but in reality we only need the gradient and the Hessian of \(U\) at the point estimate, which can be computed by automatic differentiation instead of symbolic differentiation.
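As an illustration, the gradient and Hessian of \(U\) can be obtained with an automatic-differentiation library such as JAX and plugged into the moment formulas; the sketch below is only a rough outline, with made-up point estimates and sample sizes.

```python
import jax
import jax.numpy as jnp

def uptime(theta):
    lam, mu = theta
    return mu / (lam + mu)

theta_hat = jnp.array([0.1, 2.0])       # coarse point estimates (made up)
n1, n2 = 40, 25                         # candidate sample sizes (made up)

grad = jax.grad(uptime)(theta_hat)      # first partial derivatives at the point estimate
hess = jax.hessian(uptime)(theta_hat)   # second partial derivatives at the point estimate

# Independent parameters: diagonal covariance with Var[Lambda], Var[M].
cov = jnp.diag(jnp.array([theta_hat[0]**2 / n1, theta_hat[1]**2 / n2]))

u0 = uptime(theta_hat)
mean_u = u0 + 0.5 * jnp.sum(hess * cov)
second_u = u0**2 + jnp.sum((jnp.outer(grad, grad) + u0 * hess) * cov)
print("E[U]  :", mean_u)
print("Var[U]:", second_u - mean_u**2)
```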

Problem 3: Optimization

The variance \(\operatorname{Var}[U(\Lambda, M)]\) of the measure has now been expressed in terms of the sample numbers (\(n_1, n_2\)) of all parameters (\(\lambda, \mu\)) and their coarse point estimates (\(\hat\lambda, \hat\mu\)), which are treated as known values at this stage. Write \(V(n_1, n_2)\) for the measure variance. We want to minimize the cost function \(C(n_1, n_2)\) under the constraint that \(V(n_1, n_2)\) is smaller than a given value \(w\). Since \(V(n_1, n_2)\) decreases monotonically and \(C(n_1, n_2)\) increases monotonically as \(n_1\) or \(n_2\) increases, the constraint is effectively an equality constraint \(V(n_1, n_2) = w\), and the following optimization problem needs to be solved:

\[ \underset{n_1, n_2}{\operatorname{minimize}}\ C(n_1, n_2) \\ \operatorname{subject\ to}\ V(n_1, n_2) = w \]

Using the method of Lagrange multipliers,

\[ L(n_1, n_2, k) = C(n_1, n_2) - k (V(n_1, n_2) - w) \]

and the following system of equations is solved instead:

\[ \frac{\partial L}{\partial n_1} = 0, \quad \frac{\partial L}{\partial n_2} = 0, \quad \frac{\partial L}{\partial k} = 0 \]
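For small problems like this one, the optimization can also be carried out numerically. The Python sketch below (assuming SciPy; the linear cost \(C(n_1, n_2) = c_1 n_1 + c_2 n_2\) and all numbers are hypothetical) minimizes the cost subject to \(V(n_1, n_2) \leq w\), treating \(n_1, n_2\) as continuous and rounding up afterwards.

```python
import numpy as np
from scipy.optimize import minimize

lam_hat, mu_hat = 0.1, 2.0   # coarse point estimates (made up)
c1, c2 = 5.0, 1.0            # hypothetical per-sample costs of failure/repair experiments
w = 1e-5                     # required variance of U (made up)

def measure_variance(n):
    """V(n1, n2): the second-order variance of U derived above."""
    n1, n2 = n
    num = lam_hat**2 * mu_hat**2 * (
        2 * lam_hat * mu_hat * n1 * n2 * (n1 + n2 + 1)
        + lam_hat**2 * n2 * (n1**2 + n1 * n2 - n2)
        + mu_hat**2 * n1 * (n2**2 + n1 * (n2 - 1)))
    return num / (n1**2 * n2**2 * (lam_hat + mu_hat)**6)

res = minimize(
    lambda n: c1 * n[0] + c2 * n[1],    # cost C(n1, n2)
    x0=[500.0, 500.0],
    bounds=[(3, None), (3, None)],
    constraints=[{"type": "ineq", "fun": lambda n: w - measure_variance(n)}],
    method="SLSQP")

n1_opt, n2_opt = np.ceil(res.x).astype(int)
print(n1_opt, n2_opt, measure_variance([n1_opt, n2_opt]))
```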

Two-state availability model with concrete numbers

See the Mathematica notebook.

Simplification

Estimating parameter variance using the Cramér–Rao bound

The Cramér–Rao bound gives a lower bound on the variance of an unbiased estimator: \[ \operatorname{Var}[\hat\Lambda] \geq \frac{1}{\mathcal{I}(\lambda)} \]

For estimating the rate parameter of an exponential distribution, the Fisher information is:

\[ \mathcal{I}(\lambda) = -\operatorname{E}[\frac{\partial^2\log f(x_1, x_2, ..., x_n;\lambda)}{\partial \lambda^2}] \\ =-\operatorname{E}[\frac{\partial^2 (n\log \lambda -\lambda s)}{\partial \lambda^2}] \\ =\operatorname{E}[\frac{n}{\lambda^2}] \\ = \frac{n}{\lambda^2} \]

Therefore, \[ \operatorname{Var}[\hat\Lambda] \geq \frac{\lambda^2}{n} \] This bound is much easier to compute than the results of the previous two methods (Bayesian and frequentist), and it coincides with the approximations obtained there.
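For planning purposes, the bound translates directly into a rough minimum sample size (treating it as an approximation of the actual estimator variance and plugging in the coarse estimate \(\hat\lambda\)): if the variance of \(\hat\Lambda\) must not exceed a target value \(v\), then

\[ \frac{\hat\lambda^2}{n} \leq v \implies n \geq \frac{\hat\lambda^2}{v} \]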

Simplified moment-based uncertainty propagation

If we drop the second-order terms (they become increasingly irrelevant as the sample size grows), the variance of the measure simplifies to:

\[ \operatorname{Var}[U(\Lambda, M)] \approx \operatorname{E}[U(\Lambda, M)^2] - \operatorname{E}[U(\Lambda, M)]^2 \\ = \frac{1}{4}\Big(4\operatorname{Var}[\Lambda]\, U^{(1,0)}(\hat\lambda,\hat\mu)^2 + 4\operatorname{Var}[M]\, U^{(0,1)}(\hat\lambda,\hat\mu)^2 - \operatorname{Var}[\Lambda]^2\, U^{(2,0)}(\hat\lambda,\hat\mu)^2 \\ \quad - \operatorname{Var}[M]^2\, U^{(0,2)}(\hat\lambda,\hat\mu)^2 - 2\operatorname{Var}[\Lambda]\operatorname{Var}[M]\, U^{(2,0)}(\hat\lambda,\hat\mu)\, U^{(0,2)}(\hat\lambda,\hat\mu)\Big) \\ \approx \operatorname{Var}[\Lambda]\, U^{(1, 0)}(\hat\lambda, \hat\mu)^2 + \operatorname{Var}[M]\, U^{(0, 1)}(\hat\lambda, \hat\mu)^2 \]

The uncertainty of the measure is thus driven by two factors for each parameter: 1) the variance of the parameter itself, and 2) the first partial derivative of the measure with respect to that parameter, which enters squared.
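For the two-state availability model with the Cramér–Rao variances \(\operatorname{Var}[\Lambda] = \hat\lambda^2/n_1\) and \(\operatorname{Var}[M] = \hat\mu^2/n_2\), substituting the first partial derivatives above reduces this first-order form to a particularly simple expression:

\[ \operatorname{Var}[U(\Lambda, M)] \approx \frac{\hat\lambda^2}{n_1} \cdot \frac{\hat\mu^2}{(\hat\lambda + \hat\mu)^4} + \frac{\hat\mu^2}{n_2} \cdot \frac{\hat\lambda^2}{(\hat\lambda + \hat\mu)^4} = \frac{\hat\lambda^2 \hat\mu^2}{(\hat\lambda + \hat\mu)^4} \left(\frac{1}{n_1} + \frac{1}{n_2}\right) \]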

Multi-parameter distributions

Parameter variance estimation

In the multi-parameter case, not only the variances but also the covariances need to be determined.