Moment-based approach for sampling optimization (exp)

Bayesian estimation for rate parameter

Let the observed failure times be \(m_1, m_2, \ldots, m_n\), assumed to be i.i.d. samples from an exponential distribution with rate \(\lambda\). Let the prior distribution of \(\lambda\) be \(f(\lambda)\). Then,

\[ f(\lambda | m_1, m_2, ..., m_n) = \frac{ \lambda^ne^{-\lambda \sum_i m_i}f(\lambda)}{\int_0^{\infty}\lambda^ne^{-\lambda \sum_i m_i}f(\lambda)d\lambda} \]

Let \(s=\sum_im_i\),

\[ f(\lambda | s) = \frac{ \lambda^ne^{-\lambda s}f(\lambda)}{\int_0^{\infty}\lambda^ne^{-\lambda s}f(\lambda)d\lambda} \]

Choosing \(f(\lambda)\) to be the Jeffreys prior (\(f(\lambda) \propto \frac{1}{\lambda}\)),

\[ f(\lambda | s) = \frac{ \lambda^ne^{-\lambda s}\frac{1}{\lambda}}{\int_0^{\infty}\lambda^ne^{-\lambda s}\frac{1}{\lambda}d\lambda} = \frac{\lambda ^{n-1} s^n e^{-\lambda s}}{\Gamma (n)} \]

That is, \(\Lambda \sim \operatorname{Gamma}(n, s)\) (shape–rate parameterization), or equivalently \(\Lambda \sim \operatorname{Gamma}(n, n / \hat\lambda)\), where \(\hat\lambda = n/s\) is the maximum-likelihood estimate.

\[ \operatorname{E}[\Lambda] = \frac{n}{n/\hat\lambda} = \hat\lambda \\ \operatorname{Var}[\Lambda] = \frac{n}{(n/\hat\lambda)^2} = \frac{\hat\lambda^2}{n} \]
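As a quick numerical sanity check of these moments, the posterior can be sampled directly. A minimal sketch with hypothetical failure-time data (the true rate 2 and sample size 50 are arbitrary choices, not from this note):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical failure times from an Exponential process with true rate 2.
m = rng.exponential(scale=1 / 2.0, size=50)
n, s = len(m), m.sum()
lam_hat = n / s  # maximum-likelihood estimate of the rate

# Posterior under the Jeffreys prior: Lambda ~ Gamma(shape=n, rate=s),
# i.e. scale = 1/s in NumPy's parameterization.
post = rng.gamma(shape=n, scale=1 / s, size=200_000)

print(post.mean(), lam_hat)        # E[Lambda] = n/s = lam_hat
print(post.var(), lam_hat**2 / n)  # Var[Lambda] = lam_hat^2 / n
```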

Moment-based approach

First moment:

\[ \operatorname{E}[M(\vec{\Theta})] \approx M(\hat{\vec\theta}) + \frac{1}{2}\sum_{i=1}^n \frac{\partial^2 M}{\partial \theta_i^2}(\hat{\vec\theta})\operatorname{Var}[\Theta_i] \\+ \sum_{i=1}^n\sum_{j=1}^{i-1}\frac{\partial^2 M}{\partial \theta_i \partial\theta_j}(\hat{\vec\theta})\operatorname{Cov}[\Theta_i, \Theta_j] \]

Second moment:

\[ \operatorname{E}[M(\vec{\Theta})^2] \approx M(\hat{\vec\theta})^2 \\ + \sum_{i=1}^n(\frac{\partial M}{\partial \theta_i}(\hat{\vec\theta})^2 + M(\hat{\vec\theta})\frac{\partial^2 M}{\partial \theta_i^2}(\hat{\vec\theta}))\operatorname{Var}[\Theta_i]\\ +2\sum_{i=1}^n\sum_{j=1}^{i-1}(\frac{\partial M}{\partial \theta_i}(\hat{\vec\theta})\frac{\partial M}{\partial \theta_j}(\hat{\vec\theta})+M(\hat{\vec\theta})\frac{\partial^2 M}{\partial \theta_i \partial\theta_j}(\hat{\vec\theta}))\operatorname{Cov}[\Theta_i, \Theta_j] \]


\[ \operatorname{Var}[X] = \operatorname{E}[X^2] - \operatorname{E}[X]^2 \]
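These two expansions can be evaluated numerically without symbolic derivatives. The sketch below (the function name and the finite-difference scheme are my own choices, not part of this note) estimates the gradient and Hessian of a measure \(M\) by central differences and plugs them into the formulas above:

```python
import numpy as np

def delta_moments(M, theta_hat, cov, h=1e-4):
    """Approximate E[M(Theta)] and Var[M(Theta)] via the second-order
    Taylor expansions. Gradient and Hessian of M at theta_hat are
    estimated with central finite differences (relative step h)."""
    t = np.asarray(theta_hat, dtype=float)
    k = len(t)
    hv = h * np.maximum(1.0, np.abs(t))  # per-coordinate steps
    grad = np.zeros(k)
    hess = np.zeros((k, k))
    M0 = M(t)
    for i in range(k):
        ei = np.zeros(k); ei[i] = hv[i]
        grad[i] = (M(t + ei) - M(t - ei)) / (2 * hv[i])
        hess[i, i] = (M(t + ei) - 2 * M0 + M(t - ei)) / hv[i] ** 2
        for j in range(i):
            ej = np.zeros(k); ej[j] = hv[j]
            hess[i, j] = hess[j, i] = (
                M(t + ei + ej) - M(t + ei - ej)
                - M(t - ei + ej) + M(t - ei - ej)
            ) / (4 * hv[i] * hv[j])
    # First moment:  M + (1/2) * sum_ij H_ij Cov_ij  (diagonal + off-diagonal)
    mean = M0 + 0.5 * np.sum(hess * cov)
    # Second moment: M^2 + sum_ij (g_i g_j + M H_ij) Cov_ij
    second = M0**2 + np.sum((np.outer(grad, grad) + M0 * hess) * cov)
    return mean, second - mean**2

# Example: uptime U(lam, mu) = mu / (lam + mu) at hypothetical estimates.
mean, var = delta_moments(lambda th: th[1] / (th[0] + th[1]),
                          [1.0, 100.0],
                          np.diag([1.0**2 / 50, 100.0**2 / 50]))
print(mean, var)
```

For independent estimators, `cov` is simply the diagonal matrix of \(\operatorname{Var}[\Theta_i]\).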

Application on the two-states problem

Consider the measure uptime percentage \(U(\lambda, \mu) = \frac{\mu}{\lambda +\mu}\). The first-order partial derivatives are:

\[ \frac{\partial U}{\partial \lambda} = -\frac{\mu }{(\lambda +\mu )^2} \\ \frac{\partial U}{\partial \mu} = \frac{1}{\lambda +\mu }-\frac{\mu }{(\lambda +\mu )^2} \]

The second-order partial derivatives are:

\[ \frac{\partial^2 U}{\partial \lambda^2} = \frac{2 \mu }{(\lambda +\mu )^3} \\ \frac{\partial^2 U}{\partial \mu^2} = \frac{2 \mu }{(\lambda +\mu )^3}-\frac{2}{(\lambda +\mu )^2} \\ \frac{\partial^2 U}{\partial \lambda \partial \mu} = \frac{2 \mu }{(\lambda +\mu )^3}-\frac{1}{(\lambda +\mu )^2} \]

Assuming independence of the two estimators (note that \(M\) is a random variable here, i.e. a capital \(\mu\)),

\[ \operatorname{E}[U(\Lambda, M)] \approx U(\hat\lambda, \hat\mu) + \frac{1}{2}(\frac{\partial^2 U}{\partial \lambda^2}(\hat\lambda, \hat\mu)\operatorname{Var}[\Lambda] + \frac{\partial^2 U}{\partial \mu^2}(\hat\lambda,\hat\mu)\operatorname{Var}[M]) + 0\\ = \frac{\hat\mu}{\hat\lambda+\hat\mu} + \frac{1}{2}(\frac{2\hat\mu}{(\hat\lambda+\hat\mu)^3}\operatorname{Var}[\Lambda] + (\frac{2 \hat\mu }{(\hat\lambda +\hat\mu )^3}-\frac{2}{(\hat\lambda +\hat\mu )^2})\operatorname{Var}[M]) \]


\[ E[U(\Lambda, M)^2] \approx U(\hat\lambda, \hat\mu)^2 + (\frac{\partial U}{\partial \lambda}(\hat\lambda,\hat\mu)^2 + U(\hat\lambda, \hat\mu)\frac{\partial^2 U}{\partial \lambda^2}(\hat\lambda, \hat\mu))\operatorname{Var}[\Lambda] + (\frac{\partial U}{\partial \mu}(\hat\lambda,\hat\mu)^2 + U(\hat\lambda, \hat\mu)\frac{\partial^2 U}{\partial \mu^2}(\hat\lambda, \hat\mu))\operatorname{Var}[M] + 0\\ =(\frac{\hat\mu}{\hat\lambda+\hat\mu})^2 + (\frac{\hat\mu^2 }{(\hat\lambda +\hat\mu)^4}+\frac{\hat\mu}{\hat\lambda+\hat\mu}\frac{2\hat\mu}{(\hat\lambda+\hat\mu)^3})\operatorname{Var}[\Lambda] + ((\frac{1}{\hat\lambda +\hat\mu }-\frac{\hat\mu }{(\hat\lambda +\hat\mu )^2})^2+\frac{\hat\mu}{\hat\lambda+\hat\mu}(\frac{2 \hat\mu }{(\hat\lambda +\hat\mu )^3}-\frac{2}{(\hat\lambda +\hat\mu )^2}))\operatorname{Var}[M] \]


\[ \operatorname{Var}[U(\Lambda, M)] \approx -\frac{\hat\mu^2 \operatorname{Var}[\Lambda]^2}{(\hat\lambda +\hat\mu)^6}+\frac{\left(2 \hat\lambda \hat\mu^3+\hat\lambda^2 \hat\mu^2+\hat\mu^4\right)\operatorname{Var}[\Lambda]}{(\hat\lambda +\hat\mu)^6}-\frac{\hat\lambda^2 \operatorname{Var}[M]^2}{(\hat\lambda +\hat\mu)^6}\\ +\frac{\left(2 \hat\lambda^3 \hat\mu+\hat\lambda^2 \hat\mu^2+\hat\lambda^4\right)\operatorname{Var}[M]}{(\hat\lambda +\hat\mu)^6}+\frac{2 \hat\lambda \hat\mu \operatorname{Var}[\Lambda] \operatorname{Var}[M]}{(\hat\lambda +\hat\mu)^6} \]
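This closed form can be cross-checked against Monte Carlo sampling from the two Gamma posteriors. A sketch with hypothetical values \(\hat\lambda = 1\), \(\hat\mu = 100\), \(n_1 = n_2 = 100\):

```python
import numpy as np

rng = np.random.default_rng(1)
lam_hat, mu_hat, n1, n2 = 1.0, 100.0, 100, 100

# Posterior draws: Lambda ~ Gamma(n1, rate=n1/lam_hat), M ~ Gamma(n2, rate=n2/mu_hat);
# NumPy takes scale = 1/rate.
Lam = rng.gamma(shape=n1, scale=lam_hat / n1, size=500_000)
Mu = rng.gamma(shape=n2, scale=mu_hat / n2, size=500_000)
mc_var = (Mu / (Lam + Mu)).var()

vL, vM, s = lam_hat**2 / n1, mu_hat**2 / n2, lam_hat + mu_hat
approx_var = (-mu_hat**2 * vL**2
              + (2*lam_hat*mu_hat**3 + lam_hat**2*mu_hat**2 + mu_hat**4) * vL
              - lam_hat**2 * vM**2
              + (2*lam_hat**3*mu_hat + lam_hat**2*mu_hat**2 + lam_hat**4) * vM
              + 2*lam_hat*mu_hat*vL*vM) / s**6
print(mc_var, approx_var)  # the second-order approximation should be close
```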

Since \(\operatorname{Var}[\Lambda] = \frac{\hat\lambda^2}{n_1}\) and \(\operatorname{Var}[M] = \frac{\hat\mu^2}{n_2}\),

\[ \operatorname{E}[U(\Lambda, M)] \approx \frac{\hat\mu}{\hat\lambda+\hat\mu} + \frac{1}{2}(\frac{2\hat\mu}{(\hat\lambda+\hat\mu)^3}\frac{\hat\lambda^2}{n_1} + (\frac{2 \hat\mu }{(\hat\lambda +\hat\mu )^3}-\frac{2}{(\hat\lambda +\hat\mu )^2})\frac{\hat\mu^2}{n_2})\\ =\frac{\hat{\mu } \left(\hat{\lambda } \hat{\mu } n_1 \left(2 n_2-1\right)+\hat{\lambda }^2 \left(n_1+1\right) n_2+\hat{\mu }^2 n_1 n_2\right)}{n_1 n_2 \left(\hat{\lambda }+\hat{\mu }\right)^3} \]


\[ E[U(\Lambda, M)^2] \approx (\frac{\hat\mu}{\hat\lambda+\hat\mu})^2 + (\frac{\hat\mu^2 }{(\hat\lambda +\hat\mu)^4}+\frac{\hat\mu}{\hat\lambda+\hat\mu}\frac{2\hat\mu}{(\hat\lambda+\hat\mu)^3})\frac{\hat\lambda^2}{n_1} + ((\frac{1}{\hat\lambda +\hat\mu }-\frac{\hat\mu }{(\hat\lambda +\hat\mu )^2})^2+\frac{\hat\mu}{\hat\lambda+\hat\mu}(\frac{2 \hat\mu }{(\hat\lambda +\hat\mu )^3}-\frac{2}{(\hat\lambda +\hat\mu )^2}))\frac{\hat\mu^2}{n_2}\\ =\frac{\hat{\mu }^2 \left(2 \hat{\lambda } \hat{\mu } n_1 \left(n_2-1\right)+\hat{\lambda }^2 \left(3 n_2+n_1 \left(n_2+1\right)\right)+\hat{\mu }^2 n_1 n_2\right)}{n_1 n_2 \left(\hat{\lambda }+\hat{\mu }\right)^4} \]


\[ \operatorname{Var}[U(\Lambda, M)] \approx \operatorname{E}[U(\Lambda, M)^2] - \operatorname{E}[U(\Lambda, M)]^2 \\ = \frac{1}{4}\Big(4\operatorname{Var}[\Lambda]\frac{\partial U}{\partial \lambda}(\hat\lambda,\hat\mu)^2+4\operatorname{Var}[M]\frac{\partial U}{\partial \mu}(\hat\lambda,\hat\mu)^2-\operatorname{Var}[\Lambda]^2\frac{\partial^2 U}{\partial \lambda^2}(\hat\lambda,\hat\mu)^2\\ -\operatorname{Var}[M]^2\frac{\partial^2 U}{\partial \mu^2}(\hat\lambda,\hat\mu)^2-2\operatorname{Var}[\Lambda]\operatorname{Var}[M]\frac{\partial^2 U}{\partial \lambda^2}(\hat\lambda,\hat\mu)\frac{\partial^2 U}{\partial \mu^2}(\hat\lambda,\hat\mu)\Big)\\ =\frac{\hat\lambda^2 \hat\mu^2 \left(2 \hat\lambda \hat\mu n_1 n_2 \left(n_1+n_2+1\right)+\hat\lambda^2 n_2 \left(n_1^2+n_1 n_2-n_2\right)+\hat\mu^2 n_1 \left(n_2^2+n_1 \left(n_2-1\right)\right)\right)}{n_1^2 n_2^2 \left(\hat\lambda +\hat\mu\right)^6} \]

We approximate the distribution of \(U\) using a normal distribution \(\tilde{U}\) with the same expectation and variance. By the properties of normal distributions, \(Z = \frac{\tilde{U} - \operatorname{E}[U]}{\sqrt{\operatorname{Var}[U]}} \sim N(0, 1)\). Choosing an error tolerance probability \(\alpha\), let \(z_{1 - \alpha/2} = F_Z^{-1}(1 - \alpha/2)\), where \(F_Z^{-1}\) is the standard normal quantile function,

\[ P(-z_{1-\alpha/2} \leq Z \leq z_{1 - \alpha/2}) = 1 - \alpha \]


\[ P(\operatorname{E}[U]-\sqrt{\operatorname{Var}[U]}z_{1-\alpha/2} \leq \tilde{U} \leq \operatorname{E}[U]+\sqrt{\operatorname{Var}[U]}z_{1 - \alpha/2}) = 1 - \alpha \]

The width of this interval is \(w = 2z_{1 - \alpha/2}\sqrt{\operatorname{Var}[U]}\).
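The quantile \(z_{1-\alpha/2}\) needs no table lookup; for example, with the Python standard library:

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile
print(z)  # ~1.96 for alpha = 0.05
```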

We need to choose a desired width. Let’s say we want the relative error to be within a ratio \(\beta\). That is,

\[ \frac{w}{\operatorname{E}[U]} \leq \beta \]

Now, we need some constraints on \(n_1\) and \(n_2\). If we consider the sampling costs to be identical, then the cost function is:

\[ c(n_1, n_2) = n_1 + n_2 \]

Overall, the optimization problem is:

\[ \min_{n_1, n_2} c(n_1, n_2) \\ \text{subject to } \frac{w}{\operatorname{E}[U]}\leq \beta \\ \]
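For two parameters this problem is small enough to solve by brute force using the closed forms derived above. A sketch (the values \(\hat\lambda = 1\), \(\hat\mu = 100\), \(\alpha = 0.05\), \(\beta = 0.005\) are hypothetical):

```python
import math

Z = 1.959964  # z_{1-alpha/2} for alpha = 0.05

def rel_width(lam, mu, n1, n2, z=Z):
    """Relative interval width w / E[U] from the closed-form approximations."""
    s = lam + mu
    mean = mu * (lam*mu*n1*(2*n2 - 1) + lam**2*(n1 + 1)*n2 + mu**2*n1*n2) \
        / (n1 * n2 * s**3)
    var = lam**2 * mu**2 * (2*lam*mu*n1*n2*(n1 + n2 + 1)
                            + lam**2*n2*(n1**2 + n1*n2 - n2)
                            + mu**2*n1*(n2**2 + n1*(n2 - 1))) \
        / (n1**2 * n2**2 * s**6)
    return 2 * z * math.sqrt(var) / mean

# Exhaustive search for the cheapest (n1, n2) meeting the constraint.
beta, best = 0.005, None
for n1 in range(2, 300):
    for n2 in range(2, 300):
        if rel_width(1.0, 100.0, n1, n2) <= beta and \
           (best is None or n1 + n2 < best[0] + best[1]):
            best = (n1, n2)
print(best)
```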

The normal approximation will not work well when \(n_1\) and \(n_2\) are small.

Solution of the optimization problem

The method of Lagrange multipliers can be used to solve this optimization problem. Letting \(k\) be the multiplier, the Lagrange function is

\[ L(n_1, n_2, k) = c(n_1, n_2) - k \left(\frac{w}{\operatorname{E}[U]} - \beta\right) \]

We need to solve the following equations for \(n_1\), \(n_2\), and \(k\):

\[ \frac{\partial L}{\partial n_1} =0 \\ \frac{\partial L}{\partial n_2} = 0\\ k (\frac{w}{\operatorname{E}[U]} - \beta) = 0 \]

Work with concrete numbers

See the Mathematica notebook.

Some Observations


The relative width of the confidence interval, \(\frac{w}{\operatorname{E}[U]} = \frac{2z_{1 - \alpha/2}\sqrt{\operatorname{Var}[U]}}{\operatorname{E}[U]}\), is mainly affected by \(\operatorname{Var}[U]\), since \(\operatorname{E}[U]\) becomes stable when \(n_1\) and \(n_2\) are large.


\(\hat\lambda\) and \(\hat\mu\) are estimated using a small number of samples (say 10). This will likely result in a relatively large shift in the posterior distribution (compared to the real one), as well as in \(\operatorname{E}[U]\). However, this does not matter too much, as the variance, rather than the mean, is what we really care about.


Consider the components of \(\operatorname{Var}[U(\Lambda, M)]\):

\[ \operatorname{Var}[U(\Lambda, M)] \approx \operatorname{E}[U(\Lambda, M)^2] - \operatorname{E}[U(\Lambda, M)]^2 \\ = \frac{1}{4}\Big(4\operatorname{Var}[\Lambda]\frac{\partial U}{\partial \lambda}(\hat\lambda,\hat\mu)^2+4\operatorname{Var}[M]\frac{\partial U}{\partial \mu}(\hat\lambda,\hat\mu)^2-\operatorname{Var}[\Lambda]^2\frac{\partial^2 U}{\partial \lambda^2}(\hat\lambda,\hat\mu)^2\\ -\operatorname{Var}[M]^2\frac{\partial^2 U}{\partial \mu^2}(\hat\lambda,\hat\mu)^2-2\operatorname{Var}[\Lambda]\operatorname{Var}[M]\frac{\partial^2 U}{\partial \lambda^2}(\hat\lambda,\hat\mu)\frac{\partial^2 U}{\partial \mu^2}(\hat\lambda,\hat\mu)\Big) \]

There are two major factors in how the sample size of a parameter affects the uncertainty of a measure:

  1. how the sample size affects the variance of the parameter itself, e.g. \(\operatorname{Var}[\Lambda] = \frac{\hat\lambda^2}{n_1}\);
  2. how sensitive the measure is to changes in the parameter (around the true value), e.g. \(\frac{\partial U}{\partial \lambda}(\hat\lambda,\hat\mu)^2\).

Merely looking at the product of the two should already give a good estimate (to be verified).


The fact that \(n_1\) and \(n_2\) are almost identical is not to be generalized. In this particular example, the above two factors for the two parameters happen to balance each other. Specifically, when \(\lambda = 1\) and \(\mu = 100\), \(\operatorname{Var}[M]\) is \(100^2\) times larger than \(\operatorname{Var}[\Lambda]\) (for equal sample sizes). However, coincidentally, \(\frac{\partial U}{\partial \lambda}(\hat\lambda,\hat\mu)^2\) is \(100^2\) times larger than \(\frac{\partial U}{\partial \mu}(\hat\lambda,\hat\mu)^2\). Therefore, the overall effects of \(n_1\) and \(n_2\) appear to be the same.
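This balancing can be verified in a few lines (the point \((\lambda, \mu) = (1, 100)\) is the example value used above):

```python
lam, mu = 1.0, 100.0

dU_dlam = -mu / (lam + mu) ** 2
dU_dmu = 1 / (lam + mu) - mu / (lam + mu) ** 2  # simplifies to lam / (lam + mu)^2

# Per-sample variance ratio vs. squared-sensitivity ratio: they cancel exactly.
print(mu**2 / lam**2)           # Var[M] is 100^2 times Var[Lambda]
print((dU_dlam / dU_dmu) ** 2)  # (dU/dlam)^2 is 100^2 times (dU/dmu)^2
```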

Generalization to more complex models

Generalization to more complex models (more parameters and more complex measures) is straightforward, as \(\operatorname{Var}[M(\vec\Theta)]\) will always have the following form (assuming the lifetimes of components are exponential):

\[ \operatorname{Var}[M(\vec\Theta)] = \sum_i C_i(\hat\Theta) \frac{\hat\theta_i^2}{n_i} + \sum_i D_i(\hat\Theta) \left(\frac{\hat\theta_i^2}{n_i}\right)^2 + \sum_i\sum_jE_{ij}(\hat\Theta)\frac{\hat\theta_i^2\hat\theta_j^2}{n_in_j} \]

The Lagrangian and its gradient can be computed easily, provided we know the gradient and Hessian of the measure function \(M\) at the estimated point \(\hat\Theta\) (no symbolic expression required).

\[ \nabla_{\vec n,\, k} L = 0 \]

where \(k\) is the multiplier. The only problem is that we have to solve a system of \(n+1\) non-linear equations. Search-based methods like gradient descent may be adopted.

A simplified version

Consider the confidence interval constraint:

\[ \frac{2z_{1 - \alpha/2}\sqrt{\operatorname{Var}[U]}}{\operatorname{E}[U]} \leq \beta \]

If we consider \(\operatorname{E}[U]\) to be constant and ignore the second-order terms of \(\operatorname{Var}[U]\), the constraint simplifies to:

\[ \sum_i C_i(\hat\Theta) \frac{\hat\theta_i^2}{n_i} \leq B \]

where \(B\) is just some constant. For a fixed cost \(V = n_1 + n_2 + \ldots\), the left-hand side is minimized when \(n_i\) is distributed proportionally to \(\sqrt{C_i(\hat\Theta)\hat\theta_i^2}\) (by the Cauchy–Schwarz inequality, as in Neyman allocation). The optimization can then be done by increasing \(V\) gradually until the constraint is satisfied.
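A sketch of this procedure (the function name is my own, and it assumes the coefficients \(C_i(\hat\Theta)\) have already been computed). For a fixed budget \(V\), the sum \(\sum_i a_i/n_i\) with \(a_i = C_i\hat\theta_i^2\) is minimized by allocating \(n_i \propto \sqrt{a_i}\):

```python
import numpy as np

def allocate(C, theta_hat, B, V_start=2):
    """Smallest integer allocation with sum_i C_i*theta_i^2 / n_i <= B.

    For a fixed total V, the sum is minimized by n_i proportional to
    sqrt(C_i * theta_i^2) (Cauchy-Schwarz); V is grown one unit at a
    time until the constraint holds.
    """
    a = np.asarray(C, dtype=float) * np.asarray(theta_hat, dtype=float) ** 2
    w = np.sqrt(a) / np.sqrt(a).sum()  # optimal proportions
    V = V_start
    while True:
        n = np.maximum(1, np.round(w * V)).astype(int)
        if np.sum(a / n) <= B:
            return n
        V += 1

print(allocate([1.0, 4.0], [1.0, 1.0], B=1.0))  # allocates in ratio 1:2
```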