Correlated Uncertainty Quantification Workflow

I’ve recently had a few reasons to doubt whether my workflow for uncertainty studies with correlations is optimal, so I wanted to ask the question and get some advice if possible. I’m dealing with a computational model with ~50 input variables, and I also want to know the sensitivities (knowing which variables aren’t important is just as valuable here as knowing which ones are).

I’ll illustrate it with an example inspired by a previous post, Sensitivity Analysis with Effective Quadratures - Blog - Discourse — equadratures:

We can set up the standard problem:

import numpy as np
from equadratures import Parameter, Basis, Poly, Correlations

# Standard Gaussian inputs, cubic polynomials
s1 = Parameter(distribution='Gaussian', shape_parameter_A=0., shape_parameter_B=1., order=3)
s2 = Parameter(distribution='Gaussian', shape_parameter_A=0., shape_parameter_B=1., order=3)
R = np.array([[1.0, 0.8],
              [0.8, 1.0]])

basis = Basis('total order')
# X_train, y_train are the Latin Hypercube inputs and expensive-model outputs described below
poly = Poly([s1, s2], basis, method='least-squares', sampling_args={'sample-points': X_train, 'sample-outputs': y_train})

In this case the training data has been generated by running the expensive code at points from a Latin Hypercube (sampled uniformly over the same support). The reason for sampling the training data from a uniform distribution is to get the best representation of the full design space. Is there anything wrong with doing this, specifically fitting the model with uniformly sampled data despite investigating Gaussian input uncertainties?
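For reference, the training set is generated along these lines (a minimal sketch: the ±3σ bounds are just what I use for a standard Gaussian, and run_expensive_model is a stand-in for the actual expensive code):

import numpy as np
from scipy.stats import qmc

# Latin Hypercube over the support of interest (here taken as +/- 3 sigma of the standard Gaussians)
sampler = qmc.LatinHypercube(d=2, seed=0)
X_unit = sampler.random(n=50)                          # uniform samples in [0, 1)^2
X_train = qmc.scale(X_unit, [-3.0, -3.0], [3.0, 3.0])  # rescale to the design space
y_train = np.array([run_expensive_model(x) for x in X_train])  # placeholder for the expensive-code runs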

With the independent poly defined it can now be transformed to account for the correlations:

corr = Correlations(poly, R)
corr.set_model(y_train_corr)
poly2 = corr.get_transformed_poly()
poly2.get_mean_and_variance()

The problem here is that the poly fed into Correlations needs the X_train and y_train datasets, and the correlated poly then also needs fitting, so it seems to require two full sets of runs of the expensive code, which feels inefficient. I suspect there’s a way of bypassing one of these but I’m not sure how. This is the main problem I’m having, and I’ve recently been resorting to doing correlated Monte Carlo with the original independent poly instead of creating a Correlations instance.
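For context, the correlated Monte Carlo workaround I mentioned is roughly the following, reusing R and the independent poly from the block above (get_polyfit evaluates the fitted surrogate; the Cholesky factor of R induces the correlation):

import numpy as np

# Correlated Monte Carlo pushed through the independent poly (my current workaround)
N = 100_000
L = np.linalg.cholesky(R)                 # R is the 2x2 correlation matrix above
Z = np.random.standard_normal((N, 2))     # independent standard normals
X_mc = Z @ L.T                            # correlated standard normals
y_mc = poly.get_polyfit(X_mc).flatten()   # evaluate the fitted surrogate at the correlated samples
print(y_mc.mean(), y_mc.var())            # approximate output mean and variance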

I think a more effective way to do the UQ study here would generally be to use subspaces and sample in the projected space, since I have so many variables, but I’m not clear how this would be set up for the correlated-inputs case. The sensitivity metrics are also important, but I think the weightings of the subspace vector would give me enough insight into what’s driving the most change in the design space.


Hi Mike, thanks for the queries and continued interest in the code! I’ve read the questions in both this post and the separate thread on sensitivity analysis, so let me try to answer them together in this reply.

You are correct that the input distribution (which includes information about input correlations) does not come into play directly when we are interested in building a surrogate model for prediction: whatever the input distribution, if we give an X we expect to get the same y. This is, however, not true when we want to evaluate integrals of the output, which include the output mean, variance and indeed the sensitivity indices. The sensitivity (Sobol’) indices involve integrals of the form

\int g(y(x))\; \rho(x)\; dx

where g is some functional of the output y and \rho is the input density. Thus, it is in fact true that the sensitivities of the inputs depend on the input distribution. To understand this, consider a model containing a parameter with a narrow Gaussian distribution: when the Gaussian is centred near a region where the function is relatively steep, the parameter should be more important than when the Gaussian is centred near a plateau of the function.
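As a quick numerical illustration of this point (plain numpy, independent of the library):

import numpy as np

# Toy example: output variance induced by a narrow Gaussian input, centred
# either on a steep region or on a plateau of y(x) = tanh(x).
rng = np.random.default_rng(1)
x_steep = rng.normal(loc=0.0, scale=0.1, size=100_000)    # tanh is steep near 0
x_plateau = rng.normal(loc=3.0, scale=0.1, size=100_000)  # tanh is nearly flat near 3
print(np.tanh(x_steep).var())    # relatively large output variance: the parameter matters
print(np.tanh(x_plateau).var())  # tiny output variance: the parameter barely matters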

However, we should bear in mind that the training inputs do not have to be sampled from the actual distribution of the input parameters. Counterintuitively, it may sometimes be better to sample from a different distribution than the designated one. For example, we know theoretically that in some cases sampling from an arcsine distribution offers an advantage when doing least squares for bounded polynomial systems. Thus, it is very much possible to build the surrogate model using inputs from e.g. LHS while declaring the input parameters to be some other distribution of interest (e.g. Gaussian, with correlations, as in your case). However, when calculating integrals of interest, we should use the “corrected polynomial” from the Correlations class.
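For concreteness, arcsine-distributed training inputs can be drawn very simply (a sketch only; the scaling to ±3 is arbitrary and just mirrors the Gaussian example above):

import numpy as np

# Arcsine (Chebyshev) samples on [-1, 1]: x = cos(pi * u) with u ~ Uniform(0, 1).
# These cluster towards the endpoints, which is what helps least-squares stability.
rng = np.random.default_rng(0)
U = rng.uniform(size=(50, 2))
X_train_arcsine = 3.0 * np.cos(np.pi * U)   # rescaled to the +/- 3 design region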

In the test case you presented, I think the following should work:

s1 = Parameter(distribution='Gaussian', shape_parameter_A=0., shape_parameter_B=1., order=3)
s2 = Parameter(distribution='Gaussian', shape_parameter_A=0., shape_parameter_B=1., order=3)
R = np.array([ [ 1.0,  0.8],
               [ 0.8,  1.0]])

basis = Basis('total order') 
poly = Poly([s1, s2], basis, method='least-squares', sampling_args={'sample-points':X_train,'sample-outputs':y_train})

corr = Correlations(R, poly=poly) # updated to new syntax (in develop branch right now...)
corr.set_model()
poly2 = corr.get_transformed_poly()
poly2.get_mean_and_variance() # or get Sobol' indices

What I have changed is to remove the argument of the set_model call. We shouldn’t need to sample any extra points or run the expensive model again, because the coefficients can be fitted with the data we already gave to Poly. The Correlations class may modify the actual training input points used (when using the Nataf transform, though not with Gram-Schmidt), but this is done behind the scenes. Please let me know if this doesn’t work.
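To get the sensitivities from the transformed poly, something along these lines should work (assuming the Sobol’ method name in your installed version matches recent releases):

# First-order Sobol' indices of the transformed (correlated) poly
# (method name as in recent versions; it may differ slightly across branches)
sobol_first = poly2.get_sobol_indices(1)
print(sobol_first)  # dictionary mapping variable index tuples to their indices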

For dimension reduction, in the case of active subspaces, note that the important directions (i.e. the eigenvectors of the gradient covariance matrix) do depend on the input distribution, again because they come from an integral with respect to the input distribution. We just need to make sure that the points where the gradients are evaluated in subspaces.py are sampled from the desired distribution. In fact, the poly here only serves as a predictive model to evaluate the gradients, so it is even possible to just use an uncorrelated poly model. The grad_points option of the Subspaces class is useful here (in a version we are releasing soon, this will be passed as a dict to the init of a Subspaces instance). For polynomial variable projection, a similar idea applies: if we sample the points fed to Subspaces from our desired distribution, we should get ridge directions that correspond to that distribution (approximately, due to finite sample size!).
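As a rough sketch of the active subspace idea with correlated sampling (plain numpy; surrogate_gradient is a placeholder for however you evaluate gradients, e.g. via the uncorrelated poly):

import numpy as np

# Sample the gradient-evaluation points from the *correlated* Gaussian of interest,
# then eigendecompose the gradient covariance to get the important directions.
rng = np.random.default_rng(2)
N, d = 5_000, 2
L = np.linalg.cholesky(R)                     # correlation matrix from above
X_grad = rng.standard_normal((N, d)) @ L.T    # correlated standard normal samples
G = surrogate_gradient(X_grad)                # (N, d) array of gradients; placeholder helper
C = G.T @ G / N                               # Monte Carlo estimate of the gradient covariance
eigvals, eigvecs = np.linalg.eigh(C)
W = eigvecs[:, ::-1]                          # columns sorted by decreasing eigenvalue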

Handling input correlations for polynomial chaos approximations is still an active area of research and a new path of development for us, so if you have any queries or suggestions we are more than happy to hear them. Hope this helps!


Hi Nick, sorry it’s taken me a while to reply. Thanks very much for the clarifications and the in-depth reply! It’s all falling into place now (I was thinking a bit too simplistically about what sensitivities actually mean, and had missed a couple of updates to the code).

I’d be interested to read more about why the arcsine distribution offers advantages, if you have a reference you could direct me to? Concentrating training data at the edges of the design space is something I’ve observed empirically to give better model fits. That said, it’s also often the case that a greater proportion of runs of the expensive model fail to converge at the extremes, so on balance there’s probably not a lot in it.

Thanks for highlighting the argument change in set_model. All working great now - the only thing I had to add to the code above was to explicitly declare the Gram-Schmidt method in the initialisation. I’m (slowly) getting to grips with the underlying maths and will definitely have more questions!
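For anyone else following, the initialisation I ended up with is along these lines (the keyword name is what I have on the develop branch and may differ in other versions):

# Explicitly selecting the Gram-Schmidt approach when constructing Correlations
corr = Correlations(R, poly=poly, method='gram-schmidt')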


Hi @mdonnelly, a good reference for this is: Constructing Least-Squares Polynomial Approximations.