I’ve recently had a few reasons to doubt whether my workflow for uncertainty studies with correlated inputs is optimal, so I wanted to ask the question and get some advice if possible. I’m dealing with a computational model with ~50 input variables, and I also want to know the sensitivities (knowing which variables aren’t important is just as valuable as knowing which ones are in this case).

I’ll illustrate it with an example inspired by a previous post, Sensitivity Analysis with Effective Quadratures - Blog - Discourse — equadratures:

We can set up the standard problem:

```
import numpy as np
from equadratures import Parameter, Basis, Poly

# X_train, y_train: Latin Hypercube sample points and expensive-code outputs
s1 = Parameter(distribution='Gaussian', shape_parameter_A=0., shape_parameter_B=1., order=3)
s2 = Parameter(distribution='Gaussian', shape_parameter_A=0., shape_parameter_B=1., order=3)
R = np.array([[1.0, 0.8],
              [0.8, 1.0]])  # target input correlation matrix
basis = Basis('total-order')
poly = Poly([s1, s2], basis, method='least-squares',
            sampling_args={'sample-points': X_train, 'sample-outputs': y_train})
```

In this case the training data has been generated from a Latin Hypercube (sampled uniformly over the same support) run through the expensive code. The reasoning behind sampling from a uniform distribution for the training data is to get the best representation of the full design space. Is there anything wrong with doing this, i.e. fitting the model with uniformly sampled data despite investigating Gaussian input uncertainties?
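For concreteness, the uniform Latin Hypercube design described above can be sketched with scipy; the ±3 bounds here are an assumption standing in for the support over which the Gaussian inputs are represented, and the sample size of 30 is arbitrary:

```python
import numpy as np
from scipy.stats import qmc

# Latin Hypercube over the 2D design space; bounds of +/-3 (roughly +/-3
# standard deviations of the unit Gaussians) are a hypothetical choice
sampler = qmc.LatinHypercube(d=2, seed=0)
X_unit = sampler.random(n=30)                        # samples in [0, 1)^2
X_train = qmc.scale(X_unit, [-3.0, -3.0], [3.0, 3.0])  # rescale to the support
```

`X_train` would then be evaluated through the expensive code to produce `y_train` before fitting the poly.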

With the independent poly defined, it can now be transformed to account for the correlations:

```
from equadratures import Correlations

corr = Correlations(poly, R)
corr.set_model(y_train_corr)          # requires a second set of model outputs
poly2 = corr.get_transformed_poly()
poly2.get_mean_and_variance()
```

The problem here is that the poly fed into Correlations needs the X_train and y_train datasets, and the transformed poly then also needs fitting via set_model, so it requires two full sets of runs of the expensive code, which feels inefficient. I suspect there’s a way of bypassing one of these but I’m not sure how. This is the main problem I’m having, and I’ve recently been resorting to correlated Monte Carlo on the previous independent poly instead of creating a Correlations instance.
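The correlated-Monte-Carlo fallback mentioned above can be sketched with numpy alone; here `surrogate` is a hypothetical stand-in for evaluating the fitted independent poly (e.g. its polyfit call), not the library’s API:

```python
import numpy as np

def surrogate(X):
    # hypothetical stand-in for evaluating the fitted independent poly
    return X[:, 0]**2 + 0.5 * X[:, 0] * X[:, 1]

R = np.array([[1.0, 0.8],
              [0.8, 1.0]])
L = np.linalg.cholesky(R)               # induces the target correlation
rng = np.random.default_rng(0)
Z = rng.standard_normal((100_000, 2))   # independent standard normals
X = Z @ L.T                             # correlated Gaussian samples
y = surrogate(X)
mean, var = y.mean(), y.var()
```

This gives moments under the correlated input distribution from a single training campaign, at the cost of trusting a surrogate that was fitted without knowledge of the correlations.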

I think a more effective way to do the UQ study generally here would be to use subspaces and sample in the projected space, since I have so many variables, but I’m not clear how this would be set up for the correlated-inputs case. The sensitivity metrics are also important, but I think the weightings of the subspace vector would give me enough insight into what’s driving the most change in the design space.
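To illustrate the subspace-weightings idea on a toy problem (not the equadratures Subspaces API; `f` is a placeholder for the expensive model and the finite-difference gradients are an assumption, since in practice gradients would come from the fitted poly):

```python
import numpy as np

def f(x):
    # toy placeholder for the expensive model; true active direction ~ [0.7, 0.3]
    return np.exp(0.7 * x[0] + 0.3 * x[1])

def grad_fd(x, h=1e-5):
    # central finite-difference gradient of f at x
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2))
# average outer product of gradients, then eigendecompose
C = np.mean([np.outer(g, g) for g in map(grad_fd, X)], axis=0)
eigvals, eigvecs = np.linalg.eigh(C)
w = eigvecs[:, -1]   # leading active direction; |w| components act as weightings
```

Here the components of `w` recover the relative importance of the inputs, which is the kind of sensitivity insight referred to above.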