Variable naming in Poly

Many of our models have multiple outputs; for example, a gas turbine performance model may predict pressures and temperatures at multiple stations in the gas path. As a result, it’s not uncommon to find ourselves training a large number of models from the same input dataset.

Our normal way of dealing with this is to fit all of them in one step with a list comprehension, so we can pass the list around various functions without duplicating code for each output quantity of interest. The problem is that it’s then a bit tricky to remember which output variable is associated with which Poly.

A possible improvement could be to allow “naming” of the Poly objects which can then be accessed like any other attribute. It would make plotting a bit simpler as the plot title (or y-axis label) could then just be read directly from Poly rather than needing to keep track of a separate list of variable names. I think Parameter already does this via:

self.variable = variable

Could this also be added to Poly? I guess the assigned variable name should also be inherited in the derivative objects like Correlations etc. I’m happy to have a go at implementing as it should just be a quick addition.

Hi @mdonnelly, adding this as an optional argument seems like a nice idea to me. We could also incorporate the variable name into titles in the new plotting functions, e.g. Poly.plot_polyfit_1D() etc.

I guess the only argument against doing this is that it does add a little bit more code into equadratures, when one could just manually add their own attribute to the Poly once defined if they like? e.g.

mypoly = eq.Poly(parameters=my_param_list, basis=my_basis, method='numerical-integration')
mypoly.variable = variable
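
For several outputs, the same manual approach might look like the rough sketch below. StandInPoly is a hypothetical stand-in so the snippet runs without equadratures installed, and the station names and coefficients are made up for illustration; with the real library you would build each Poly as in the lines above instead.

```python
class StandInPoly:
    """Hypothetical stand-in for a fitted model; NOT the equadratures Poly class."""

    def __init__(self, coefficient):
        self.coefficient = coefficient

    def predict(self, x):
        # Trivial linear model, only here so there is something to call.
        return self.coefficient * x


# Illustrative output names and made-up coefficients (assumptions).
output_names = ["P30", "T30", "T40"]
coefficients = [2.0, 3.0, 5.0]

polys = []
for name, coeff in zip(output_names, coefficients):
    poly = StandInPoly(coeff)
    poly.variable = name  # attach the name manually, as in the snippet above
    polys.append(poly)

# A lookup keyed by variable name avoids tracking a separate list of names.
polys_by_variable = {poly.variable: poly for poly in polys}
```

Downstream functions can then read poly.variable for a plot title or y-axis label rather than relying on list order.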

@psesh any thoughts?

I guess another possible use could be to have an extra line or two within get_summary() to explicitly state what variable it concerns just to potentially remove a little bit of ambiguity.

if self.variable is not None:
    variable_string = 'The output variable is ' + str(self.variable)
    added += variable_string

Good point about just manually adding them. In a similar way, I guess you could also set the summary output filename to something based on the variable, which would make the above redundant too!
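
To make the two options concrete, here is a minimal sketch of both: appending the variable to the summary text, and deriving a per-variable filename. The helper names summary_text and summary_filename, and the "poly_summary" prefix, are hypothetical and not part of equadratures.

```python
def summary_text(description, variable=None):
    """Return the summary string, with the output variable appended if set."""
    added = description
    if variable is not None:
        added += "\nThe output variable is " + str(variable)
    return added


def summary_filename(variable=None, prefix="poly_summary"):
    """Build a filename from the variable name; fall back to the bare prefix."""
    if variable is None:
        return prefix + ".txt"
    # Replace anything that is not a letter or digit so the name is file-safe.
    safe = "".join(c if c.isalnum() else "_" for c in str(variable))
    return f"{prefix}_{safe}.txt"
```

Either option on its own removes the ambiguity; doing both costs very little.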


Ah yes that’s a great idea with the get_summary(). I suppose at the moment the summary outputs might lose their usefulness when you have many variables unless you’re quite careful with the naming of the files?

I reckon this feature is a worthwhile one to add; it’s only a few lines of extra code and sounds like it could add a fair bit of convenience for you.

P.S. Just out of curiosity, do you actually use get_summary() or know anyone that does? I’m wondering about its usefulness in general, and whether there might be other formats we want to think about outputting info in, i.e. would the functionality to output to a csv file, pandas data frame, or something else entirely be useful?

I think this is a very do-able task, and we could easily alter the “y”-axis for the relevant plots to capture that. @mdonnelly, are there specific output plots you require (e.g. truth vs polynomial, response surfaces)?

@ascillitoe the use-case for me is as a bit of an audit file for our workflows. We sometimes revisit analyses months later and it’s often helpful to have a concise summary of what was run in terms of parameter assumptions etc rather than needing to open up each notebook/dataset etc. If something else like a csv is better then I’m all ears!
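
As a rough idea of what a csv version of that audit file could look like, here is a sketch using only the standard library. The column names and the dictionary layout for each parameter are assumptions for illustration, not an existing equadratures format.

```python
import csv
import io

# Hypothetical column layout for the audit file.
FIELDNAMES = ["variable", "parameter", "distribution", "lower", "upper"]


def audit_rows(variable, parameters):
    """One row per input parameter, tagged with the output variable name."""
    return [
        {
            "variable": variable,
            "parameter": p["name"],
            "distribution": p["distribution"],
            "lower": p["lower"],
            "upper": p["upper"],
        }
        for p in parameters
    ]


def write_audit_csv(rows, handle):
    """Write the rows, with a header, to any open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Made-up parameter assumptions, written to an in-memory buffer.
example_parameters = [
    {"name": "inlet_pressure", "distribution": "uniform", "lower": 0.9, "upper": 1.1},
    {"name": "fuel_flow", "distribution": "uniform", "lower": 0.5, "upper": 1.5},
]
buffer = io.StringIO()
write_audit_csv(audit_rows("T30", example_parameters), buffer)
```

Swapping the buffer for a file opened with newline="" would write the same content to disk.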

@psesh some examples of the more routine ones we create are below (mainly for UQ and sensitivity analysis). All fairly standard matplotlib type stuff. We have toyed with things like Bokeh but those are a bit more niche.

  • Matrix plot of all input parameter distributions e.g. all the CDFs/PDFs/histograms of the parameters associated with a poly.
  • Truth vs prediction for both train and test data sets.
  • Matrix plot of all main effects, essentially just independent parameter sweeps over their upper and lower bounds (recognise this is a bit more difficult for distributions that don’t have fixed limits).
  • Heatmap showing strength of (two-way) input parameter interactions.
  • Output distributions with overlaid confidence intervals.
  • Parallel coordinates plot showing all outputs.
  • Pareto plot of first-order and total Sobol' indices.

I’ve made a PR covering the initial part of this (variable naming). As much as I’d be happy to help on the plotting side I expect you’d want to drive that yourselves!


Cheers @mdonnelly! I’ll have a go at updating the plotting functionality and drop a note here when done.


Hi @psesh, shall I merge in @mdonnelly’s PR or do you want to commit your updates to that PR?

Hi @ascillitoe, yes for now please merge the PR.


Hi @mdonnelly, @Simardeep27 has been implementing a few of the plots you mentioned above for us. Can I just check with you please, re this one (the matrix plot of all input parameter distributions):

Did you have in mind something like the seaborn pairplot? I have a few ideas regarding how we could improve on this, but just wanted to check if this is what you meant first!

Hi @ascillitoe,

Short answer is yes and no!

I do use those types of plots regularly (usually the scatter_matrix from Pandas) but the Seaborn one does look a bit more modern straight out of the box. They’re useful as a quick glance to see what the sampling space of the problem is (distributions on the diagonals and whether there’s any correlation in the off-diagonals). Sometimes I also plot the output data in this too, more as a quick EDA type step to spot the high-level relationships. Having it built in would be useful.

However, I think what I had in mind in the earlier post was just a way to plot all the PDFs of the input parameters being used by Poly. Calling it a matrix plot was probably a bad choice on my part. I just meant something like putting them all into an X by Y subplot, as we often have lots of parameters (e.g. 40+). The use case for this one is similar to the one above, plus general reporting.
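
Something along these lines is what I mean, as a minimal sketch only. The function names grid_shape and plot_parameter_pdfs and the five-column cap are my own assumptions, and the x and PDF values are expected to be supplied by the caller for each parameter.

```python
import math


def grid_shape(n_plots, max_cols=5):
    """Return (rows, cols) for a near-square grid holding n_plots panels."""
    if n_plots < 1:
        raise ValueError("need at least one plot")
    cols = min(max_cols, math.ceil(math.sqrt(n_plots)))
    rows = math.ceil(n_plots / cols)
    return rows, cols


def plot_parameter_pdfs(curves):
    """Plot one PDF per panel.

    curves is a list of (name, x_values, pdf_values) tuples, one per parameter.
    """
    # Imported here so grid_shape can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    rows, cols = grid_shape(len(curves))
    fig, axes = plt.subplots(
        rows, cols, figsize=(3 * cols, 2.5 * rows), squeeze=False
    )
    flat_axes = axes.ravel()
    for ax, (name, x, pdf) in zip(flat_axes, curves):
        ax.plot(x, pdf)
        ax.set_title(name)
    # Hide any unused panels in the last row.
    for ax in flat_axes[len(curves):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig
```

With 40 parameters and the default cap, this gives an 8 by 5 grid.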


Hi @mdonnelly, thanks for this, very helpful!

@ascillitoe I made a little PR just covering the second bit of my comment above. Also spotted a bug (maybe) that I’ve had a go at fixing.


Hi @mdonnelly, the above is now merged into version 9.1, so I shall mark this as solved. Please do feel free to shout out if you feel this isn’t correct.
