Piecewise Polynomials with Ridge Detection

DiegoLopez · 20 April 2021 06:49

Hello,

I have a regression problem for which I would like to try the Piecewise Polynomials with Ridge Detection capability in the code. I was wondering if you could give me a hand in this task as I don’t have much experience with this topic.

I have 3 quantities of interest that are described by 35 parameters. I would like to create analytical models for each of these such that I can employ them with a search algorithm for optimisation. For this task, I have sampled 284 candidates for which I know the true value of the functions.

How should I proceed for creating the polynomials? Should I be normalising my data in some way prior to fitting them? What splitting criterion is recommended?

I’m able to share my dataset if anyone’s interested in having a look, just let me know.

Thanks in advance for your help!

psesh · 20 April 2021 09:11

Hi @DiegoLopez. Can you upload your dataset on a Github repo and drop a link here please? Cheers!

DiegoLopez · 20 April 2021 12:08

Hi @psesh. You can find it here:

link_to_dataset.csv

It is a (284 x 38) matrix, where each row after the header is a sample. The first 35 columns contain the parameter values, and the last three, the values for the quantities of interest.
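For readers following along, the layout described above (a header row, parameter columns first, the three quantities of interest last) can be split into inputs and outputs along these lines. This is only a minimal sketch using the standard library: the helper name `split_inputs_outputs` is hypothetical, and the toy CSV text stands in for the real file, with 2 parameter columns instead of 35 and 2 samples instead of 284.

```python
import csv
import io


def split_inputs_outputs(csv_text, n_outputs=3):
    """Split CSV text into (input_names, X, output_names, Y).

    The last `n_outputs` columns are treated as outputs and all
    preceding columns as inputs. The first row must be a header.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    n_inputs = len(header) - n_outputs
    X = [[float(v) for v in row[:n_inputs]] for row in body]
    Y = [[float(v) for v in row[n_inputs:]] for row in body]
    return header[:n_inputs], X, header[n_inputs:], Y


# Toy stand-in for the real dataset (values are made up for illustration).
toy_csv = (
    "v1,v2,qoi1,qoi2,qoi3\n"
    "0.1,0.2,1.0,2.0,3.0\n"
    "0.3,0.4,4.0,5.0,6.0\n"
)

input_names, X, output_names, Y = split_inputs_outputs(toy_csv)
print(input_names, output_names)
print(len(X), "samples,", len(X[0]), "inputs,", len(Y[0]), "outputs")
```

Slicing by position rather than by column name means the same helper works whatever the header labels turn out to be.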

Cheers!

psesh · 24 April 2021 10:17

Hi @DiegoLopez. I’ve had a first attempt at this. Given the few data points you have, I’ve only been able to apply linear regression trees. My original thought process was that once we have polynomials defined over each node, we can then compute their active subspace. This is possible because regression trees output a series of Poly instances, which can then be fed into the Subspace class.

For the first step (the regression tree), see the code below:

import equadratures as eq
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv('dataset.csv')
N, d = data.shape
input_var_names = ['v'+str(i) for i in range(1, d-2)]

X = pd.DataFrame(data, columns=input_var_names).values

tree_1 = eq.PolyTree(splitting_criterion='loss_gradient', order=1)
tree_1.fit(X,data['qoi1'].values)
tree_1.get_splits()

This should return:

[[-0.6428257804365949, 18],
 [-0.2806524228882106, 23],
 [0.5000000092592589, 17],
 [0.3320158013032749, 12]]

individual_polys = tree_1.get_polys()
individual_polys

This should return a list of polynomials:

[<equadratures.poly.Poly at 0x1329a0250>,
 <equadratures.poly.Poly at 0x132c122e0>,
 <equadratures.poly.Poly at 0x132324c70>,
 <equadratures.poly.Poly at 0x132cc7310>,
 <equadratures.poly.Poly at 0x132c4f670>]

In terms of output, here is what each polynomial yields:

for little_polys in individual_polys:
    little_polys.plot_model_vs_data()
    fig, ax = little_polys.plot_sobol(order=1, show=False)
    ax.set_xticklabels(input_var_names, fontsize=8)

[Figures: for each of the five polynomials, a model-versus-data plot alongside a bar chart of its first-order Sobol' indices.]

One can also use the online GraphViz tool to view the tree structure:

So what do we do next? My suggestion would be to see if you can generate more data / simulations, and also see if there is a strong physical rationale for the split locations. Hope this helps.

DiegoLopez · 26 April 2021 07:03

Hi @psesh. Thanks for this. I wanted to give your code a try myself, but I ran into an issue when constructing the tree_1 object.

`TypeError: __init__() got an unexpected keyword argument 'splitting_criterion'`

I’m using equadratures version 9.0.0. This is the latest one I could get by doing

`pip3 install equadratures --upgrade`

I was able to clone the latest version from github, but this produced the following error:

`Exception: invalid splitting_criterion`

It seems the only options available for splitting_criterion in this version are “model_aware” and “model_agnostic”.

Do you have any suggestions as to how I might get the “loss_gradient” option working?

Comment on the number of samples: each evaluation of qoi1, qoi2 and qoi3 is very expensive, so unfortunately I cannot afford to obtain more than I have at the moment. I just need to get the best performing model possible for this number of runs and try to make the best out of it.

ascillitoe · 26 April 2021 09:16

Hi @DiegoLopez, sorry for the confusion here. The loss gradient splitting is currently only in the develop branch; we plan to merge it into master (and update pip) this coming Friday.

In the meantime you could also try model_aware and model_agnostic, but you will be missing some of the plotting functionality @psesh used above. FYI, model_aware attempts to fit polynomials at each candidate split location (and then chooses the split which minimises the loss), hence it can be quite slow when there are a large number of dimensions. model_agnostic instead uses the CART tree induction algorithm to build the tree, and then fits polynomials to the resulting tree. It is very fast, but the resulting model isn’t always as accurate.
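To make the model_aware idea concrete, here is a minimal one-dimensional sketch of "fit a polynomial on each side of every candidate split and keep the split with the lowest loss". It is not the equadratures implementation: the helpers `fit_line` and `best_split` are hypothetical, the fits are plain least-squares lines, and the loss is assumed to be the sum of squared residuals. In the real setting the same search would be repeated over every input dimension, which is why it becomes slow as the dimension grows.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x. Returns (a, b, sum of squared residuals)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def best_split(xs, ys, min_samples=2):
    """Try the midpoint between each pair of sorted neighbours as a split.

    For every candidate, fit one line to the left points and one to the
    right points, and return (threshold, loss) for the lowest combined loss.
    Returns None if no valid split exists.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_samples, len(pairs) - min_samples + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical x values
        left, right = pairs[:i], pairs[i:]
        loss = fit_line(*zip(*left))[2] + fit_line(*zip(*right))[2]
        threshold = 0.5 * (pairs[i - 1][0] + pairs[i][0])
        if best is None or loss < best[1]:
            best = (threshold, loss)
    return best


# Toy data: y = |x|, which is piecewise linear with a kink at x = 0.
xs = [-1.0, -0.75, -0.5, -0.25, 0.25, 0.5, 0.75, 1.0]
ys = [abs(x) for x in xs]

threshold, loss = best_split(xs, ys)
print("chosen split:", threshold, "combined loss:", loss)
```

On this toy data the search should pick the split at the kink, since that is the only location where both halves are exactly linear. A model_agnostic approach would instead choose the splits first, without fitting any polynomials, and only fit them once the tree is fixed.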

Alternatively, if you’d like to install the latest version from develop, you can install this for now with:

pip install git+https://github.com/Effective-Quadratures/equadratures.git@develop

Or you could use git to maintain a local version:

git clone https://github.com/Effective-Quadratures/equadratures.git
cd equadratures
git checkout --track origin/develop

and then install your local version via pip:

pip install -e <location_of_local_equadratures_repo>
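After either install route, it can help to confirm which version Python actually picks up, since a stale pip install can shadow a local clone. A minimal sketch using only the standard library (Python 3.8+) is below; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("equadratures:", installed_version("equadratures"))
```

If this prints `None`, the package is not visible to the interpreter being used, which usually points to pip and Python referring to different environments.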