Cardinality sanity check

I think this should be fairly simple, but I’m adding it here before I forget. We could do with a sanity check somewhere that raises a warning or exception if a polynomial is defined with a cardinality beyond a certain threshold. This would stop the user from sitting there forever, and would prevent the runtime from freezing.

This should certainly be implemented for the tensor-grid basis, since the cardinality grows exponentially with the number of parameters there, and it is easy to determine the cardinality a priori, as shown below:

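from equadratures import Parameter, Basis, Poly

d = 50  # number of parameters
p = 2   # polynomial order in each dimension
L = (p+1)**d  # tensor-grid cardinality, known before any basis is built
print('%.2e'%L)

params = [Parameter(distribution='uniform', lower=-1, upper=1, order=p) for j in range(d)]
basis = Basis('tensor-grid')
poly = Poly(params,basis) # TODO - check within this call if L > L_allowed
print(poly.basis.get_cardinality())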

This may require more thought for some of the other basis options…
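For the tensor-grid case, the a priori check could look something like the sketch below. It uses only the standard library; the names `L_ALLOWED`, `tensor_grid_cardinality` and `check_cardinality` are hypothetical, and the threshold value is picked purely for illustration, not taken from the library.

```python
import math

# Hypothetical threshold, chosen only for illustration.
L_ALLOWED = 10**6


def tensor_grid_cardinality(orders):
    """Number of terms in a tensor-grid basis: the product of (p_i + 1)."""
    return math.prod(p + 1 for p in orders)


def check_cardinality(orders, limit=L_ALLOWED):
    """Raise before any basis is built if the cardinality exceeds the limit."""
    card = tensor_grid_cardinality(orders)
    if card > limit:
        raise ValueError(
            "Tensor-grid cardinality %.2e exceeds the limit %.2e" % (card, limit)
        )
    return card


print(check_cardinality([2] * 5))  # 3**5 = 243, under the limit
try:
    check_cardinality([2] * 50)    # 3**50 is far over the limit
except ValueError as err:
    print(err)
```

Because only the list of orders is needed, a check like this costs almost nothing and can run before any multi-index set is generated.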


As someone who often accidentally crashes their computer by forgetting to change the basis, I’d very much welcome this!


Glad to know it isn’t just me! We’re releasing a new version very soon, so I shall ensure this functionality is incorporated.

Hi all, I have now added this functionality to the latest develop branch. It turned out to be a little more challenging than first envisioned, since even getting to the point where we can call basis.get_cardinality() may be computationally intractable if the dimensions/polynomial orders are too high. My solution is to add two cardinality limits:

  1. A hard limit CARD_LIMIT_HARD = int(1e6) in basis.py. This checks the cardinality as the selected basis is being constructed, and raises an exception if the limit is reached. You wouldn’t want to construct a polynomial from a basis this large, but the limit is deliberately generous because a number of the basis types prune/subsample from an initial total-order or tensor-grid basis, so we often need to set quite a large initial basis before pruning it down later.
  2. A soft limit CARD_LIMIT_SOFT = 50e3 is then enforced in poly.py. This raises an exception if a Poly object is defined with a basis cardinality over this limit, since running Poly.set_model() with such a cardinality often leads to rather long compute times. It is a soft limit, overridable with override_cardinality=True, since the user may find longer run times acceptable and may wish to incorporate their own limit into their workflow.
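The two-limit strategy can be sketched roughly as follows. This is not the actual code in basis.py or poly.py: the function names `check_hard_limit` and `check_soft_limit` are hypothetical, the use of `ValueError` is an assumption, and the exact comparisons (`>=` versus `>`) are guesses based on the wording above. Only the two constant values and the `override_cardinality` flag come from the post.

```python
# Limit values as quoted in the post above.
CARD_LIMIT_HARD = int(1e6)
CARD_LIMIT_SOFT = 50e3


def check_hard_limit(cardinality):
    """Hypothetical stage-1 check, run while the basis is being constructed."""
    # Assumption: the limit counts as "reached" at equality.
    if cardinality >= CARD_LIMIT_HARD:
        raise ValueError(
            "Cardinality %.3e reaches the hard limit %.1e"
            % (cardinality, CARD_LIMIT_HARD)
        )


def check_soft_limit(cardinality, override_cardinality=False):
    """Hypothetical stage-2 check, run when the polynomial is defined."""
    if cardinality > CARD_LIMIT_SOFT and not override_cardinality:
        raise ValueError(
            "Cardinality %d is over the soft limit %d; "
            "pass override_cardinality=True to proceed anyway"
            % (cardinality, CARD_LIMIT_SOFT)
        )


# A cardinality between the two limits passes stage 1,
# but needs the override to pass stage 2.
card = 176851
check_hard_limit(card)
check_soft_limit(card, override_cardinality=True)
try:
    check_soft_limit(card)
except ValueError as err:
    print(err)
```

Keeping the two checks separate means the hard limit can never be bypassed, while the soft limit stays a per-call decision for the user.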

Below are two examples. Firstly, an example where the hard limit is reached when constructing the Basis. The basis here would have a cardinality of $6.533\times 10^{77}$, enough to freeze the runtime. Thankfully, an exception will now be raised before it gets to this.

import equadratures as eq

# Define d=100 parameters of order 5
d = 100
params = []
for j in range(d):
    params.append(eq.Parameter(distribution='uniform',lower=-1,upper=1,order=5))

# Set tensor-grid basis. Resulting cardinality=6.533e77
orders = [param.order for param in params]
basis=eq.Basis('tensor-grid',orders=orders)

Secondly, an example with a cardinality of $176.9\times 10^{3}$. This is OK to construct a basis for, but leads to set_model taking a rather long time. If this is acceptable to you, and you really do want this number of dimensions and polynomial orders, you can override the soft limit (as is done below).

import numpy as np
import equadratures as eq

# Define d=100 parameters of order 3
d = 100
params = []
for j in range(d):
    params.append(eq.Parameter(distribution='uniform',lower=-1,upper=1,order=3))

# Set total-order basis. Resulting cardinality= 176851
orders = [param.order for param in params]
basis=eq.Basis('total-order',orders=orders)

# Define data and poly
X = np.random.uniform(-1,1,size=(1000,d))
Y = X[:,0]**2
poly = eq.Poly(params,basis,method='least-squares',
               sampling_args={'sample-points':X,'sample-outputs':Y},
               override_cardinality=True)

Note: If orders isn’t given in the Basis definition, the hard limit in basis.py will not be enforced until the Basis is passed to the Poly object.
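The cardinalities quoted in the two examples can be reproduced with plain arithmetic, which also shows which limit each one trips. The sketch below assumes the standard closed form for a total-order basis with the same order p in all d dimensions, namely the binomial coefficient C(d+p, p); it does not call the library.

```python
import math

d = 100

# First example: tensor grid with order 5 in every dimension.
tensor_grid = (5 + 1) ** d
# Second example: total-order basis with order 3, using C(d+p, p).
total_order = math.comb(d + 3, 3)

print("%.3e" % tensor_grid)  # 6.533e+77
print(total_order)           # 176851

# Compare against the limit values quoted in the post.
print(tensor_grid >= 1e6)            # True: the hard limit is reached
print(50e3 < total_order < 1e6)      # True: only the soft limit is exceeded
```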


Hi @psesh and @Nick, what do you think about the above, in particular the values for the two limits and the general strategy?

I will mark as solved and push the changes if you think it is sensible.


This functionality has now been merged into version 9.1.0, so I am marking as solved.