There are times in preliminary design when we rarely have enough information about our input data distributions to use anything more involved than either a uniform or triangular distribution (sometimes even the upper and lower bounds of those need to be taken with a pinch of salt!). Nevertheless it’s still valuable to do uncertainty quantification in these early stages as it helps to give an indication of the level of design and development risk which may help inform certain design decisions.

Triangular distributions are not natively built into the code and so I had been using the Analytical distribution for this but was noticing a large bottleneck in performance with this. in things like the initial definition of parameters and later processes like random sampling and correlation transforms. Relative to using a uniform distribution the time taken was an order of magnitude higher (try running the below back to back to see).

# Triangular (analytical)
def tri(a,b,c):
mean = (a + b + c)/3.
var = (a**2 + b**2 + c**2 - a*b - a*c - b*c)/18.
pdf = Weight(lambda x : 2*(x-a)/((b-a)*(c-a)) if (a <= x < c) \
else( 2/(b-a) if (x == c) \
else( 2*(b-x)/((b-a)*(b-c)))), \
support=[a, b], pdf=True)
return pdf
myparameters = [Parameter(distribution='analytical', weight_function=tri(0,1,0.5), order=2) for i in range(50)]
# Uniform
myparameters = [Parameter(distribution='uniform', lower=0, upper=1, order=2) for i in range(50)]

After a little bit of digging I noticed that for many of the native distributions they use the implementation from scipy.stats and so I adapted one of them to use scipy.stats.triang. In some ad hoc tests it runs far quicker with negligible difference in final outputs. Hopefully nothing breaks as a result of it. I’ve tried to keep things as consistent as possible so it’s called in a similar way to the other functions:

# Triangular (scipy)
myparameters = [Parameter(distribution='triangular', lower=0,upper=1, shape_parameter_A=0.5, order=2) for i in range(50)]

I’m just trying to figure out pull requests but on Github but I’ll dump the code below just in case I don’t get there!

"""The Triangular distrubution."""
from equadratures.distributions.template import Distribution
from equadratures.distributions.recurrence_utils import custom_recurrence_coefficients
import numpy as np
from scipy.stats import triang
RECURRENCE_PDF_SAMPLES = 8000
class Triangular(Distribution):
"""
The class defines a Triangular object. It serves as a template for all distributions.
:param double lower:
Lower bound of the support of the distribution.
:param double upper:
Upper bound of the support of the distribution.
"""
def __init__(self, lower=None, upper=None, mode=None):
self.lower = lower # loc
self.upper = upper
self.mode = mode
self.bounds = np.array([0, 1.0])
self.scale = upper - lower # scale
self.shape = (self.mode - self.lower) / (self.upper - self.lower) # c
if (self.lower is not None) and (self.upper is not None) and (self.mode is not None) :
mean, var, skew, kurt = triang.stats(c=self.shape, loc=self.lower, scale=self.scale, moments='mvsk')
self.mean = mean
self.variance = var
self.skewness = skew
self.kurtosis = kurt
self.x_range_for_pdf = np.linspace(self.lower, self.upper, RECURRENCE_PDF_SAMPLES)
self.parent = triang(loc=self.lower, scale=self.scale, c=self.shape)
def get_description(self):
"""
Returns the description of the distribution.
:param Distribution self:
An instance of the distribution class.
"""
text = "is a triangular distribution with a mode of "+str(self.mode)+" over the support "+str(self.lower)+" to "+str(self.upper)+"."
return text
def get_cdf(self, points=None):
"""
Returns the CDF of the distribution.
:param Distribution self:
An instance of the distribution class.
"""
if points is not None:
return self.parent.cdf(points)
else:
raise ValueError( 'Please digit an input for getCDF method')
def get_pdf(self, points=None):
"""
Returns the PDF of the distribution.
:param Distribution self:
An instance of the distribution class.
"""
if points is not None:
return self.parent.pdf(points)
else:
raise ValueError( 'Please digit an input for get_pdf method')
def get_icdf(self, xx):
"""
An inverse cumulative density function.
:param Distribution self:
An instance of the distribution class.
:param xx:
A numpy array of uniformly distributed samples between [0,1].
:return:
Inverse CDF samples associated with the gamma distribution.
"""
return self.parent.ppf(xx)
def get_samples(self, m=None):
"""
Generates samples from the distribution.
:param Distribution self:
An instance of the distribution class.
:param integer m:
Number of random samples. If no value is provided, a default of 5e5 is assumed.
:return:
A N-by-1 vector that contains the samples.
"""
if m is not None:
number = m
else:
number = 500000
return self.parent.rvs(size=number)

And then just the integrations bits:

# Within __init__:
import equadratures.distributions.triangular
# To top of parameter.py:
from equadratures.distributions.triangular import Triangular
# Within _set_distribution:
elif self.name.lower() == 'triangular':
self.distribution = Triangular(self.lower, self.upper, self.shape_parameter_A)

Hi @mdonnelly, welcome! This is a timely feature request as we’ve recently been discussing distributions recently. Our challenge with implementing these is we don’t always know which distributions users might need, so your request/PR is most welcome!

You’re totally correct in that our distributions are built on top of the scipy.stats ones, so theoretically any of the continuous distributions from there can be added to eq following your approach above. The only thing we might also want to include in the PR is a corresponding unit test in tests/test_distributions.py.

As an aside you might be interested in, @Simardeep27 has been working on this PR which would essentially enable you to define any continuous scipy.stats distribution and then pass it into Parameter() (instead of manually implementing each one). We’ve put this on hold though as we need to think a bit more about how to automatically test all these different distributions i.e. so a user can be confident that any distribution they define will work as expected!

Sorry @mdonnelly, just noticed you have already added a unit test! These are currently failing on some python versions due to unrelated issues when loading dependencies on travis, I shall hopefully have this fixed in the next day or two, and then we can check that your tests pass successfully on all versions. After that, we should be able to go ahead with the merge!

Thanks @ascillitoe - I probably threw you off a little regarding the unit test as I posted here first before github and it was only after having a peek of @Simardeep27’s work that I remembered to put one in! A generic scipy.stats wrapper definitely makes sense and is probably a neater overall solution if like you say you can make sure they all work!