In-built plotting methods

ascillitoe · 22 January 2021 13:51

To reduce the amount of boilerplate code required for visualisations, it would be great to have in-built plotting functions added to the main modules in Effective Quadratures. For example Poly.plot() and Subspaces.plot_zonotope().

It would be important to implement the *args and/or **kwargs properly so the user can customise plots, and perhaps return ax and fig handles for later customisation.

Here is a running to-do list of plotting functions to be added:
Univariate and bivariate polynomials, including sigma bounds and quadrature points.
Dimension reduction - summary plots
Dimension reduction - 2D contours over subspace, zonotopes etc.
Polynomial trees - Partitions in feature space, graphviz of tree (already done).
Poly coefficients.
Sobol indices bar charts.
Index sets.
PDF’s of parameters.
Predictions vs truth for some points
Solver plots. e.g. convergence plots for iterative solvers, and regularisation path for elastic net.

I’ve made this post a wiki others should be able to add their own plotting ideas to the list.

ascillitoe · 22 January 2021 14:00

It would be super helpful if users could provide notebooks or pseudo code when giving examples of plotting tasks. To kick us off, here are two notebooks which cover plotting a 1D and 2D Poly, and a few tasks associated with computing moments:

psesh · 29 January 2021 18:51

Below are some new additions.

Plot of orthogonal polynomials

myparam = Parameter(distribution='normal', shape_parameter_A = 0.0, \
              shape_parameter_B = 1.0, order=5, variable='Design variable')
myparam.plot_orthogonal_polynomials(xlim=[-5, 5], ylim=[-10, 10])

index

Plot of a parameter’s marginal

myparam.plot_pdf(xlim=[-6, 6], ylim=[0, 0.50])

index

data = np.random.randn(1500, 1)
myparam.plot_pdf(data=data,xlim=[-5, 5])

index

Plot of a polynomial’s fit vs. data

basis = Basis('univariate')
def fun(x):
    return -x**2 + x**3 + 10*x - np.sin(3 * x)
mypoly = Poly(myparam, basis, method='numerical-integration')
mypoly.set_model(fun)
mypoly.plot_model_vs_data()

index

Plot of a polynomia’s fit in 1D

mypoly.plot_polyfit_1D(output_variances=0.05, ylim=[-205, 205], xlim=[-10, 10])

index

mypoly.plot_polyfit_1D(uncertainty=False, ylim=[-205, 205], xlim=[-10, 10])

index

For more details, see:

ascillitoe · 26 February 2021 10:58

Hi @psesh , these plots are looking really nice.

My only thought/concern is whether having all the plotting methods in the Plot() class is the right way to go? From a maintainability standpoint it’s nice to have them all in plot.py, but I think having Poly(), Parameter() etc inherit the Plot() class might be confusing? I was worried about what this would mean if we wanted to have Poly() etc inherit other classes later, but it turns out we’re ok on that front as “multi-inheritance” means we classes can inherit from more than one class at once.

A minor issue I have is that I think having the main building blocks (Poly() etc) inherit from Plot() is confusing from a code dev point of view i.e. if you called super(Poly) in Poly.__init__() it would return Plot. But my main issue is that each subclass (i.e. Poly) will inherit all the plotting methods in Plot, not just the Poly specific ones. As an example, doing dir(Poly) or using autocomplete in IDE’s would list Poly.plot_pdf() as a viable command.

I can think of two approaches to avoid the above:

Keep all the plotting methods in Plot() in plot.py, but instead of having Poly etc inherit all of Plot, use composition to only take the relevent methods from Plot(). We could then also have more general plotting methods in Plot(), and Poly etc could take the relevant ones to build more involved plots e.g. hav a method to generate the fig, ax, and any custom settings in Plot, but do the actual plotting in Poly.

class Plot(object):
  @staticmethod
  def plot_1d():
    print('I plot stuff')
  @staticmethod
  def plot_2d():
    print('I plot even more stuff')

class Poly(object):
  def __init__(self):
    self.plot = Plot()
  def plot_poly(self):
    return Plot.plot_1d()

Get rid of Plot(). Keep all the plotting methods for each class in the class i.e. plot_pdf in Parameter, and just define our own matplotlib stylesheet and set rcParams in plot.py (see link).

ascillitoe · 26 February 2021 10:59

I think for most purposes option 2 will be enough. Option 1 would be a bit more complicated, but would allow for more customisation of plots.

ascillitoe · 26 February 2021 12:22

Hi @psesh , I updated Parameter().plot_orthogonal_polynomials() to demonstrate the new approach.

Note that the strategy is a tiny bit different to option 1 above. Its more like the below:

class Plot(object):
  @staticmethod
  def plot_orthogonal_polynomials(Parameter):
    print('I plot parameters')
  @staticmethod
  def plot_poly_method(Poly):
    print('I plot polynomials')

class Parameter(object):
  def __init__(self):
    pass

  def plot_orthogonal_polynomials(self):
    return Plot.plot_orthogonal_polynomials(self)

i.e. plot_orthogonal_polynomials() in Parameter() passes the Parameter object as an argument, so we can access this in the corresponding Plot method. If you do dir(Parameter) you’ll see it only has this one plotting method belonging to it (good!). Whereas dir(Poly) still shows all the plotting methods in Plot() (bad! Poly is still using the old approach…).

Also a few other minor changes:

Added a ax=None keyword argument. Enables you to pass in previously defined ax, so you could set figure size etc yourself. Maybe we should even hold off on ax.set_xlabel etc if ax is passed in?
Added show=True keyword arg. If False then plt.show() isn’t called internally. This can be annoying if you want to plot many things, as when running a script each fig would only be plot once you’ve closed the previous one. If the user sets show=False they’re free to do their own plt.show() whenever it suits. (also changed save and show to be two separate if statements, so you can save and show at the same time.
Added return fig, ax. Might be useful if the user wants to customise further, or include this ax within another fig subplot etc.

psesh · 27 February 2021 14:52

A query: Is it strictly necessary to have a Plot() class? I couldn’t get sphinx to properly compile with static methods within a class that does not have a constructor. So the workaround I propose is to just have those plot commands in plot.py file. In code:

Snippet of plot.py

import seaborn as sns
import matplotlib.pyplot as plt
from equadratures.datasets import score
import numpy as np
sns.set(font_scale=1.5)
sns.set_style("white")
sns.set_style("ticks")

def plot_pdf(Parameter, ax=None, data=None, save=False, xlim=None, ylim=None, show=True, return_figure=False):
    """
    Plots the probability density function for a Parameter.

    :param Parameter Parameter: 
        An instance of the Parameter class.
    :param matplotlib ax: 
        An instance of the ``matplotlib`` axes class.
    :param ndarray data: 
        Samples from the distribution (or a similar one) that need to be plotted as a histogram.
    :param bool save: 
        Option to save the plot as a .png file.
    :param list xlim: 
        Lower and upper bounds for the horizontal axis, for example ``xlim=[-3, 5]``.
    :param list ylim: 
        Lower and upper bounds for the vertical axis, for example ``ylim=[-1, 1]``.
    """
    s_values, pdf = Parameter.get_pdf()
    fig = plt.figure(figsize=(8,6))
    ax = fig.add_subplot(1,1,1)
    plt.fill_between(s_values,  pdf*0.0, pdf, color="gold" , label='Density', interpolate=True, hatch="\\\\\\\\", edgecolor="grey",  linewidth=0.5,alpha=0.5)
    if data is not None:
        plt.hist(data, 50, density=True, facecolor='dodgerblue', alpha=0.7, label='Data', edgecolor='white')
    plt.xlabel(Parameter.variable.capitalize())
    plt.ylabel('PDF')
    if xlim is not None:
        plt.xlim([xlim[0], xlim[1]])
    if ylim is not None:
        plt.ylim([ylim[0], ylim[1]])
    plt.legend()
    sns.despine(offset=10, trim=True)
    if save:
        plt.savefig('pdf_plot.png', dpi=140, bbox_inches='tight')
    if show:
        plt.show()
    if return_figure:
        return fig, ax

and then in this case within Parameter.py we stick to the syntax you proposed:

Snippet of parameter.py

import equadratures.plot as plot

def plot_pdf(self, **kwargs):
    """
    Plots the probability distribution.
    """
    return plot.plot_pdf(self,**kwargs)

ascillitoe · 27 February 2021 14:56

I think that would be totally fine! Only motivation I can think off for having a Plot() class is if we want to store parameters within it, like settings and things. But I think most of what we want should be covered by the sns.set_style() stuff?

psesh · 27 February 2021 14:57

yes, i think that it should cover what we need for now.

Simardeep27 · 31 March 2021 11:04

Hey @ascillitoe,
I was thinking to take up another plotting method, should I try to make a method for regularisation path for elastic net?

ascillitoe · 31 March 2021 12:04

Sure! If you like you could also do another Sobol’ indices method, but this time for the total indices obtained from .get_total_sobol_indices().

Re the regularisation path, here is a notebook with a basic example:

Notice the $\sigma$ bounds which are added when crit='CV'. This is showing the standard deviation in MSE over five folds when “cross-validation” is chosen as the selection criterion (the criterion used to pick the “optimal” $\lambda$ value. When crit='AIC' or 'BIC' are chosen this isn’t needed as we only have one fold.

Also, for higher order/dimension problems we probably need some logic to limit how many coefficients are plotted. Perhaps we just pick a certain number of the largest ones?

ascillitoe · 31 March 2021 12:05

@psesh, are you happy to tick off the dimension reduction plotting methods? Or were there other plots we still need there?

psesh · 31 March 2021 12:28

We don’t have a sufficient summary plot yet. Its an easy addition actually.

ascillitoe · 31 March 2021 13:13

Ah yes forgot about that, I’ve added it to the list. Thanks!

Simardeep27 · 1 April 2021 12:21

Hey @ascillitoe , sorry for the late reply.
Actually I was trying to understand the concept of Solver module, till now what I was able to understand was that from our model, regpoly in this case. we extract certain information like
Lamdas, x_path, IC, IC_std, idx and the elements from the basis module and using these we have to plot a graph between x_path vs Lamda , the issue I was facing was that I wasn’t able to understand the theory behind the elastic_net(solver) code, could you suggest any paper or tutorial explaining this?

Thanks

ascillitoe · 1 April 2021 12:40

Hi @Simardeep27, no problem at all! The general set-up of equadratures is that after defining our parmeters and basis, we are left with a linear system $\mathbf{y}=\mathbf{A}\boldsymbol{\beta}$, where the $\mathbf{A}$ matrix contains the polynomial basis terms and $\boldsymbol{\beta}$ is a vector of polynomial coefficients. We then need a “solver” to solve this system to obtain $\boldsymbol{\beta}$. See this tutorial for more details, or this post for a more simple linear regression example.

The simplest solver is the method='least-squares', which is ordinary least squares (OLS) regression, i.e. it simply minimises $\lVert \boldsymbol{y} -\mathbf{A}\boldsymbol{\beta} \rVert_2^2$. The elastic net adds L1 and L2 penality terms to this, i,.e. it minimises:

$$
\frac{1}{2}||{\mathbf{A}\boldsymbol{\beta}-\mathbf{y}}||_2^2+\lambda_1||\boldsymbol{\beta}||_1+\frac{1}{2}\lambda_2||\boldsymbol{\beta}||_2^2 \
\lambda_1 = \lambda \alpha \
\lambda_2 = \lambda(1-\alpha)
$$

The challenge with the above is deciding on the $\lambda$ value, which controls the strength of the penalty terms. The method='elastic-net' efficiently solves the above system for many different values of $\lambda$. The “regularisation path” is a plot showing the polynomial coefficients for each $\lambda$ value. When $\lambda=0$ we return to OLS regression, and when $\lambda\rightarrow\infty$ the coefficients tend to zero. We choose the “optimal” $\lambda$ value by minimising an information criterion such as AIC, BIC, or cross-validated MSE, so it is useful to plot this at the same time (i.e. see that notebook I shared). We use the coordinate descent algorithm described in this paper by Friedman et al. (Fig. 1 shows some example regularisation paths):

Simardeep27 · 1 April 2021 17:29

Thanks a lot for the explanation,
I will start working on this as soon as possible.

Simardeep27 · 2 April 2021 04:51

Hey @ascillitoe, I started working on the plot, for the higher number of coefficients, if we use Bokeh or plotly and give user a CheckBox to choose which coefficient plot they want to view, would that work?

ascillitoe · 6 April 2021 10:21

Hi @Simardeep27, interactive plots with Bokeh or plotly is a great idea, and certainly something we could thread into the GSOC proposal. However, I still think we need the capability to do a standard matplotlib plot for users to use in scripts, writing reports etc.

I’ve updated the notebook above (here too) with an example of what I think is the easiest solution. We can have an optional argument nplot (default None). If it exists we just plot the first nplot largest coefficients (at \lambda=\lambda_{min}), otherwise if nplot=None we plot all the coefficients.

Simardeep27 · 6 April 2021 11:26

Oh, Thanks a lot for the suggestion.

I will take a look at the notebook, thanks again.