Development ideas for newcomers!

ascillitoe · 30 March 2021 13:39

P.S. @Simardeep27, did you post a draft GSoC application a while ago that you wanted us to look at? I thought I saw something but now can’t find it…

Simardeep27 · 30 March 2021 14:13

Actually, I tried methods like PDP, ALE and ICE, and they do help provide insights. For example, ICE plots a PDP-style curve for every individual instance (the PDP is their average), so we can easily interpret how the target value changes with a feature. I also tried ALE, which gives a more reliable measure of feature influence because it avoids the PDP's problem with highly correlated columns.
For example, in the Aerofoil dataset the correlation between the Suction column and the AoA column is nearly 0.75, which becomes a big issue when looking at the influence of a single variable on its own.
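
As a rough illustration of the PDP/ICE relationship described above, here is a minimal, model-agnostic sketch using NumPy only. It assumes the model is available as a callable that maps an (n, d) array to n predictions; `ice_curves`, `partial_dependence` and the toy model are hypothetical names for illustration, not equadratures or scikit-learn APIs.

```python
import numpy as np

def ice_curves(predict, X, feature, grid):
    """Individual conditional expectation: one curve per row of X.

    For each grid value v, overwrite column `feature` with v in every row
    and record the model prediction. Returns shape (n_samples, len(grid)).
    """
    curves = np.empty((X.shape[0], len(grid)))
    for j, v in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = v
        curves[:, j] = predict(X_mod)
    return curves

def partial_dependence(predict, X, feature, grid):
    """The PDP is the average of the ICE curves over the data."""
    return ice_curves(predict, X, feature, grid).mean(axis=0)

# Hypothetical toy model with an interaction: f(x) = 2*x0 + x0*x1
def model(Z):
    return 2 * Z[:, 0] + Z[:, 0] * Z[:, 1]

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
grid = np.linspace(-1, 1, 5)
ice = ice_curves(model, X, 0, grid)        # 200 curves, one per instance
pdp = partial_dependence(model, X, 0, grid)  # their pointwise average
```

Because the helper needs only a prediction function, it does not depend on a particular estimator class; how a given model (such as an equadratures Polynomial) exposes such a callable is not shown here.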

I tried passing a Polynomial object to the PDP function, but it gave an error; I will try this approach again.

Simardeep27 · 30 March 2021 14:15

Yes, I did send the draft. Should I send it again?

ascillitoe · 30 March 2021 14:33

Sorry, found the proposal! I had thought it was in a post, not a message. I shall get back to you on it ASAP.

Simardeep27 · 30 March 2021 17:51

No problem, sure.
Thanks again!

Simardeep27 · 2 April 2021 07:15

Hey @ascillitoe, while working on the interpretation methods I noticed there might be an error because scipy is not imported for the Sobol indices code. It is used to arrange the parameters for the xticks, so I think scipy needs to be imported.

Simardeep27 · 2 April 2021 14:21

I have implemented a few interpretation methods (PDP, ICE and ALE). ICE does not work in Colab, but it worked fine when I tried it in a Jupyter notebook. Should I continue implementing more methods?

ascillitoe · 6 April 2021 10:25

Hi @Simardeep27, regarding this scipy issue, is it just for the scipy.arange() calls? Is there any reason we can’t just use numpy.arange() instead? If that works, feel free to open a PR to fix this.
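
A minimal sketch of the suggested swap, assuming the ticks are just integer positions for a bar chart of the inputs. `scipy.arange` was only an alias of `numpy.arange` (SciPy has deprecated these NumPy re-exports), so NumPy alone is enough. The parameter labels and `sobol_values` below are hypothetical placeholders, not the names used in the actual plotting code.

```python
import numpy as np

# Hypothetical labels for the five aerofoil inputs (illustration only)
param_names = ["Frequency", "AoA", "Chord", "Velocity", "Suction"]

# numpy.arange replaces scipy.arange: one tick position per input, 0 .. d-1
xticks = np.arange(len(param_names))

# With matplotlib, these would then be used roughly as:
#   ax.bar(xticks, sobol_values)   # sobol_values is a hypothetical array
#   ax.set_xticks(xticks)
#   ax.set_xticklabels(param_names)
```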

ascillitoe · 6 April 2021 10:27

Do the different interpretation methods tell you anything interesting about the dataset, i.e. what is causing the aerofoil noise etc.? Any thoughts on what they are saying compared to what the Sobol’ indices are telling you?

Simardeep27 · 6 April 2021 11:05

Oh, yes, we can use numpy for this. I’ll make this change.

Simardeep27 · 6 April 2021 11:25

Yes, they tell us about feature importance and also help explain feature interactions by breaking down how each feature relates to the model's predicted outcome. For example, in our case the partial dependence plot (PDP) shows the effect of each feature on the output: as Frequency increases the sound level decreases, and the same goes for the Suction feature. A drawback of the PDP, however, is that it assumes the features are independent. Hence, we use an ALE plot, which accounts for the high correlation between Suction and AoA. The difference between Sobol indices and these interpretation methods is that Sobol indices quantify how much of the output variance is due to each input, whereas these methods look at the actual input values and show where the change occurs, so they give a better understanding of the model. In our case the PDP shows that the noise first increases with Frequency and then declines drastically; this kind of insight cannot be gained from Sobol indices.
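
To make the PDP vs ALE point about correlated inputs concrete, here is a minimal first-order ALE sketch (NumPy only). It assumes the model is a callable mapping an (n, d) array to n predictions; `ale_1d`, the toy model and the synthetic data are hypothetical, not part of equadratures. The key idea is that each sample is only moved to the edges of its own small interval, so the model is never evaluated at unrealistic input combinations, whereas the PDP overwrites the feature for every row regardless of its correlation with the others.

```python
import numpy as np

def ale_1d(predict, X, feature, n_bins=10):
    """First-order accumulated local effects (ALE) for one feature.

    Returns the bin edges (quantiles of the feature) and the centred ALE
    value at each edge.
    """
    x = X[:, feature]
    # Quantile-based edges, so every interval contains observed data
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)))
    n_int = len(edges) - 1
    # Bin k holds x in (edges[k], edges[k+1]]; the minimum goes into bin 0
    idx = np.clip(np.searchsorted(edges, x, side="left") - 1, 0, n_int - 1)
    # Move each sample only to the lower/upper edge of its own interval
    X_lo, X_hi = X.copy(), X.copy()
    X_lo[:, feature] = edges[idx]
    X_hi[:, feature] = edges[idx + 1]
    diffs = predict(X_hi) - predict(X_lo)
    # Average local effect per interval, then accumulate along the feature
    counts = np.bincount(idx, minlength=n_int)
    sums = np.bincount(idx, weights=diffs, minlength=n_int)
    local = np.divide(sums, counts, out=np.zeros(n_int), where=counts > 0)
    ale = np.concatenate([[0.0], np.cumsum(local)])
    # Centre so the count-weighted average effect over intervals is zero
    mid = 0.5 * (ale[:-1] + ale[1:])
    ale -= np.sum(mid * counts) / counts.sum()
    return edges, ale

# Hypothetical data: two strongly correlated inputs and an interaction model
rng = np.random.default_rng(0)
x0 = rng.uniform(-1, 1, 500)
X = np.column_stack([x0, x0 + 0.05 * rng.normal(size=500)])

def model(Z):
    return Z[:, 0] * Z[:, 1]

edges, ale = ale_1d(model, X, feature=0)
# PDP at the same points: x0 is overwritten for every row, ignoring x1 ~ x0
pdp = np.array([model(np.column_stack([np.full(len(X), v), X[:, 1]])).mean()
                for v in edges])
```

In this toy case the ALE curve for x0 comes out roughly U-shaped (about x0²/2, since x1 ≈ x0), while the PDP is nearly flat, because it averages x0·x1 over x1 values that never co-occur with a given x0 in the data.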

Simardeep27 · 7 April 2021 12:47

Hey @ascillitoe, should I research this topic further? I also think this could be a useful addition to Equadratures, so should I take it on as a GSoC project?

ascillitoe · 7 April 2021 13:17

I think it would make a nice blog post for now! My initial reason for exploring these other interpretation methods was that it would be interesting to see what they tell us compared to the Sobol indices. They are trickier to compute than the Sobol indices, but it looks like they do offer useful information, so perhaps we could think about applying some of them to equadratures polynomials as a stretch goal in the GSoC project. Sorry for taking a while to get back to you about your project proposal; I shall reply with more details in the next few days.
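
For context on why Sobol indices are comparatively cheap for polynomial models: with an orthonormal polynomial basis whose constant term is 1, the output variance is the sum of the squared non-constant coefficients, so first-order and total Sobol indices follow directly from the coefficients with no further model evaluations (PDP, ICE and ALE instead need repeated evaluations over the data). The sketch below assumes coefficients stored as a dict from multi-index tuples to values; that layout and `sobol_from_pce` are hypothetical and do not reflect equadratures' own internals or Sobol routines.

```python
import numpy as np

def sobol_from_pce(coeffs, dim):
    """First-order and total Sobol indices from orthonormal-basis coefficients.

    coeffs: dict mapping a multi-index tuple (one degree per input) to its
    coefficient (hypothetical storage layout, for illustration only).
    """
    # Variance = sum of squared coefficients of all non-constant terms
    variance = sum(c ** 2 for alpha, c in coeffs.items() if any(alpha))
    first = np.zeros(dim)
    total = np.zeros(dim)
    for alpha, c in coeffs.items():
        active = [i for i, a in enumerate(alpha) if a > 0]
        if not active:
            continue  # constant term contributes to the mean only
        for i in active:
            total[i] += c ** 2  # any term involving input i
        if len(active) == 1:
            first[active[0]] += c ** 2  # terms involving input i alone
    return first / variance, total / variance

# Hypothetical 2-input expansion: 1 + 2*P1(x0) + P1(x1) + P1(x0)*P1(x1)
coeffs = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 1.0, (1, 1): 1.0}
S, ST = sobol_from_pce(coeffs, dim=2)
```

Here the variance is 4 + 1 + 1 = 6, giving first-order indices 4/6 and 1/6 and total indices 5/6 and 2/6; the gap between them is the 1/6 share carried by the x0·x1 interaction term.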

Simardeep27 · 7 April 2021 14:35

Oh, okay, that’s what I was thinking, because model interpretation techniques are currently not used much. Another issue is that functions like PDP and ALE exist, but they are not integrated into a single package or module.

No problem.