Google Summer of Code 2021 Projects

Hi all, it’s that time of year again! :grinning: GSoC 2021 is around the corner, so we need to add our project ideas for this year to the NumFOCUS site here. Before we do that, perhaps we can brainstorm ideas here!

Below is an editable list of project ideas:

  • Enhancing visualisation and tutorials: Visualising various equadratures objects, and their underlying data, is a crucial task in many equadratures use cases. To reduce the amount of boilerplate code required for visualisations we have started developing in-built plotting methods. This project would involve enhancing this capability, which might include expanding the range of plotting methods and improving user customisation. Updating our tutorials to show off this new user-friendly capability would also be an important part of the project. Depending on the student’s interests, there is also the possibility of exploring interactive visualisation with libraries such as bokeh or plotly, or even interactive web apps with streamlit or similar tools.

  • Universal quadrature repository: Delivering accurate quadrature rules—for numerical integration—remains one of the key tenets of Effective Quadratures. While there has been much research, particularly into high-dimensional numerical integration, no universal repository of such quadrature rules exists. In this project, we wish to lay the foundations for such a universal repository of numerical integration rules.

  • Regularisation of polynomials: The double descent phenomenon has seen increasing attention in the machine learning community in recent years. It contradicts the classic bias-variance tradeoff concept, and has important implications for how we build models which can generalise to test data effectively. It is desirable to avoid the double descent behaviour, and have test error decrease monotonically with increased model complexity and/or increased amounts of training data. One way to achieve this is by adding regularisation, which, if optimally tuned, can mitigate double descent in many learning algorithms, from neural networks to linear regression. This project will involve validating and then optimising the elastic-net regularisation solver implemented within equadratures. Improving the performance of this solver would allow for larger real-world supervised ML datasets to be tackled.
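
For reference, the elastic-net problem mentioned in the last idea is usually written as the penalised least-squares objective below. This follows a common convention (for instance the one used by scikit-learn's ElasticNet); the symbols are assumptions made here for illustration, and the exact scaling used inside equadratures may differ. Here A is the matrix of polynomial basis terms evaluated at the N training points, β the vector of polynomial coefficients, λ ≥ 0 the overall regularisation strength, and α ∈ [0, 1] the mix between the L1 and L2 penalties.

```latex
\min_{\boldsymbol{\beta}} \;
\frac{1}{2N} \lVert \mathbf{y} - \mathbf{A}\boldsymbol{\beta} \rVert_2^2
+ \lambda \left( \alpha \lVert \boldsymbol{\beta} \rVert_1
+ \frac{1 - \alpha}{2} \lVert \boldsymbol{\beta} \rVert_2^2 \right)
```

Setting α = 1 gives the lasso, α = 0 gives ridge regression, and λ = 0 recovers ordinary least squares.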

Hi all.

Let me quickly introduce myself: my name is Ante (GitHub, LinkedIn for more) and I am extremely interested in participating in this year’s GSoC. Currently I am a graduate student in the field of computational bioelectromagnetics.
One of the things I am gradually becoming more and more interested in is fast 2-D numerical integration, especially on curved surfaces. I was looking at last year’s GSoC proposals, and the one entitled the universal quadrature repository seems particularly interesting and could serve as a starting point for my idea.
Even though it is still pretty early, I would just like to know if a similar project is planned for GSoC 2021.

Thank you :slight_smile:

Hi @antelk! Many thanks for your post and for joining our community!

We will have a version of the universal quadrature repository project for this year, although, given the time constraints this year (projects are shorter), we will likely amend the project accordingly.

Hi @antelk, welcome!

@psesh shall we add this project to the wiki post above? Seems like a great project idea to me.

Thank you, @psesh & @ascillitoe, for the warm welcome :slight_smile:
I am looking forward to seeing the full list of proposals!

Is there a chance that the version of the universal quadrature repository will include the development of 2-D numerical integration on a curved surface (e.g., on the surface of a sphere)?

Hello everyone, my name is Madhu. I am an undergrad student from India. I am very interested in contributing to equadratures. Kindly guide me through the setup and development environment, and point me to some beginner-friendly issues.

Hi @antelk, I’m in the process of fleshing out a few more details on the possible projects above. For now I’ve just taken a short description of last year’s quadrature repository project idea for the description above.

Re more details on the project, @psesh knows more about the specifics of this project than I do, so it might be better to discuss the details with him. One thing I would say is that the GSoC projects involve half the number of hours this year compared to previous years, so the project’s scope does have to be scaled accordingly. It might be that for the official GSoC milestones we could stick to integration over a hypercube (just to make sure it’s achievable within the timeframe), but then we could have a stretch goal to implement curved surface capability. A nice feature of GSoC is that the participants often remain actively involved after completion of the official project, so the stretch goal could also be worked on after the official end date if we hadn’t quite finished it by then.
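
To make "integration over a hypercube" concrete, here is a minimal sketch of a tensor-product Gauss-Legendre rule on [-1, 1]^d in plain Python. It does not use equadratures, and the names tensor_gauss_legendre and integrate are hypothetical helpers chosen for illustration; a real repository entry would presumably store many such rules rather than hard-code one.

```python
import itertools
import math

# 3-point Gauss-Legendre rule on [-1, 1]; exact for polynomials up to degree 5.
NODES_1D = (-math.sqrt(3.0 / 5.0), 0.0, math.sqrt(3.0 / 5.0))
WEIGHTS_1D = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)


def tensor_gauss_legendre(dim):
    """Return (points, weights) of the tensor-product rule on [-1, 1]^dim."""
    points, weights = [], []
    for idx in itertools.product(range(len(NODES_1D)), repeat=dim):
        points.append(tuple(NODES_1D[i] for i in idx))
        weights.append(math.prod(WEIGHTS_1D[i] for i in idx))
    return points, weights


def integrate(f, dim):
    """Approximate the integral of f over [-1, 1]^dim."""
    points, weights = tensor_gauss_legendre(dim)
    return sum(w * f(x) for x, w in zip(points, weights))


# Example: x^2 * y^2 over the square integrates exactly to (2/3)^2 = 4/9.
approx = integrate(lambda x: x[0] ** 2 * x[1] ** 2, dim=2)
print(approx)
```

The number of points grows as 3^d here, which is one reason sparse and other high-dimensional rules are of interest for a repository like this.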

Hi @madhucharan, welcome! equadratures is on PyPI, so you should be able to install it with pip install equadratures. If you want to have a go at editing the code, you could of course clone the git repo and then do pip install -e <equadratures repo>.

With regard to beginner-friendly issues, if you haven’t already, I’d perhaps start by going through a few of the tutorials to get a feel for the code. These are actually being updated this month, and as a matter of fact improving these further is a possible project idea for this year, so it would certainly be helpful to familiarise yourself with them. The tutorial source is currently in the feature_docs branch of equadratures, so if you find any old deprecated code please feel free to fix it and submit PRs!

If you run into any problems and need more help, please do post in the below category and we’ll be happy to assist!

Hi all, I’ve updated the GitHub wiki with our final project ideas. Do check them out!

Hi all, this is Hashim.
I’m a senior-year undergrad student from Pakistan. My interests lie in Data Science and Machine Learning, hence the last project idea in the above list caught my attention :smiley: . I have experience working in Python and C++. I have worked with the statistics involved in the Data Science workflow, and implemented multiple models, so I do understand the gist of the project to a good extent. I would love to work on the project and contribute to Equadratures. I’d love to start off with any “codebase-beginner” PRs to contribute right away (there’s only 1 issue and I asked to work on that, so I’m open to other suggestions). In the meantime, I’ll be going through the documentation and tutorials.
Best

Hi @mHash1m and others, apologies for taking a while to get back to you with ideas for some possible beginner friendly contributions.

We don’t actually have any open GitHub issues at the moment, but please do feel free to open your own if you encounter issues while using the code! Instead, I’ve started a post on beginner-friendly micro-projects. Having a go at one of these would be a great way to learn more about the code and submit some PRs. Plus we’d love to include the projects as future tutorials if they’re done well!

The micro-projects bring together a few open code dev requests, so feel free to have a look at the posts in that category too :slight_smile:

Hey @ascillitoe, thanks for the mini-project compilation, I’ll have a go at it.
I have gone through the tutorials and I’ll be honest with you, I don’t understand them all completely. I suppose I lack the mathematical pre-req knowledge for the most part. I was hoping you could direct me within the bounds of the project “Regularisation of Polynomials”, for instance, how I can learn more about the elastic-net regularisation solver that I’ll be re-implementing. Or do you think all the tutorials are also a pre-req to this specific project, in order to understand that part of the code-base?

Hi @mHash1m!

Following @ascillitoe’s post above, I’d recommend picking up either the UCI or the turbomachinery blade data set. Then I would try to fit a polynomial (using, say, least squares) to the data set. This requires three classes:

  1. Parameter: Define each covariate / input parameter.
  2. Basis: Define the basis terms used in the expansion.
  3. Poly: Define the polynomial.

The class structure and input arguments for these three are given in the Documentation. Have a go at these and feel free to share your code via Google Colab or directly drop it on discourse.
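
As a rough picture of what those three steps do numerically, here is a one-dimensional sketch in plain Python. It deliberately does not call equadratures (see the documentation for the actual class arguments); legendre_row, solve and fit_least_squares are hypothetical helper names, and the data are synthetic. The "Parameter" step is reduced to scaling the input range to [-1, 1], the "Basis" step to listing Legendre terms up to a chosen order, and the "Poly" step to solving the least-squares problem for the coefficients.

```python
def legendre_row(t, order):
    """Evaluate Legendre polynomials P_0..P_order at t via the three-term recurrence."""
    row = [1.0, t]
    for n in range(1, order):
        row.append(((2 * n + 1) * t * row[n] - n * row[n - 1]) / (n + 1))
    return row[: order + 1]


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_least_squares(xs, ys, lower, upper, order):
    """Fit Legendre coefficients to (xs, ys) by least squares."""
    # "Parameter": scale the input range [lower, upper] to [-1, 1].
    ts = [2.0 * (x - lower) / (upper - lower) - 1.0 for x in xs]
    # "Basis": evaluate every basis term at every sample.
    A = [legendre_row(t, order) for t in ts]
    # "Poly": solve the normal equations A^T A c = A^T y.
    m = order + 1
    AtA = [[sum(r[i] * r[j] for r in A) for j in range(m)] for i in range(m)]
    Aty = [sum(r[i] * y for r, y in zip(A, ys)) for i in range(m)]
    return solve(AtA, Aty)


lower, upper, order = 0.0, 2.0, 3
xs = [lower + (upper - lower) * i / 20 for i in range(21)]
# Synthetic outputs: 1*P0 + 2*P1 + 3*P2 in the scaled variable t = x - 1.
ys = [1.0 + 2.0 * (x - 1.0) + 1.5 * (3.0 * (x - 1.0) ** 2 - 1.0) for x in xs]
coeffs = fit_least_squares(xs, ys, lower, upper, order)
print([round(c, 6) for c in coeffs])
```

Because the synthetic data are an exact quadratic, the fit should recover coefficients close to 1, 2, 3 and 0. With a real data set there are several input parameters, and the basis becomes a multi-dimensional index set, which is what the Basis class handles.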

Hi @psesh.
Thanks for the follow-up! I’ll try it out and get back to you on this :slight_smile:

Hi Equadratures community! I hope you’re all doing well. I am Aryan Pandeya, a second-year undergraduate in the Department of Mechanical Engineering at the Indian Institute of Technology (IIT), Kanpur, India. I have been following this space for a while and started using the equadratures library for a course project in Fluid Dynamics. I recently went through the micro-projects and was really intrigued by the project on ‘Sensitivity analysis of aerofoil noise’. I have experience coding in C and C++, and even though I’m a beginner at open-source development, I hope that I can contribute to the community in a meaningful manner. I’ll go through the relevant posts on the discourse, read the documentation, and try to share my code with the community as soon as possible. I hope I can reach out to the mentors in case of any doubts or issues I face. Looking forward to an enriching learning experience!

Hi @Pandeya, welcome! Sounds like a plan. The aerofoil noise dataset is an interesting one, and I’m looking forward to seeing what you come up with :slight_smile: Please do get in touch if you get stuck after having a stab at it and we’ll be happy to discuss further.

Hi all!
I’m Vimarsh, a senior CS undergrad from IIT Madras, India [GitHub]. I have a general interest in applied stats and numerical optimization (especially iterative methods and polynomial interpolation), and am looking forward to participating in this year’s GSoC!
That being said, I found the project on optimization of the elastic-net solver to be quite interesting. A more specific question to @ascillitoe (seeing as you’re the mentor for this project): regarding the deliverable on the analysis of elastic-net on orthogonal polynomial basis sets, is the plan to simply compare the fits for different orthogonal families (like Legendre and Chebyshev) on the same dataset? The description on the project page felt a bit open-ended to me.
Also, should I include a plan w.r.t. each deliverable when submitting the project proposal?

Hi @vim6739! Good question! You’re right in saying the project description is quite open-ended at the moment. We’ve left a degree of flexibility in there for now as we’re open to tailoring the project to the interests and abilities of the student (to a certain extent).

With more exploratory projects such as this one it can also be difficult to fix a full set of deliverables prior to starting. The first deliverable will certainly be carrying out in-depth validation and profiling of the current implementation. This will involve using some of the synthetic datasets from the equadratures.datasets module, and varying the dimensions, polynomial order, and the number of active dimensions (trying different orthogonal families and index-sets is also a good idea!). Beyond that, the next deliverable will involve developing the solver to handle larger datasets. Whether that will involve rewriting sections in Fortran/C++, or more algorithmic changes, will partly depend on the results of the first deliverable.
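
For anyone wanting a feel for this kind of validation before touching the library, below is a minimal sketch of an elastic-net solve by cyclic coordinate descent on a synthetic problem with two active dimensions out of ten. It is plain Python and is not the equadratures solver: the function names, the objective scaling (1/(2N) on the data term, lam for the overall strength, alpha for the L1/L2 mix) and the synthetic data are all assumptions made for illustration, and the equadratures.datasets module is not used.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator arising from the L1 part of the penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_cd(A, y, lam, alpha, sweeps=200):
    """Minimise 1/(2N)*||y - A b||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2)
    by cyclic coordinate descent. A is a list of rows."""
    n, d = len(A), len(A[0])
    beta = [0.0] * d
    resid = list(y)  # residual y - A beta, kept up to date
    col_sq = [sum(A[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(sweeps):
        for j in range(d):
            # Correlation of column j with the residual that excludes beta_j.
            rho = sum(A[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= A[i][j] * delta
                beta[j] = new
    return beta


# Synthetic, noise-free data: only the first two of ten dimensions are active.
rng = random.Random(0)
n_samples, n_dims = 200, 10
true_beta = [3.0, -2.0] + [0.0] * (n_dims - 2)
A = [[rng.gauss(0.0, 1.0) for _ in range(n_dims)] for _ in range(n_samples)]
y = [sum(a * b for a, b in zip(row, true_beta)) for row in A]

beta_hat = elastic_net_cd(A, y, lam=0.1, alpha=0.9)
print([round(b, 3) for b in beta_hat])
```

The two active coefficients should come out slightly shrunk towards zero, with the inactive ones at or very near zero. Repeating this while varying n_dims, the number of active dimensions, lam and alpha, and timing each run, is a small-scale version of the validation and profiling described above.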

Thanks for the clarification! I’ll be sure to ping you guys in case of more questions.

On a side note: since scaling to larger datasets seems to be a priority, are there any future plans for a GPU-accelerated implementation?

Sounds good, all questions welcome!

Re GPUs, we’re open to GPU acceleration where the need arises (e.g. for some random-sampling-type solvers). However, since polynomials are by nature typically quite parsimonious with data, the need for GPU acceleration probably isn’t as great as it is for something like deep neural nets. @bubald has found some benefits to using multi-threading for a prototype of the elastic-net solver, so there could be some legs in exploring more extreme parallelisation via GPUs. But my gut feeling is that the benefits might be outweighed by the overheads of memory transfer in this particular case. That being said, maybe something like numba could help, but again my feeling is that Cython/f2py with multi-threading on a CPU is probably a good compromise, especially as the problem sizes we are looking at are usually a bit smaller than in some other areas of ML.