Google Summer of Code 2021 Projects

Hello everyone, my name is Madhu. I am an undergrad student from India. I am very interested in contributing to equadratures. Kindly guide me through the setup and development environment, and point me to some beginner-friendly issues.

Hi @antelk, I’m in the process of fleshing out a few more details on the possible projects above. For now I’ve just reused the short description of last year’s quadrature repository project idea.

Re more details on the project, @psesh knows more about the specifics of this project than I do, so it might be better to discuss the details with him. One thing I would say is that the GSOC projects involve half the number of hours this year compared to previous years, so the project’s scope does have to be scaled accordingly. It might be that for the official GSOC milestones we could stick to integration over a hypercube (just to make sure it’s achievable within the timeframe), but then we could have a stretch goal to implement curved surface capability. A nice feature of GSOC is that the participants often remain actively involved after completion of the official project, so the stretch goal could also be worked on after the official end date if we hadn’t quite finished it by then.

Hi @madhucharan, welcome! equadratures is on PyPI, so you should be able to install it with pip install equadratures. If you want to have a go at editing the code, you could of course clone the git repo and then do pip install -e <equadratures repo>.

With regard to beginner-friendly issues, if you haven’t already, I’d perhaps start by going through a few of the tutorials to get a feel for the code. These are actually being updated this month, and as a matter of fact improving these further is a possible project idea for this year, so it would certainly be helpful to familiarise yourself with them. The tutorial source is currently in the feature_docs branch of equadratures, so if you find any old deprecated code please feel free to fix it and submit PRs!

If you run into any problems and need more help, please do post in the below category and we’ll be happy to assist!

Hi all, I’ve updated the GitHub wiki with our final project ideas. Do check them out!


Hi all, this is Hashim.
I’m a Senior Year Undergrad student from Pakistan. My interests lie in Data Science and Machine Learning, hence the last project idea in the above list caught my attention :smiley: . I have experience working in Python and C++. I have worked with the statistics involved in the Data Science workflow, and implemented multiple models. So, I do understand the gist of the project to a good extent. I would love to work on the project and contribute to Equadratures. I’d love to start off with any “codebase-beginner” PRs to contribute right away (there’s only 1 issue and I asked to work on that, so I’m open to other suggestions). In the meantime, I’ll be going through the documentation and tutorials.
Best


Hi @mHash1m and others, apologies for taking a while to get back to you with ideas for some possible beginner friendly contributions.

We don’t actually have any open GitHub issues at the moment, but please do feel free to open your own if you encounter issues while using the code! Instead, I’ve started a post on beginner-friendly micro-projects. Having a go at one of these would be a great way to learn more about the code and submit some PRs. Plus we’d love to include the projects as future tutorials if they’re done well!

The micro-projects bring together a few open code dev requests, so feel free to have a look at posts in that category too :slight_smile:

Hey @ascillitoe, thanks for the mini-project compilation, I’ll have a go at it.
I have gone through the tutorials and I’ll be honest with you, I don’t understand them all completely. I suppose I lack the mathematical pre-req knowledge for the most part. I was hoping you could direct me within the bounds of the project “Regularisation of Polynomials”, for instance, how I can learn more about the elastic-net regularisation solver that I’ll be re-implementing. Or do you think all the tutorials are also a pre-req to this specific project, in order to understand that part of the code-base?

Hi @mHash1m!

Following @ascillitoe’s post above, I’d recommend picking up either the UCI or the turbomachinery blade data set. Then I would try to fit a polynomial (using, say, least squares) to the data set. This requires three classes:

  1. Parameter: Define each covariate / input parameter.
  2. Basis: Define the basis terms used in the expansion.
  3. Poly: Define the polynomial.

The class structure and input arguments for these three are given in the Documentation. Have a go at these and feel free to share your code via Google Colab or drop it directly on Discourse.
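For anyone who wants to see what the three steps do numerically before reading the class documentation, here is a minimal NumPy-only sketch. It deliberately does not call the equadratures classes: the helper names (`scale_to_unit`, `total_order_indices`, `design_matrix`, `fit_poly`, `predict`) are made up for illustration, the inputs are assumed to be uniform on known bounds, a Legendre basis with a total-order index set is assumed, and the data is a synthetic stand-in rather than the UCI or blade data set.

```python
from itertools import product

import numpy as np
from numpy.polynomial import legendre


def scale_to_unit(X, lower, upper):
    """'Parameter' step: map each input from [lower, upper] onto [-1, 1]."""
    return 2.0 * (X - lower) / (upper - lower) - 1.0


def total_order_indices(dims, order):
    """'Basis' step: multi-indices whose entries sum to at most `order`."""
    return [idx for idx in product(range(order + 1), repeat=dims)
            if sum(idx) <= order]


def design_matrix(X_unit, indices):
    """Evaluate every tensorised Legendre basis term at every sample."""
    n, d = X_unit.shape
    max_order = max(max(idx) for idx in indices)
    # V[j][:, k] holds the degree-k Legendre polynomial evaluated on input j.
    V = [legendre.legvander(X_unit[:, j], max_order) for j in range(d)]
    A = np.ones((n, len(indices)))
    for col, idx in enumerate(indices):
        for j, k in enumerate(idx):
            A[:, col] *= V[j][:, k]
    return A


def fit_poly(X, y, lower, upper, order):
    """'Poly' step: least-squares estimate of the basis coefficients."""
    indices = total_order_indices(X.shape[1], order)
    A = design_matrix(scale_to_unit(X, lower, upper), indices)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return indices, coeffs


def predict(X, lower, upper, indices, coeffs):
    """Evaluate the fitted polynomial at new points."""
    return design_matrix(scale_to_unit(X, lower, upper), indices) @ coeffs


# Synthetic stand-in data: two inputs, output is a quadratic in those inputs.
rng = np.random.default_rng(0)
lower = np.array([0.0, -2.0])
upper = np.array([1.0, 2.0])
X = rng.uniform(lower, upper, size=(200, 2))
y = 1.0 + 3.0 * X[:, 0] - 0.5 * X[:, 1] ** 2 + X[:, 0] * X[:, 1]

indices, coeffs = fit_poly(X, y, lower, upper, order=2)
y_hat = predict(X, lower, upper, indices, coeffs)
max_error = float(np.max(np.abs(y - y_hat)))
print(f"{len(indices)} basis terms, max abs error = {max_error:.2e}")
```

Because the synthetic output is itself a total-order-2 polynomial, the fit should reproduce it to round-off; on a real data set you would hold out some samples and look at the error there instead.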


Hi @psesh.
Thanks for the follow-up! I’ll try it out and get back to you on this :slight_smile:


Hi Equadratures community! I hope you’re all doing well. I am Aryan Pandeya, a second-year undergraduate in the Department of Mechanical Engineering at the Indian Institute of Technology (IIT), Kanpur, India. I have been following this space for a while and started using the equadratures library for a course project in Fluid Dynamics. I recently went through the micro-projects and was really intrigued by the project on ‘Sensitivity analysis of aerofoil noise’. I have experience in coding in C and C++ and even though I’m a beginner at open-source development, I hope that I can contribute to the community in a meaningful manner. I’ll go through the relevant posts on Discourse, read the documentation, and try to share my code with the community as soon as possible. I hope I can reach out to the mentors in case of any doubts or issues I face. Looking forward to an enriching learning experience!


Hi @Pandeya, welcome! Sounds like a plan, the aerofoil noise dataset is an interesting one, looking forward to seeing what you come up with :slight_smile: Please do get in touch if you get stuck after having a stab at it and we’ll be happy to discuss further.

Hi all!
I’m Vimarsh, a senior CS undergrad from IIT Madras, India [Github]. I have a general interest in applied stats and numerical optimization (especially iterative methods and polynomial interpolation), and am looking forward to participating in this year’s GSoC!
That being said, I found the project on optimization of the elastic-net solver to be quite interesting. A more specific question for @ascillitoe (seeing as you’re the mentor for this project): regarding the deliverable on the analysis of elastic-net on orthogonal polynomial basis sets, is the plan to simply compare the fits for different orthogonal families (like Legendre and Chebyshev) on the same dataset? The description on the project page felt a bit open-ended to me.
Also, should I include a plan w.r.t. each deliverable when submitting the project proposal?


Hi @vim6739! Good question! You’re right in saying the project description is quite open-ended at the moment. We’ve left a degree of flexibility in there for now as we’re open to tailoring the project to the interests and abilities of the student (to a certain extent).

With more exploratory projects such as this one it can also be difficult to fix a full set of deliverables prior to starting. The first deliverable will certainly be carrying out in-depth validation and profiling of the current implementation. This will involve using some of the synthetic datasets from the equadratures.datasets module, and varying the dimensions, polynomial order, and the number of active dimensions (trying different orthogonal families and index-sets is also a good idea!). Beyond that, the next deliverable will involve developing the solver to handle larger datasets. Whether that will involve rewriting sections in Fortran/C++, or more algorithmic changes, will partly depend on the results of the first deliverable.
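As a rough illustration of what such a validation and profiling sweep could look like, here is a small self-contained sketch. It is not the equadratures solver: it assumes a textbook cyclic coordinate-descent elastic net with objective (1/2n)||y - Aw||^2 + lam*(alpha*||w||_1 + (1 - alpha)/2*||w||_2^2), and the names `elastic_net_cd` and `synthetic_problem`, the random Gaussian design matrix, and the problem sizes are all made up for the example. In a real study the design matrix would come from the polynomial basis and the data from equadratures.datasets.

```python
import time

import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def elastic_net_cd(A, y, lam, alpha=0.5, max_iter=500, tol=1e-8):
    """Cyclic coordinate descent for the elastic-net objective stated above."""
    n, p = A.shape
    w = np.zeros(p)
    residual = y.copy()                  # equals y - A @ w for w = 0
    col_sq = (A ** 2).sum(axis=0) / n    # (1/n) * ||a_j||^2 for each column
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            w_old = w[j]
            # Correlation of column j with the residual that excludes w_j.
            rho = A[:, j] @ residual / n + col_sq[j] * w_old
            w_new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            if w_new != w_old:
                residual -= A[:, j] * (w_new - w_old)
                w[j] = w_new
                max_change = max(max_change, abs(w_new - w_old))
        if max_change < tol:
            break
    return w


def synthetic_problem(n, p, active, rng):
    """Sparse linear model: only the first `active` coefficients are nonzero."""
    A = rng.standard_normal((n, p))
    w_true = np.zeros(p)
    w_true[:active] = rng.uniform(1.0, 2.0, size=active)
    y = A @ w_true + 0.01 * rng.standard_normal(n)
    return A, y, w_true


# Sweep the number of candidate terms while keeping three active ones,
# recording the recovered support and the wall-clock time of each solve.
rng = np.random.default_rng(1)
results = []
for p in (20, 80):
    A, y, w_true = synthetic_problem(n=200, p=p, active=3, rng=rng)
    start = time.perf_counter()
    w = elastic_net_cd(A, y, lam=0.1, alpha=0.9)
    elapsed = time.perf_counter() - start
    support = np.flatnonzero(np.abs(w) > 1e-6).tolist()
    results.append((p, support, elapsed))
    print(f"p={p:3d}  recovered support={support}  time={elapsed:.4f}s")
```

The same loop extends naturally to polynomial order and the number of active dimensions; the timings it collects are the kind of evidence that would inform whether a compiled rewrite or an algorithmic change is the better next step.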


Thanks for the clarification! I’ll be sure to ping you guys in case of more questions.

On a side note, since scaling to larger datasets seems to be a priority, are there any future plans for a GPU-accelerated implementation?

Sounds good, all questions welcome!

Re GPUs, we’re open to GPU acceleration where the need arises (e.g. maybe for some random-sampling-type solvers). However, since polynomials are by nature typically quite parsimonious with data, the need for GPU acceleration maybe isn’t as great as for something like deep neural nets. @bubald has found some benefits to using multi-threading for a prototype of the elastic-net solver, so there could be some legs in exploring more extreme parallelisation via GPUs. But my gut feeling is that the benefits might be outweighed by overheads resulting from memory transfer in this particular case. That being said, maybe something like Numba could help, but again my feeling is that Cython/f2py with multi-threading on a CPU is probably a good compromise, especially as the problem sizes we are looking at are usually a bit smaller than in some other areas of ML.

There might be some scope in using GPUs for accelerating some of the linear algebra aspects of the code, and it was something I’d planned to look into in the future as it could definitely be worth exploring.

Some consideration would need to be given to future scalability and code maintainability. Ideally we’d want to avoid duplicating code for different architectures. This may be through the use of a templating library such as Mako (PyFR is a good example that utilises it) or through a more generic library such as ArrayFire, which allows for heterogeneous computing. That said, I wouldn’t dismiss any thoughts regarding the use of some of the more standard CUDA libraries such as PyCUDA or CuPy, or CUDA via Numba as @ascillitoe mentioned.

In fact, the CuPy approach would seem rather promising, since it matches a significant part of the NumPy/SciPy API, which would make it easy to treat it as a drop-in replacement for some aspects of the code and to run benchmarks. If it can be used as a drop-in replacement for the parts of the code that we want to accelerate, we might be able to use conditionals to switch between NumPy and CuPy imports depending on whether a GPU is present, adding synchronize statements where needed. I haven’t used these directly, but we’d first need to determine a candidate class/method to test and benchmark; @ascillitoe and @psesh may be able to advise.
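A minimal sketch of that import-switching idea is below. It is only an illustration: `select_array_module` and `least_squares` are hypothetical names, nothing like this is known to exist in equadratures, and the least-squares solve is just a stand-in for whichever class/method ends up being the benchmark candidate. It falls back to NumPy when CuPy is not installed or no CUDA device is visible.

```python
import numpy as np


def select_array_module():
    """Return (xp, on_gpu): CuPy if it imports and sees a CUDA device, else NumPy."""
    try:
        import cupy as cp
        if cp.cuda.runtime.getDeviceCount() > 0:
            return cp, True
    except Exception:
        # CuPy is missing, or installed without a usable CUDA device/driver.
        pass
    return np, False


xp, on_gpu = select_array_module()


def least_squares(A, y):
    """Solve min ||A w - y|| on the selected backend and return a NumPy array."""
    A_dev = xp.asarray(A)   # host-to-device copy when running on the GPU
    y_dev = xp.asarray(y)
    w_dev, *_ = xp.linalg.lstsq(A_dev, y_dev, rcond=None)
    if on_gpu:
        # Wait for queued GPU work so that any timing around this call is meaningful.
        xp.cuda.Stream.null.synchronize()
        return xp.asnumpy(w_dev)
    return w_dev


rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
w_true = np.arange(1.0, 6.0)
w = least_squares(A, A @ w_true)
print("backend:", xp.__name__, "| max error:", float(np.max(np.abs(w - w_true))))
```

Keeping the backend choice in one place means the numerical code is written once against `xp`; the host/device copies in `least_squares` are exactly the memory-transfer overhead that a benchmark would need to weigh against any speed-up.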

We could potentially also move some of the data generation itself onto the GPU to reduce the amount of data passing back and forth. If you happen to have any thoughts based on a quick scan of the code or your own experience and could note them down in a post, we could discuss the feasibility and the best route forward.


Hi Equadratures community!
I’m Xin, a second-year PhD student from Delft University of Technology, Netherlands. My research is about efficient deep learning and computer vision. I have been following this space for a while and started getting familiar with equadratures. I found the project “Regularisation of Polynomials” to be super interesting and a good fit for my current experience. Thanks for providing the micro-projects. I picked up the UCI dataset to play with the elastic-net regularisation solver. Are we required to submit some PRs or Google Colab code alongside the project proposal for the application?
Thanks!


Hi @SylviaX! Great to hear about your interest in equadratures and the projects. Re the PRs, here are the GSOC guidelines for the proposal: Writing a proposal | Google Summer of Code Guides. Having a few PRs listed in GSOC proposals has become quite common for students (and some organisations require it), but as far as I am aware there isn’t a strict requirement from Google for having any (Google also reviews the proposals, but we make the final decision on which proposals to accept, depending on how many slots Google gives us).

On Google’s side, they look for quality proposals, which should demonstrate that the student has taken the time to understand the project and has the appropriate skills/interests for it. From our point of view, we don’t specifically require PRs, and would rather receive high-quality applications where the student has taken the time to really think about the project than a few very minor PRs. That being said, PRs can be a good way to demonstrate open source coding experience and interest in the project.

Happy to discuss further if you have more queries or ideas! :slight_smile:


Hi all,
I am Simardeep Singh Sethi, a 3rd year student from Guru Tegh Bahadur Institute of Technology, Delhi, India. I have a great interest in Machine Learning and Artificial Intelligence. I am excited to increase my knowledge of statistics and apply it by contributing to Equadratures. I have gone through the tutorials and documentation on Discourse and started working on the micro-project Sensitivity analysis of aerofoil noise. I have read about Sobol indices and gone through the blog post provided, Regularisation 1: Handling real-world data, to get a better understanding of model interpretation using Sobol indices. I have gone through a lot of research papers but was not able to find a formula for sound pressure levels in terms of the given parameters. Hope you can help with this.


Hi @Simardeep27! For discussion on the aerofoil noise micro-project, it’s probably best if we discuss over on the relevant post here, so I’ll answer your query there. Also, feel free to create your own post on it with questions or progress.

P.s. Thanks for this PR!

@psesh and @Nick know more about the distributions than I do so I shall wait for them to review.
