Numba and Stats
Coming up with a work plan
In line with the timeline agreed upon with my mentors, I started my work with the Numba implementation for stats. After going through different resources (both videos and the official documentation), I adopted the following workflow:
- Profile the given function on centered_school and a custom random dataset in a Jupyter notebook.
- Identify bottlenecks.
- Try to implement a given numpy or scipy function from scratch in python and jit it.
- Compare their respective times.
- Put the modified methods into the primary method and compare its speed to the version included in ArviZ.
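To make the "compare their respective times" step concrete, here is a minimal sketch of the profile-then-jit loop; the sum_sq kernel and the dataset size are hypothetical stand-ins, not code from the pull request:

```python
import timeit

import numpy as np

try:
    from numba import njit
except ImportError:  # run as plain Python if numba is not installed
    def njit(func):
        return func

@njit
def sum_sq(ary):
    # Explicit loop: the kind of kernel nopython compilation speeds up.
    total = 0.0
    for x in ary:
        total += x * x
    return total

data = np.random.randn(10_000)
sum_sq(data)  # warm-up call so compilation time is excluded from timing

t_np = timeit.timeit(lambda: np.dot(data, data), number=50)
t_jit = timeit.timeit(lambda: sum_sq(data), number=50)
print(f"numpy: {t_np:.5f}s  jitted: {t_jit:.5f}s")
```

Warming up the jitted function before timing matters: the first call includes compilation and would otherwise dominate the measurement.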
Here is the link to my current pull request, which contains in detail the changes made and the execution times of each function (both pre-defined and custom) for reference in an IPython notebook.
Here is the gist of my observations:
Method 1: hpd
Bottleneck:
scipy.circmean
Performance of hpd_new with jitted circmean:
Similar performance on both datasets, sometimes slower on the custom dataset.
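For reference, a jitted circular mean can be sketched like this; it is a hand-rolled stand-in for scipy.circmean assuming angles in radians, not the exact code from the pull request:

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to plain Python if numba is absent
    def njit(func):
        return func

@njit
def circmean_jit(angles):
    # Circular mean: arctan2 of the mean sine and the mean cosine,
    # which handles wrap-around at +/- pi correctly.
    s = 0.0
    c = 0.0
    for a in angles:
        s += np.sin(a)
        c += np.cos(a)
    n = angles.size
    return np.arctan2(s / n, c / n)

angles = np.array([3.0, -3.1, 2.9])  # values clustered around +/- pi
print(circmean_jit(angles))
```

Since NumPy's sin, cos, and arctan2 are already compiled C loops, the jitted version mainly avoids temporary arrays, which is consistent with the marginal speedup observed here.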
Method 2: r2_score
Bottleneck:
np.var
Performance of r2_score_new with jitted _var_1d and _var_2d methods:
Reasonable speedup, almost 50% faster.
Method 3: waic
Bottleneck:
np.var
Workaround:
Replaced np.var with the custom _var_1d and _var_2d functions.
Reasonable speedup, a gain of 100 ms on the custom dataset.
Method 4: _gpdfit
Bottlenecks:
np.log1p and np.exp
Workaround and performance:
Experimented with jitted log_1p and exp functions. Their performance was similar to or worse than their NumPy counterparts. The _gpdfit_new method (which replaces np.log1p with the jitted log_1p) did not give any definite speedup.
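As an illustration of why this gave no gain: a jitted element-wise log1p essentially reimplements a loop NumPy already runs in compiled C. A hypothetical sketch, not the code from the pull request:

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to plain Python if numba is absent
    def njit(func):
        return func

@njit
def log_1p(x):
    # Element-wise log(1 + x): this jitted loop duplicates the C loop
    # inside the np.log1p ufunc, so there is little left to gain.
    out = np.empty_like(x)
    for i in range(x.size):
        out[i] = np.log1p(x[i])
    return out
```

The same reasoning applies to exp and expm1: jitting pays off for Python-level loops and chains of temporaries, not for single already-vectorized ufunc calls.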
Method 5: _gpinv
Bottlenecks:
np.log1p and np.expm1
Workaround:
Experimented with a jitted expm1. Its performance was similar to or worse than np.expm1. The _gpinv_new method (which replaces np.log1p with the jitted log_1p) also did not give any definite speedup.
Method 6: psislw
Bottlenecks:
logsumexp, _gpinv, _gpdfit
Workaround:
Replaced _gpinv and _gpdfit with _gpinv_new and _gpdfit_new. Not much improvement over the ArviZ psislw. There is still scope for improvement, since logsumexp, which is part of diagnostics, could be further optimized.
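A jitted logsumexp could be sketched as below; this is a numerically stable 1-D version of my own, included only to show the direction such an enhancement could take:

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to plain Python if numba is absent
    def njit(func):
        return func

@njit
def logsumexp_jit(x):
    # Stable log(sum(exp(x))): subtract the maximum before
    # exponentiating so that large inputs do not overflow.
    m = x[0]
    for v in x:
        if v > m:
            m = v
    acc = 0.0
    for v in x:
        acc += np.exp(v - m)
    return m + np.log(acc)
```

Fusing the max pass, the exp pass, and the reduction into one compiled function avoids the temporary arrays a NumPy implementation would allocate.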
Method 7: loo
Bottlenecks:
psislw, logsumexp
Workaround:
Replaced psislw with psislw_new. An average of 0.1 s was gained on the large custom dataset, with almost identical performance on the schools dataset. There is still scope for improvement, since logsumexp, which is part of diagnostics, could be further optimized.
Method 8: compare
Bottlenecks:
loo and waic
Workaround:
Replaced loo and waic in compare with loo_new and waic_new. waic_new gave a gain of 2-3 microseconds, while loo_new performed about the same as the original loo. Jitting the different functions under stacking, pseudo-BMA, and BB-pseudo-BMA slowed down the code substantially.
Method 9: _ic_matrix
Changes made:
Used Numba to lift the loop in _ic_matrix. This change slowed down the overall process by approximately 100 ms.
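For context, Numba's loop lifting extracts the numeric inner loop from a function that otherwise has to run in object mode. A schematic with a hypothetical shape, not ArviZ's actual _ic_matrix:

```python
import numpy as np

try:
    from numba import jit
except ImportError:  # run as plain Python if numba is absent
    def jit(**kwargs):
        def wrap(func):
            return func
        return wrap

@jit(forceobj=True)  # object mode: numba lifts the compilable inner loop
def fill_ic_matrix(ics):
    # The dict handling stays in object mode; only the numeric
    # accumulation loop is a candidate for lifting. The boundary
    # crossing on every call can easily outweigh the loop speedup,
    # which matches the slowdown observed here.
    names = sorted(ics)
    rows = ics[names[0]].size
    out = np.zeros((rows, len(names)))
    for j in range(len(names)):
        vals = ics[names[j]]
        for i in range(rows):
            out[i, j] = vals[i]
    return out
```

When the lifted loop does little work per iteration, the object-mode overhead dominates, so a net loss of around 100 ms is plausible.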
Method 10: summary
The majority of summary involves diagnostics, so I will evaluate its performance improvement after I start working on diagnostics.
Next Step
As mentioned above, I am done with arviz.stats. As of writing this post, I am halfway through arviz.diagnostics and arviz.stats_utils. After completing the entire stats module, my next goal is to set up suitable benchmarks for the ArviZ project and write the tests required to verify the Numba speedups.