Coming up with a work plan

In line with the timeline agreed upon by my mentors, I started my work with Numba implementation on stats. After going through different resources(both video and official documentation), the following method of work was adopted by me:

  • Profile the given function on centered_school and a custom random dataset in a jupyter notebook.
  • Identify bottlenecks.
  • Try to implement a given numpy or scipy function from scratch in python and jit it.
  • Compare their respective times.
  • Put the modified methods in the primary method and then compare it’s speed to the one included in ArviZ.

Here is the link to my current pull request which containes in detail the changes made and the execution times of each function(both pre-defined and custom) for reference in an IPython Notebook.

Here is the gist of my observations:

Method: hpd

Bottleneck:

scipy.circmean

Performance of hpd_new with jitted circmean:

Similar performance on both datasets, sometimes slower on the custom dataset.

Method 2: r2_score

Bottleneck:

np.var

Performance of r2_score_new with jitted _var_1d and _var_2d_methods:

Reasonable speedup, almost 50% faster.

Method 3: waic

Bottleneck:

np.var

Workaround

replaced np.var by _var_1d and _var_2d(custom functions)

Reasonable speedup, a gain of 100ms on the custom dataset

Method 4: _gpdfit

Bottlenecks:

np.log1p and np.exp

Workaround and performance:

Experimented with jitted log_1p and exp functions. Their performance was similar to or worse than their numpy counterparts. The _gpdfit_new(replace np.log_1p with jitted log_1p) method did not give any definite speedup.

Method 5: _gpinv

Bottlenecks:

np.log1p and np.expm1

Workaround:

Experiment with jitted expm1. The performance was similar ~ worse than numpy.expm1. The _gpinv _new(replaced np.log1p by jitted log_1p) also did not give any definite speedup.

Method 6: psislw

Bottlenecks:

logsumexp,_gpinv,_gpdfit

Workaround:

Replaced _gpinv and _gpdfit with _gpinv_new and _gpdfit_new. Not much improvement compared to ArviZ psislw. Some scope of improvement still present as logsumexp, which is a part of diagnostics could be further enhanced.

Method 7: loo

Bottlenecks:

psislw, logsumexp

Workaround:

Replaced psislw with psislw_new. An average of 0.1 s gained on the large custom dataset. Almost similar performance on schools. Some scope of improvement still present as logsumexp, which is a part of diagnostics could be further enhanced.

Method 8: compare

Bottlenecks:

Loo and Waic

Workaround

Replaced loo and waic in compare with loo_new and waic_new. waic_new gave a gain of 2-3 microseconds; loo_new gave similar performance as original loo. Jitting the different functions under stacking, pseudo-bma and bb-pseudo bma slowed down the code substantially.

Method 9: _ic_matrix

Changes Made

Used numba to lift the loop in _ic_matrix. This change slowed down the overall process by approx 100ms.

Method 10: summary

Majority of summary involves diagnostics. Therefore I’ll see it’s performance improvement after I start working on diagnostics.

Next Step

As mentioned above, I am done with arviz.stats. As of writing this post, I am halfway through arviz.diagnostics and arviz.stats_utils. After completing the entire stats module, my mission is to set suitable benchmarks for the Arviz project and write down the required tests to verify the numba speedups.