testing as debugging

Thoughts on writing my first tests.

Charles T. Gray https://twitter.com/cantabile
07-01-2018

I’m writing my first tests at the moment.

I’ve been rather enamoured with the idea of testing for about six months now, but haven’t had much of a chance to work on research.

So, I’m finally now sitting down to begin my first package in earnest, and thus writing my first tests. Seeing as I had a request, I thought I’d write my thoughts at this fairly early point. I must confess, however, that this post wandered around a bit and I’m not sure how useful John would find it, alas; it’s not really an explanatory post, more of an exploration. For actually useful information about getting started with testing, I recommend R packages.

I recently saw Kara Woo speak at R-Ladies Seattle, and she made a compelling case for debugging with rigour. Admittedly I’ve only spent one evening reading about that before getting distracted by the next shiny thing (i.e., hyperventilating into a figurative paper bag as useR! 2018 approaches.). However, I came across this wonderful gem that has me alight with newfound testing enthusiasm:

“If you’re using automated testing, this is also a good time to create an automated test case. If your existing test coverage is low, take the opportunity to add some nearby tests to ensure that existing good behaviour is preserved. This reduces the chances of creating a new bug.” (Advanced R)

Now, don’t get me wrong, Kara’s inspired me to eventually get across the whole debugging gambit. But it is high conference season, and I’ve got some simulation problems to work out, so I can’t justify spending an overabundance of time on git.

Adopting this idea into my workflow is enough for now; with this one tweak, my workflow for debugging feels transformed. It also solves the problem I had with figuring out when and how to test. I love the idea of taking this as an opportunity to increase testing coverage, whilst also assising in drilling down into which line of code.

It’s also great for those moments where I’m feeling stuck or uninspired for a moment. Answer the question, does this work as I think it should?

In this post I aim to outline an example from my work to illustrate the elements of a test:

At the end there is a little mathematical promenade I went on while working on this post. I find it useful to see other people’s learning process, and I didn’t want to just toss my notes away.

A little example from what I’m working on

So, one of the things I need to code up properly at the moment is a series of densities with parameters estimated from summary statistic quantiles. I’m interested in comparing estimators derived from different distributions and how well they approximate the true density evaluated at the median.

This came up last night; I sat down to work and I had in my head that the exponential estimator needed debugging. So, I thought this was a perfect time to practice testing as debugging. Even if it does work as should, I will have increased testing coverage and not wasted my time. I will also feel both reassured and just a little bit smug that I levelled up my gitflow.

Consider an exponential distribution \(\exp(\lambda)\) with rate parameter \(\lambda > 0\). It can be shown1 that the median \(\mu\) of this distribution is \[ \mu := \frac {\log2}{\lambda}; \] so, we can think of \(\lambda\) in terms of \(\mu\): \[ \lambda = \frac{\log 2}{\mu}. \]

I’d written a function exp_est the other day that I was pretty sure took a median \(\mu\) as an argument, estimated \(\lambda\) as above, and evaluated the exponential density \(g\) at \(\mu\) with \(\lambda\) estimated from the given median.

So, given an argument, some observed median \(\mu\), we have the density evaluated at the median approximated thus \[ g(\mu; \lambda \approx \log 2/\lambda) \]

This calculation is wrapped up in the est_exp function. Seeing as I am just using this for the comparison, I’m not making this an exported funciton.

Granted, this is only a couple of lines of code, so it can be eyeballed to make sure it’s commensurate with the mathematics2.

Expectations

Firstly, I wanted to ensure that the calculataion worked for the default parameters exp_est(qexp(1/2)). So, I expect that exp_est(qexp(1/2)) is equal to the density dexp evaluated at the median (calculated qexp(1/2))3.
We can do that by writing expectation. I believe the most common test is expect_equal.

but if I try this

in the console I see this:


Error : 2 not equal to 4.
1/1 mismatches
[1] 2 - 4 == -2

So, I suppose this shows how the function expect_equal works, it checks to see if their difference is 0.

I tried to think about a few ways it could go wrong. So, I tried specifying and not specifying the rate parameter, as well as checking a \(0 < \lambda < 1\) value and a \(\lambda > 1\) value.

Tests

For multiple tests that are alike in some way, in this case the same function, we can group them together into a test, a wrapper function testthat::test_that that takes in multiple expectations.

We group multiple tests in a file. For this type of stuff, where it wanders away from mathematics completely and focuses on the dev side of what we do, I tend to make good use of helper functions to automate as much as possible.


test_that("exponential estimator produces what it should", {
  expect_equal(exp(est(qexp(1/2)), dexp(qexp(1/2))))
  expect_equal(exp_est(qexp(1/2)), dexp(qexp(1/2), log(2) / qexp(1/2)))
  expect_equal(exp_est(qexp(1/2, rate = 3)), dexp(qexp(1/2, rate = 3), log(2) / qexp(1/2, rate = 3)))
  expect_equal(exp_est(qexp(1/2, rate = 1/5)), dexp(qexp(1/2, rate = 1/5), log(2) / qexp(1/2, rate = 1/5)))
})

Syntax

Curiously, we do not separate the different tests with ,, as an R coder of my ability expected. I separate mine with new lines, as all the examples do.

I just experimented in the name of blogging posterity, and discovered that putting two tests on the same line with no space produced a parsing error that began thus:


Error in parse(textConnection(lines, encoding = "UTF-8"), n = -1, srcfile = srcfile,... 

So it is not only easier to read, but is necessary to separate tests by new lines.

testthat makes testing fun!

There is a serious joy to running all tests (control + shift + t) and seeing that happy output:

Every time I hear Strongbad’s voice in my head saying “I make testing fun!”.

Teaching myself maths as I go

My undergraduate degree was very light in analysis and calculus, which come up a lot in statistics. Most of what I focussed on was discrete mathematics. I find this helps immensely (even more than you’d think) with scientific programming, especially this new-fangled functional programming.

I was a piano teacher for twenty years, and am by no means some undiscovered mathematical genius. I learn mathematics by painfully slowly unpicking definitions and playing around with them.

The following was for my own edification, probably has errors, certainly has fudgey bits, but was undeniably helpful getting my head back into a mathematical place.

I believe it’s useful for people to see the learning process, that it’s not magic, it’s just study.

My blog is also intended to be a learning diary of sorts for myself, so I think it’ll be interesting to see how my understanding of concepts change and develop.

Find the median of the exponential distribution

We want to show that the median \(\nu\) of the exponential distribution \(\exp(\lambda)\) with rate parameter \(\lambda > 0\) is given by \[ \nu = \frac{\log2}{\lambda}. \]

Now, we know that for any given pdf \(f\) we have a cdf \(F\) providing the probability that \(X\) takes up to a certain value \(x\), \[ F(x) = P(X \leqslant x) = \int_0^x f(x)\, dx. \]

So, where \(F(x) = 1/2\), we have \(x = \nu\). Thus, solving \(F(x) = 1/2\) for \(x\) will provide us with the median \(\nu\).

Now, since \(X \sim \exp(\lambda)\), we have \[ f(x) := \begin{cases} \lambda e^{- \lambda x} & x \geqslant 0\\ 0 & x < 0 \end{cases} \] with \(\lambda > 0\).

Thus, we need to solve for \(x\), \[ \int_0^x \lambda e^{- \lambda x} \, dx = \frac 1 2. \]

So, if I can find \(\int e^{ax}\, dx\) for some \(a > 0\), then I am most of the way to solving this problem.

Something I love about the exponential function is its series definition, \[ e^x = \exp(x) = \sum_{k = 0}^\infty \frac{x^k}{k!} = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \dots \] (In particular, I like this definition because it helps me understand Euler’s formula \(e^{ix} = \cos x + i \sin x\), using the series definitions of sine and cosine.)

Since this is a polynomial, we can differentiate termwise, \[ \frac{de^x}{dx} = 0 + 1 + \frac{2x}{2} + \frac{3x^2}{6} \dots = 1 + x + \frac{x^2}{2} + \dots \] thus giving the lovely result, \(\frac{de^x}{dx} = e^x\). I love how intuitive this is in series form. I’d completely forgotten it. Once gets so used to \(\frac{de^x}{dx} = e^x\), that one forgets why. So many things in maths are like that for me. And then I find it so irritating that I have to go through it all again.

Now, starting with the chain rule we have

\[\begin{align*} \frac{de^{ax}}{dx} & = \frac{de^{ax}}{d(ax)} \cdot \frac{dax}{dx} = ae^{ax} \\ \implies \int \frac{de^{ax}}{dx} & = a \int e^{ax}\, dx. \end{align*}\]

To know how to handle integrating a derivative, we turn to the fundamental theorem of calculus, which states that if \(f\) is a real-valued function on a closed interval \([a,b]\), then \(F(x) = \int_a^b f(x)\, dx\) which gives \(F'(x) = f(x)\). Or something, hm, perhaps that’s not quite right. It’s some fundamental application of calculus that gives us \(\int \frac{d}{dx}f(x) = f(x)\).

Anyway, we now have

\[\begin{align*} \int \frac{de^{ax}}{dx} & = a \int e^{ax}\, dx \\ \implies e^{ax} & = a \int e^{ax} dx. \\ \int e^{ax} dx & = a^{-1}e^{ax}.\ (*) \end{align*}\]

I think I’m glossing past some assumptions about this only working for definite integrals, but when we actually apply it we will on definite integrals, so I can live with this fudgery as we now have the key to the problem.

Returning to our original problem, then, we have

\[\begin{align*} \int_0^{x} \lambda e^{-\lambda x}\, dx & = \frac{1}{2} \\ \implies \lambda \int_0^{x} e^{-\lambda x}\, dx & = \frac{1}{2} \\ \implies \lambda((-\lambda)^{-1} e^{-\lambda x} - (-\lambda)^{-1}) & = \frac 1 2\\ \implies - e^{-\lambda x} + 1 & = \frac{1}{2} \\ \implies e^{- \lambda x} & = \frac 1 2 \\ \implies - \lambda x & = \log(1/2)\\ \implies x & = \log 2 / \lambda. \end{align*}\]

And since this came from the assumption \(F(x) = 1/2\), we now have that the median \(\nu\) of the exponential distribution is given by \(\nu = \log2/\lambda\), as required. :)


  1. This actually bugged me all week. So, I’ve added my mathematical musings as the final section. I

  2. I love converting my code back to maths as a lowfi debugging practice.

  3. I like to make calculations in fractions to remind myself that I’m dealing with standard quantiles, such as the median or quartiles.

Citation

For attribution, please cite this work as

Gray (2018, July 1). measured.: testing as debugging. Retrieved from https://fervent-hypatia-7b7343.netlify.com/posts/2018-11-12-testing-as-debugging/

BibTeX citation

@misc{gray2018testing,
  author = {Gray, Charles T.},
  title = {measured.: testing as debugging},
  url = {https://fervent-hypatia-7b7343.netlify.com/posts/2018-11-12-testing-as-debugging/},
  year = {2018}
}