An introduction to test suite optimization in Ruby

rails · November 21, 2023 · by Samuel de Moura

Preface

The Ruby on Rails community is well known for its attitude towards testing, and this is understandably seen as a very positive thing: most software engineers will probably agree that automated testing is a practical necessity for building robust, reliable, and maintainable applications, especially in a dynamic language like Ruby.

As an application grows, though, so does the number of tests required to cover all of its functionality in a satisfactory manner. And as your test suite grows, so does its execution time, which means CI/CD takes longer, your costs go up, and your developers are more prone to context-switching since they have to keep tabs on the CI pipelines for their PRs. This is especially true of system tests which, although very effective at catching many types of bugs, can be flaky and slow when not written carefully.

However, it doesn't have to be that way! To prove it, we'll go over some ways to tackle reliability and performance issues in your average Ruby on Rails application's test suite in this blog post.

What are system tests?

If you've ever worked on a RoR application before, you're likely familiar with what the official documentation calls system tests - also known as feature specs, Capybara specs, E2E tests, or whatever else your team might call them.

Generally, these are tests that exercise one or more user flows in your application by interacting with it through the same interface that your end users will use - as opposed to a test that, for example, focuses on exercising an internal application API and asserting that its outputs are correct. In practice, this usually means running a full web browser and simulating mouse and keyboard input on the pages, then looking for some content on the DOM to assert that everything worked as expected.

System tests offer great bang for your buck: there's a whole ecosystem of very mature gems that make them easy to write, and they can give you lots of code coverage very quickly. On the other hand, they can be tricky to write well: if you have flaky and/or slow specs that make waiting for CI to pass an unpleasant experience, it's very likely that the culprit will be your system tests.

Fortunately, there are things you can do to make sure that your system tests don't cause more pain than they actually need to.

My tests are so slow!

After a codebase has grown enough over time, this is usually the number one complaint developers have about their CI/CD setup. Nobody dares run the test suite locally because who knows how long that'll take, and "waiting for CI to pass so we can merge" actually becomes a common blocker for other tasks that should theoretically be completely unrelated.

The "easy" fix: test parallelization

Hardware is comparatively cheap and engineering time is expensive, so a very good first option is throwing more hardware at the problem.

Although your test runner might have multiple CPU cores, by default a Ruby on Rails application running Capybara specs will only make effective use of a single core, so parallelizing test execution across CPU cores is often an effective first step. On older versions of Rails (or if you're using RSpec), this can be done with gems like parallel_tests, turbo_tests, and flatware. As of Rails 6, it is also a built-in feature for Minitest, which can be enabled by editing test_helper.rb like this:

class ActiveSupport::TestCase
  parallelize(workers: :number_of_processors)
end

That does entail spinning up a separate database for each worker, though, which can surface bugs caused by race conditions and other similar pitfalls, so while it is an effective measure, it may not be a painless drop-in fix. AppSignal has written a great blog post about some of the troubles you can run into while parallelizing your tests.

However, this approach can only take you so far. Spinning up a test runner will always cause some overhead, and the more of them you need for your CI pipelines, the more room you have for something to go wrong, and the more the overhead eats into your performance "boost". At some point, you'll need to look into actually optimizing your tests.
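
To get a feel for why the gains taper off, here is a back-of-the-envelope sketch. It assumes a deliberately simple model that does not come from any benchmark: every runner pays the same fixed startup cost, and the test time itself divides evenly across runners. The method name and all the numbers are made up for illustration.

```ruby
# Toy model (an assumption, not a measurement): each runner pays a fixed
# startup cost, and the suite's serial run time splits evenly across runners.
def estimated_wall_time(serial_seconds:, runners:, startup_seconds:)
  startup_seconds + serial_seconds.to_f / runners
end

serial  = 1200 # hypothetical: 20 minutes of tests when run on a single runner
startup = 60   # hypothetical: 1 minute to boot a runner and install dependencies

[1, 4, 16, 64].each do |runners|
  wall = estimated_wall_time(serial_seconds: serial, runners: runners, startup_seconds: startup)
  billed = wall * runners # total runner time you pay for
  puts format("%2d runners: %7.2fs wall clock, %8.2fs billed", runners, wall, billed)
end
```

In this model, going from 16 to 64 runners saves less than a minute of wall-clock time while more than doubling the billed runner time.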

The "hard" fix: test optimization

If tests are slow on CI, the developers on the team probably already have several hunches about what the causes are. Maybe our factories are slowing us down because the models are too complex and there are too many relationships. Maybe spec/rails_helper.rb is doing too much stuff during setup or teardown between tests.

Probably the worst thing you can do at this point is attempt to tackle those suspected causes right away. Most of the time, unless we have a very intimate understanding of the framework(s) we're using and the hardware it's running on, our guesses about which pieces of code are causing performance bottlenecks are no more accurate than a coin flip.

Instead, the first step when attempting to optimize something should usually be to measure it. For this, we can use a profiler like StackProf. There are several different profilers for Ruby and the ideas here apply to any of them, but I'll be referencing this gem specifically for this blog post.

Generating and investigating flamegraphs

After installing StackProf, the first order of business is to collect raw profiles that we can use to generate flamegraphs later.

Profiling the entire test suite is usually neither feasible nor necessary: we can profile just a few specific tests which are representative of our "slow set", and that'll likely be enough to expose obvious performance bottlenecks if they exist.

We can do this by, for example, wrapping up our test executions like this:

# Somewhere in spec/rails_helper.rb (Rails.root and String#parameterize need Rails loaded)
require "stackprof"

RSpec.configure do |config|
  # Wrap every example so that each one writes its own profile
  config.around(:each) do |example|
    test_name = example.full_description.parameterize
    path = Rails.root.join("tmp/stackprof-#{test_name}.dump")

    StackProf.run(mode: :wall, raw: true, out: path.to_s) do
      example.run
    end
  end
end

This will create a separate profile for each test. Or we can bundle multiple tests up into one profile:

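require "stackprof"

RSpec.configure do |config|
  # Start sampling once, before the first example runs
  config.before(:suite) do
    StackProf.start(mode: :wall, raw: true, out: 'tmp/stackprof-profile.dump')
  end

  # Stop sampling and write the collected profile to the `out` path
  config.after(:suite) do
    StackProf.stop
    StackProf.results
  end
end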

Neither approach is objectively better than the other, but I prefer bundling up a few tests into the same profile. That way common patterns jump out to me more clearly, versus having to compare different profiles and spot patterns between them.

Either way, regardless of how you prefer to run StackProf, a couple of things are worth keeping in mind:

  • mode: :wall means that sampling will happen based on wall clock time, as opposed to the suggested mode: :cpu from the StackProf README, which samples based on CPU time.
    • This is important because system tests interact with several external components (databases, a web browser, caches, external APIs), so it's very likely that our performance bottleneck has nothing to do with CPU time spent within the Rails application.
    • For example: when using mode: :cpu, a long unoptimized database query will barely show up in the collected profile because our process spends almost no CPU time on it; we're just waiting for a response from the database.
  • raw: true tells StackProf to write out extra data, which is necessary for generating flamegraphs.
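
The difference between wall clock time and CPU time can be seen without StackProf at all. The sketch below is a standalone illustration, not part of the profiling setup: it uses sleep as a stand-in for waiting on a database or a browser (an assumption made purely for the demo) and measures the same block with Ruby's monotonic wall clock and its process CPU-time clock. The measure method is a hypothetical name.

```ruby
# Measure a block with both a wall-clock timer and a CPU-time timer.
def measure
  wall_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  cpu_start  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)
  yield
  {
    wall: Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_start,
    cpu:  Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - cpu_start
  }
end

# sleep stands in for time spent waiting on an external component.
waiting = measure { sleep 0.3 }

puts format("wall: %.3fs, cpu: %.3fs", waiting[:wall], waiting[:cpu])
```

The wall-clock figure includes the whole wait, while the CPU figure barely moves, which is why a profile sampled on CPU time would hide this kind of bottleneck.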

After collecting one or more profiles, you can then load them up into a flamegraph visualizer. There are many of these available, but a personal favorite of mine is Speedscope. It's a browser app that takes in JSON files, so we'll need to convert our files first:

stackprof tmp/stackprof-profile.dump --json > tmp/stackprof-profile.json

After loading up a JSON file into Speedscope, you should be greeted by a flamegraph. By default, they look something like this:

A StackProf profile spanning multiple feature specs, displayed through the Speedscope web app.

There are a few different ways to visualize the data, all of which are described in the documentation. If you're still feeling lost after reading through the documentation, Paweł Świątkowski has a great blog post where he goes through and debugs a StackProf profile using Speedscope in a real-world application, which should give you an idea of how to go about investigating your own profiles. We'll also briefly go over a concrete example below.

Common pitfalls

Unfortunately, every codebase is different and there's no perfect step-by-step guide you can follow to easily optimize it. However, when talking about Ruby on Rails applications specifically, there are usually a few things that show up pretty regularly, so let's talk about some of them in hopes they spark some ideas of things to be on the lookout for:

Excessive usage of sleep

You should not need to sleep anywhere in your tests. If the test becomes flaky or fails outright when the calls to sleep are removed, chances are:

  • You need to write a helper to wait for the specific condition that the test depends on, instead of hoping that waiting for X seconds will always result in that condition being fulfilled; or
  • A bad Capybara matcher is being used, creating unnecessary flakiness that the sleep call is then countering. There's a great post in Gusto's engineering blog called "Eliminating Flaky Ruby Tests", and there's a section called "Dangerous Capybara Wait Matchers" there. It's a very useful read to learn about this problem specifically.
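
The first option could look roughly like the sketch below. wait_until is a hypothetical helper name, and the timeout and polling interval are arbitrary defaults rather than values from any library. It polls a block until the block returns a truthy value, and fails loudly if that never happens within the time limit.

```ruby
require "timeout"

# Hypothetical helper: poll the block until it returns a truthy value,
# raising Timeout::Error if that does not happen within `timeout` seconds.
def wait_until(timeout: 5, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  loop do
    result = yield
    return result if result

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise Timeout::Error, "condition not met within #{timeout}s"
    end

    sleep interval
  end
end

# Usage: wait for background work to finish (simulated with a thread here).
done = false
worker = Thread.new { sleep 0.1; done = true }
wait_until(timeout: 2) { done }
worker.join
```

The helper still sleeps internally, but only in short slices and only until the condition holds, so a fast run stays fast and a slow run fails with a clear error instead of a confusing assertion further down.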

As an aside, sometimes one does indeed run into some bizarre edge case where not even hours of investigation will elucidate what the cause of the flakiness is. In a situation where you need to move fast, throwing in a sleep there isn't the worst thing in the world, but it should be done very sparingly.

Slow but commonly-used helpers

This one is a great issue to have because fixing it usually results in drastic speedups.

A concrete example: consider an application where most pages are hidden behind authentication. In other words, the majority of tests need to sign in as a user before the relevant part of the test can actually begin executing.

An obvious approach is to have a sign_in helper which does this. It might navigate to the login page, fill out the form, submit it, and wait for a redirect. This sounds fine when taken at face value considering we're writing system tests, where we want the test execution to resemble a user's journey through the application. However, the time spent doing this adds up very quickly across your test suite. Instead, the sign_in helper should ideally bypass this flow entirely by automatically setting a valid user cookie or something of the sort, and the proper login flow should be tested as a separate spec.
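
As one possible shape for such a helper: if the application authenticates through Devise (an assumption, since this post does not depend on any particular authentication library), Warden's test helpers can mark a user as signed in without touching the login page. The SignInHelpers module name, the :user scope, and the type: :system filter are illustrative and would need to match your own setup (for example, type: :feature for feature specs).

```ruby
# Assumes Devise/Warden; adapt to whatever your app uses for authentication.
module SignInHelpers
  # Sign in without driving the login form. `login_as` comes from
  # Warden::Test::Helpers, which is included alongside this module below.
  def sign_in(user)
    login_as(user, scope: :user)
  end
end

# Wire the helpers into system specs only when RSpec and Warden are loaded.
if defined?(RSpec) && defined?(Warden)
  RSpec.configure do |config|
    config.include Warden::Test::Helpers, type: :system
    config.include SignInHelpers, type: :system

    # Clear Warden's test state so a login never leaks into the next example.
    config.after(:each, type: :system) { Warden.test_reset! }
  end
end
```

With something like this in place, only the spec that covers the login page itself needs to drive the real form.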

This pattern is usually pretty easy to spot when looking at a flamegraph that contains multiple test executions because there will be noticeable repetition across specs:

The same profile as before, but with "sign_in" blocks highlighted.

Applying the optimization described above, we went from spending ~12 seconds total signing in:

before optimization

To ~2 seconds:

after optimization

And that's just for a few specs! Across the entire test suite, this can shave off minutes or tens of minutes.

Animations

Not a lot to say about this one. Animations are great when you're actually using the application, but during tests they slow things down unnecessarily. Thankfully, if you're using Capybara 3.2.0 or above, all you need to do is add this line to rails_helper.rb:

Capybara.disable_animation = true

Although usually not a big win, it's still an easy one.

General CPU-bound work

If you're getting to this point, it's usually a sign that you've already optimized away most of the unnecessary bottlenecks, and now you're down to work that genuinely needs to be done. If that's the case, you're likely looking at things that affect not only your tests but also production performance.

In case there is a need to optimize general application performance, that's when you'd be looking into things like improving your cache hit rates, GC tuning, and so on, which are out of the scope of this post. For that kind of work, I recommend going through "The Complete Guide to Rails Performance" by Nate Berkopec, which covers these points and a lot more.

So there you have it! Hopefully, this post shed light on the usual pitfalls one runs into as a Rails application's test suite grows over time, while also providing some useful resources for tackling them.

Closing Remark

Could your team use some help with topics like this and others covered by ShakaCode's blog and open source? We specialize in optimizing Rails applications, especially those with advanced JavaScript frontends, like React. We can also help you optimize your CI processes with lower costs and faster, more reliable tests. Scraping web data and lowering infrastructure costs are two other areas of specialization. Feel free to reach out to ShakaCode's CEO, Justin Gordon, at justin@shakacode.com or schedule an appointment to discuss how ShakaCode can help your project!