Detect & Fix Performance Issues Using Tracetest

Jul 6, 2022 · 4 min read

Sebastian Choren, Software Engineer, Tracetest

Tracetest can spot anomalies in code before users or developers encounter them. Learn more about how to use it in a Test-Driven Development (TDD) flow.


We recently [wrote about how we started using Tracetest to test itself](https://kubeshop.io/blog/integrating-tracetest-with-github-actions-in-a-ci-pipeline) and how we integrated it in a CI pipeline. Using our own product was a great way to find bugs and usability issues, but it can do much more. In fact, Tracetest can spot anomalies in code before users or developers encounter them.

For example, I was working on testing the instrumentation of the Tests List endpoint, using Tracetest to assert that the instrumentation behaved as expected. I was using the UI to decide which spans needed to be checked and which metrics would be useful when I noticed that the Trace Viewer was showing **a lot** of database queries. Why would we need that many queries for a simple listing?

I decided to use [Tracetest](http://tracetest.io) to help me find the issue and fix it.

## Detecting the Issue via OpenTelemetry Trace

Below is the trace for a simple listing query. It fetches a list of available tests and returns it in JSON format.  

I expected to see just one query to resolve this. But, after running the tests a few times, I noticed that we had this trace:

It wasn’t even complete. There were so many queries that they didn’t fit on a single screen. 

I wanted to find out exactly how many there were, so instead of counting them manually, I used the selectors (something along the lines of `span[tracetest.span.type = "database"]`, which matches every database span in the trace) to see the match count:

There were 10 queries to resolve a very simple list! 

That outnumbered the tests we had in the listing, so I suspected we were running one query to select the list of tests and then an extra query to fetch the details of each test. We needed to fix that, but I also wanted to make sure that my changes actually resolved the issue without breaking anything else.
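As a rough Go sketch of that suspected pattern (the table, column, and function names are invented for illustration; this is not the actual Tracetest code):

```go
package testdb

import "database/sql"

// listTests sketches the suspected pattern: one query to select the tests,
// then another round trip per test to load its details. Every one of those
// round trips shows up as its own database span in the trace.
func listTests(db *sql.DB) (map[string]string, error) {
	rows, err := db.Query(`SELECT id FROM tests`)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	tests := map[string]string{}
	for rows.Next() {
		var id string
		if err := rows.Scan(&id); err != nil {
			return nil, err
		}

		// One extra query per test: this is what filled the Trace Viewer.
		var definition string
		if err := db.QueryRow(`SELECT definition FROM test_definitions WHERE test_id = $1`, id).Scan(&definition); err != nil {
			return nil, err
		}
		tests[id] = definition
	}
	return tests, rows.Err()
}
```

Each `QueryRow` call inside the loop is a separate database round trip, which is exactly the wall of query spans the Trace Viewer was showing.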

## Test-Driven Development

I used Test-Driven Development (TDD) in this process. 

### Step 1: Add an Assertion

First, I added an assertion to check that the count is just 1. After all, we should be able to fetch that info in a single query.


Since I am a backend developer and much more comfortable with a text editor than a graphical UI, I decided to directly edit the [YAML-based Tracetest definition file](https://kubeshop.github.io/tracetest/test-definition-file/).

We already had a tests_lists definition file, so we just needed to add the new check to it. You can see the exact change [in this commit](https://github.com/kubeshop/tracetest/commit/79f965644d52ea848125deeccb234bdfe62bd4d8#diff-c138ef478f9f6128813cd6d2feb4042212b589221dde7bf862de690ddb7adbc7).
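As a sketch of the idea, written against current Tracetest test-definition syntax (the exact field names and selector in the 2022 file may differ, so treat this as illustrative and refer to the linked commit for the real thing), the added spec looks something like:

```yaml
# Illustrative only; the real change is in the commit linked above.
# Select every database span produced while resolving the tests listing
# and require that exactly one such span exists.
- selector: span[tracetest.span.type = "database"]
  assertions:
    - attr:tracetest.selected_spans.count = 1
```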

### Step 2: Run the Test

Then, we could run the test and see it failing:

`tracetest test run --definition definition/test_list.yml --wait-for-result | jq '.testRun.result.results[1].results'`

We got our "red" (failing) test. Now we could fix the issue.

I won’t go over that code in this post (you can see the [resulting commit here](https://github.com/kubeshop/tracetest/commit/79f965644d52ea848125deeccb234bdfe62bd4d8#diff-dca7efaa7caa2d46da82680410e97fcb8328aa7482992215c1c34292dd22b5ca)), but, essentially, I just changed the query to use an inner join instead of fetching each test’s details with its own query.
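Sticking with the invented schema from the earlier sketch (same package and import), the fix boils down to collapsing the per-test loop into a single joined query:

```go
// listTestsJoined sketches the fix: fetch the tests and their details in a
// single round trip with an inner join, so the trace contains one database
// span instead of one per test.
func listTestsJoined(db *sql.DB) (map[string]string, error) {
	rows, err := db.Query(`
		SELECT t.id, d.definition
		FROM tests t
		INNER JOIN test_definitions d ON d.test_id = t.id`)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	tests := map[string]string{}
	for rows.Next() {
		var id, definition string
		if err := rows.Scan(&id, &definition); err != nil {
			return nil, err
		}
		tests[id] = definition
	}
	return tests, rows.Err()
}
```

After fixing the issue, we got a "green" (passing) test: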

`tracetest test run --definition definition/test_list.yml --wait-for-result | jq '.testRun.result.results[1].results'`

The CLI output is not always easy to read, so here is how it looked in the UI:

All tests passed and everything was green. Issue fixed. Awesome!

## Conclusion: Test-Driven Development (TDD) Works!

As you can see, [Tracetest](http://tracetest.io) is a great tool for detecting incorrect behaviors that are otherwise hard to identify or measure. Without Tracetest and OpenTelemetry tracing, catching this kind of issue would have required complex, often expensive load and performance testing, or months of real usage before users noticed the performance impact.

By using Tracetest, we found a huge performance issue before any users (including ourselves) noticed the impact, just by looking at the trace in the Trace Viewer. We then used Tracetest to TDD our fix, which guaranteed that we created useful tests and left them in place as a safeguard against regressions.