Balancing Tech and Product, Building Tech Strategy and OKRs
Who isn’t familiar with the problem of reducing tech debt while having ambitious product goals? It happens in every company. In the past year we’ve tackled this at Kry using different methodologies that helped our engineers continuously pay down tech debt while creating very little new debt. Below you can read how two of my teams handled it, and what we did at the department level.
The Solution of Team #1
The engineers held a workshop where they created tickets in Miro for the existing tech debt issues. They went through them, made sure everyone understood all of them, estimated them and voted on priority. Then, in each one-week sprint, they pulled in 1–2 tech improvement tasks (each roughly one day of work for one engineer, in a team of 4 engineers). Every 4 months they had a tech week, during which they all worked on the tasks that take longer.
The Solution of Team #2
Team #2 had the same process for collaboratively collecting, estimating and prioritizing tech improvement tasks, but they chose to have one “hack day” in every two-week sprint, during which they mob-programmed one of the tasks together.
In both teams, one engineer led this process and made sure the exercise was repeated every quarter.
Promoting tech priority through a department tech strategy / OKRs
Each quarter we defined department-level tech OKRs, but we found them hard to accomplish, both because of a lack of time and because we had to define new ones soon after defining the previous ones. We then moved to a yearly strategy, inspired by Richard Rumelt’s book Good Strategy Bad Strategy and led by our Staff Engineer, Saul Edwards, and me.
We started with a workshop, led by Saul, for our EMs and Staff Engineers. Each person wrote down: all the known problems and areas we should improve (real challenges with speed, cost, quality, employee morale, onboarding, user experience, etc.); a diagnosis (a one-sentence summary of the main challenge); a guiding policy (a one-sentence summary of our long-term direction for overcoming the challenges); actions (a coherent plan that implements the guiding policy); and OKRs (measurable, time-bound results that advance one or more of the actions).
We clustered the problems and diagnoses and agreed on shared policies and action items. We then took them to the teams for feedback, after which Saul wrote a document summarising four strategy pillars that we could share with the whole department.
Every quarter we went through the strategy with each team: we reminded them of what we were trying to achieve, checked what we had achieved so far, and planned what to achieve next. The teams then reviewed their tech debt boards and reprioritized them in light of the strategy.
Example of a pillar from our tech strategy
One of our objectives was to have user-focused SLOs. SLOs are Service Level Objectives: targets for the level of service we give our users, backed by monitoring that alerts us when that level drops. User-focused means the SLO is not about the availability of a particular server (although that is important to monitor as well) but gives us insight into a whole user flow, without relying on customer support or user feedback to tell us something is wrong. For example, a flow might consist of 20 pages. Each page might take half a second to load, which is acceptable on its own, but across the whole flow that adds up to 10 seconds, which is too much. To create the SLOs, we worked with our PMs to understand what a good level of service means for each product, and experimented with the data to spot anomalies. This data analysis led us to interesting and meaningful SLOs.
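To make the 20-page example concrete, here is a minimal Python sketch contrasting a per-page check with a flow-level, user-focused check. The `FlowRun` data shape, the 1-second per-page threshold, the 6-second flow threshold and the 95% target are all hypothetical values chosen for illustration, not the SLOs we actually defined.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
PAGE_THRESHOLD_S = 1.0   # per-page check: each page loads within 1 s
FLOW_THRESHOLD_S = 6.0   # flow-level check: the whole flow finishes within 6 s
SLO_TARGET = 0.95        # at least 95% of flows should pass the flow-level check


@dataclass
class FlowRun:
    """One user's pass through a multi-page flow (hypothetical data shape)."""
    page_load_times_s: list[float]

    @property
    def total_s(self) -> float:
        # Time accumulated across every page in the flow.
        return sum(self.page_load_times_s)


def page_level_ok(run: FlowRun) -> bool:
    """Server-style view: every individual page is fast enough."""
    return all(t <= PAGE_THRESHOLD_S for t in run.page_load_times_s)


def flow_level_ok(run: FlowRun) -> bool:
    """User-focused view: the whole flow is fast enough."""
    return run.total_s <= FLOW_THRESHOLD_S


def compliance(runs: list[FlowRun], check) -> float:
    """Fraction of runs that pass the given check (1.0 when there are no runs)."""
    if not runs:
        return 1.0
    return sum(1 for run in runs if check(run)) / len(runs)


def slo_met(runs: list[FlowRun]) -> bool:
    """The flow-level SLO holds if enough flows meet the flow threshold."""
    return compliance(runs, flow_level_ok) >= SLO_TARGET


if __name__ == "__main__":
    # 100 users each go through 20 pages at 0.5 s per page: 10 s per flow.
    runs = [FlowRun([0.5] * 20) for _ in range(100)]
    print(compliance(runs, page_level_ok))  # 1.0: every page looks fine
    print(compliance(runs, flow_level_ok))  # 0.0: every flow is too slow
    print(slo_met(runs))                    # False
```

Every page passes the per-page check, yet no flow meets the flow-level threshold, which is exactly the gap a user-focused SLO is meant to surface. In a real setup the per-flow timings would come from monitoring or analytics data rather than hard-coded lists.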
Breaking down a monolith
The last story I want to share on this topic is how we broke a key component out of a complex monolith into a new microservice. The work took half a year and helped us uncover critical bugs that we hadn’t been able to notice before, because the massive volume of logs had hidden important clues about what was happening.
We started by writing an RFC and requesting feedback on our plan. For the first few sprints, part of the team worked on this change. But soon enough we saw that product priorities did not allow us to continue this way: those people were needed to fix bugs and add new functionality that couldn’t wait a few months.
Context switch horror
Many engineers, including me when I was working as an engineer, fear too many context switches. They prevent you from getting into the zone and focusing on a problem until you solve it, or on a feature until you deliver it. They increase the chances of introducing bugs and making mistakes, because you forget the context or small but important details after switching to another problem domain. The more context switches happen, the more velocity and quality suffer.
A potential solution
But sometimes there’s no other way. The way we handled it, led by our talented senior engineer Kostas Pachatouridis, was to break the work into self-contained slices that could tolerate context switches in between. We worked on one slice, went back to delivering business value, and then worked on the next slice.
In the end, we finished this complex work and delivered business value with it.
Unless a team has opened a war room, reducing tech debt and investing in the tech strategy should be part of its regular routines. This pays off in business value, and there are different ways to combine it smoothly with product work.