The importance of a cross-functional skillset as a Reliability Engineer

Is specialisation really for insects?

sam raza
8 min read · Sep 9, 2020

“A human being should be able to change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyse a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly.”

― Robert A. Heinlein

Over the last decade, the evolution of Information Technology, the introduction of paradigms such as DevOps and SRE, and their impact on IT culture have brought about an incredible change in how the Software Development Lifecycle is managed. The boundaries between development and operations have continued to fade away. Practices like NoOps / GitOps (in my view, Automated Ops) and "you build it, you run it" have paved the way for autonomy, promoting accountability and integrated experiences between teams. However, this has also meant that the skillset needed to successfully manage those services and applications requires a more generalist understanding of the SDLC while specialising in areas of technical interest.

In this post I will try to identify and define the technical areas that together make up the skillset one may require to be part of a cross-functional SRE team helping to deliver services and applications.

Each discipline below is a vast subject with specialisations of its own. My aim here is merely to describe the importance of these disciplines within a DevOps / SRE context as part of a cross-functional team, not to take a deep dive into each one.

Systems Engineering

If you have come from a traditional Operations background, when there used to be clear silos (in some cases for good reasons) between Ops and Dev teams, you will know that despite best efforts on both sides there were always times when Devs and Ops locked horns over how the environments that deliver software and digital services should be designed, configured and maintained, and over whether a given configuration belonged in the application stack or in the infrastructure.

Having said that, developers don't tend to care deeply about how environments are configured as long as their code is working (ideally exactly as it works on their laptops). Development workstations and laptops are customised environments that generally operate within a trusted zone, built around the use-case of a development-friendly machine. This ensures that the business and software development teams can work in quick feedback loops to deliver applications, services and new features.

However, planning and building environments that deliver those services to end-users across multiple regions, while keeping in mind costs, available resources, service limitations, regulatory requirements, and the systems and their interaction points with users and other systems, is altogether another matter. Of course, there are specialised roles for some of the areas I mentioned earlier, but understanding their context is important as an engineer. If we look at the simple example of building an OS image to be used by an environment, you have to consider the nuances of underlying virtualisation platforms like KVM, Xen and containers, or the hardware requirements of bare-metal servers. It is a task that shouldn't be taken lightly. Being able to understand the use-case of an application or service also allows systems engineers to build environments and systems for the appropriate workloads. On top of that, understanding the hardening requirements based on the industry, CVEs and their impacts adds another layer of complexity to the mix. The importance of all of the above cannot be overstated, and systems engineers and their skills are a crucial part of the SRE team.
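As a rough illustration of the kind of automation this involves, here is a minimal Python sketch of an OS-hardening drift check. It assumes a Linux host, and the baseline values are purely illustrative; a real baseline would come from an industry benchmark such as CIS and your own threat model.

```python
#!/usr/bin/env python3
"""Minimal sketch of an automated OS-hardening drift check."""
from pathlib import Path

# Hypothetical baseline: sysctl key -> expected value (illustrative only).
BASELINE = {
    "net.ipv4.ip_forward": "0",          # hosts that aren't routers
    "net.ipv4.tcp_syncookies": "1",      # basic SYN-flood protection
    "kernel.randomize_va_space": "2",    # full ASLR
}

def read_sysctl(key: str) -> str | None:
    """Read a sysctl value via /proc/sys, returning None if it is absent."""
    path = Path("/proc/sys") / key.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        return None

def audit(baseline: dict[str, str]) -> list[str]:
    """Return human-readable findings for values that drift from the baseline."""
    findings = []
    for key, expected in baseline.items():
        actual = read_sysctl(key)
        if actual != expected:
            findings.append(f"{key}: expected {expected!r}, found {actual!r}")
    return findings

if __name__ == "__main__":
    for finding in audit(BASELINE):
        print("DRIFT:", finding)
```

A check like this could be baked into image builds or run periodically so that drift from the hardening baseline surfaces before an auditor (or an attacker) finds it.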

Network Engineering

With the emergence of SDNs (Software Defined Networks) and their adoption, implementation and normalisation within private and public cloud platforms, it has become ever more important for engineers to understand networks, virtualised or otherwise, in order to help deliver services that may span multiple networks.

The fundamentals of networking haven't changed over the last few decades. What has changed is how networks are implemented. Traditional networks are still important and mostly act as the backbone of a service. With SDNs, a single physical network can now act as the carrier for a mesh of virtualised networks, constantly evolving, growing and changing as the requirements of the services and their underlying networks change. A service can span multiple virtualised networks which may still reside on the same underlying hardware. Those virtualised networks are built in software and can be destroyed as easily as they are created, making space for newer networks. Understanding how networks interact, knowing the TCP/IP stack, and being able to collaborate with traditional network engineers are important skills for being part of a cross-functional team.
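To make that concrete, here is a minimal Python sketch that measures TCP connect latency to a couple of placeholder endpoints. In practice the endpoints would be the service's real dependencies across its various (virtual) networks, and you would feed the results into monitoring rather than print them.

```python
#!/usr/bin/env python3
"""Minimal sketch: measure TCP handshake latency to a few endpoints."""
import socket
import time

# Illustrative endpoints only; substitute the service's real dependencies.
ENDPOINTS = [("example.com", 443), ("example.org", 443)]

def tcp_connect_ms(host: str, port: int, timeout: float = 3.0) -> float | None:
    """Return the TCP handshake time in milliseconds, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000
    except OSError:
        return None

if __name__ == "__main__":
    for host, port in ENDPOINTS:
        latency = tcp_connect_ms(host, port)
        status = f"{latency:.1f} ms" if latency is not None else "unreachable"
        print(f"{host}:{port} -> {status}")
```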

Software Development

I think and see in systems, more specifically software systems. I came from a traditional systems administration background and mostly learned through trial and error. Think of shell and scripts: different things plumbed together, working in a (somewhat) harmonised manner. Over the years I have had to work with individuals from software engineering backgrounds, and I eventually understood and appreciated the discipline for what it is: building the things that need to be plumbed together. It was, in fact, some of the time spent around software development folks that motivated me to start writing code, adopt some of the patterns and practices used in the software engineering world within Ops, and truly understand paradigms like DevOps and SRE.

I mean, it made complete sense: why would I want to manage systems, OSes and application configurations manually? Being an automation enthusiast (by virtue of being lazy at times), that is exactly what I wanted to do: automate everything where possible. My thirst was finally quenched when I stumbled upon and started to practise Infrastructure as Code and Configuration as Code workflows, and took part in regular sprints on infrastructure development with cross-functional teams to design, develop, deploy, automate and monitor services. Some of these projects have tens of repositories just for Infrastructure as Code, multiple others for system and application configurations, and yet others for testing, monitoring tools and analysis. No more hand-cranking of environments that requires searching through unfinished and stale documentation, or reverse engineering environments to understand their reason for existence when a contractor or a permanent employee walks off the premises after their last day at the office without any handover. Embracing and understanding the SDLC allows engineers not only to follow the tried and tested patterns of building software and systems, it also harmonises the flow of delivery, promoting collaboration across multiple teams.
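As a small illustration of what treating configuration as code can look like, here is a hypothetical Python sketch where environment definitions are declared as data and validated by ordinary unit tests that could run in CI. The schema, the environments and the policy rules are all invented for the example.

```python
"""Minimal sketch: environment configuration treated as testable code."""
from dataclasses import dataclass

@dataclass(frozen=True)
class Environment:
    name: str
    region: str
    instance_count: int
    monitoring_enabled: bool

# Declarative environment definitions (illustrative values).
ENVIRONMENTS = [
    Environment("staging", region="eu-west-1", instance_count=2, monitoring_enabled=True),
    Environment("production", region="eu-west-1", instance_count=6, monitoring_enabled=True),
]

def validate(env: Environment) -> list[str]:
    """Return policy violations for a single environment definition."""
    problems = []
    if env.instance_count < 2:
        problems.append(f"{env.name}: fewer than 2 instances, no redundancy")
    if not env.monitoring_enabled:
        problems.append(f"{env.name}: monitoring must be enabled")
    return problems

# This check can run in CI as an ordinary unit test, so a bad environment
# definition is caught in review rather than in production.
def test_all_environments_conform():
    assert all(not validate(env) for env in ENVIRONMENTS)
```

The point is not the specific schema but the workflow: the environment is described in a repository, reviewed like any other change, and gated by tests before anything is provisioned.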

Continuous Delivery & Monitoring

One of the main tenets of delivering robust software and web services is allowing development teams to deploy and test their work in a self-service fashion. This is achieved in the form of Continuous Integration and Continuous Deployment feedback loops, allowing teams to iterate over ideas, features and designs quickly and in a repeatable manner.

This means building pipelines that deliver software services for different teams, whether those pipelines are fully automated or triggered on demand. The idea is to be able to quickly run the build, deploy and test cycle round the clock so the service is always ready for end users. Understanding the organisational needs for CI/CD and being able to implement them helps SREs add incredible value.
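To illustrate the shape of such a feedback loop, here is a minimal Python sketch of build, test and deploy stages with fail-fast behaviour. The commands are placeholders, and a real pipeline would of course be defined in a CI system such as Jenkins, GitLab CI or GitHub Actions rather than a standalone script.

```python
"""Minimal sketch of a build -> test -> deploy feedback loop."""
import subprocess
import sys

# Placeholder commands; swap in real build, test and deploy steps.
STAGES = [
    ("build", ["echo", "building artefact"]),      # e.g. a container image build
    ("test", ["echo", "running test suite"]),      # e.g. the unit/integration tests
    ("deploy", ["echo", "deploying to staging"]),  # e.g. an IaC apply or chart upgrade
]

def run_pipeline() -> int:
    """Run each stage in order; stop at the first failure for fast feedback."""
    for name, cmd in STAGES:
        print(f"--- stage: {name} ---")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"stage {name!r} failed, aborting pipeline")
            return result.returncode
    return 0

if __name__ == "__main__":
    sys.exit(run_pipeline())
```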

Monitoring modern web applications is no mean feat. In a large-scale web service built on a microservices architecture, the number of services that make up the whole can be numerous, with each service reporting errors, logs and traffic metrics. To be able to measure, learn and optimise an application, metrics and telemetry data must be captured in a centralised location for use by the SRE teams. This is to ensure that you are making the most of the underlying hardware resources, network bandwidth and application stack, while proactively working to curb potential bottlenecks, latency and bandwidth issues in the service. Without the skills to effectively monitor and analyse the data collected through monitoring tools, teams do not have enough facts to make informed decisions when developing services.
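As one possible way of exposing such telemetry for central collection, here is a minimal Python sketch assuming the prometheus_client library and a Prometheus-style scraping setup. The metric names and the simulated work are illustrative only.

```python
"""Minimal sketch: exposing service metrics for central scraping."""
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests handled")
LATENCY = Histogram("app_request_latency_seconds", "Request latency in seconds")

def handle_request() -> None:
    """Pretend to serve a request while recording count and latency."""
    REQUESTS.inc()
    with LATENCY.time():
        time.sleep(random.uniform(0.01, 0.1))  # simulated work

if __name__ == "__main__":
    # Metrics become scrapeable at http://localhost:8000/metrics
    start_http_server(8000)
    while True:
        handle_request()
```

Once every service exposes metrics in a consistent way, the central system can aggregate them, drive dashboards and alerting, and give the team the facts it needs.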

Cyber Security

Although cyber security is a pretty broad topic, I will keep my focus specifically on data security. With data breaches on the rise, being able to identify the potential data-security weaknesses and exfiltration points within a system is one of the most critical skills. Sure, a security breach that steals a company's intellectual property is bad for business, but data breaches are also destructive to end-users, wreaking havoc for all in their wake.

Modern applications require a lot of different services to integrate together to deliver a working whole. Bolting security on after the fact is often a bad idea: by then the damage (sometimes irreversible) is already done, in the form of reputational destruction and customers' information exposed to criminals. Cyber attacks come in all forms and originate not only from outside the organisation but from within as well. Disgruntled employees, corporate spies and hacktivists can cause damage once inside a trusted zone. For this reason, embedding security as an important factor in software and system design and delivery is crucial.

Organisations have teams of cyber-security professionals, and being able to collaborate with them and understand the organisational needs for data security, together with industry regulations, frameworks and standards like HIPAA, GDPR and ISO 27001, helps a cross-functional team identify and mitigate data and IP security risks by planning and designing systems that proactively thwart future attacks, on their own or by integrating with third-party systems.
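As a tiny example of proactively looking for data-security problems, here is a hypothetical Python sketch that scans a repository for likely hard-coded secrets. The patterns are illustrative, and real deployments would rely on dedicated tooling, encryption, access controls and audit logging rather than a handful of regexes.

```python
"""Minimal sketch: scan a repository for likely hard-coded secrets."""
import re
import sys
from pathlib import Path

# Illustrative patterns only; real scanners use far richer rule sets.
PATTERNS = {
    "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private key header": re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),
    "password assignment": re.compile(r"password\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE),
}

def scan(root: Path) -> list[str]:
    """Walk files under root and report suspicious matches."""
    findings = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for label, pattern in PATTERNS.items():
            if pattern.search(text):
                findings.append(f"{path}: possible {label}")
    return findings

if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for finding in scan(root):
        print(finding)
```

Running a check like this in the pipeline is one small way of shifting security left instead of dealing with it after the fact.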

Customer focus and delivery mindset

The main premise of one of my other posts, Empathy is the way to AutomatedOps, is that as an engineer you need to have empathy with your customers and take a human-centric approach to solving problems. The DevOps paradigm is based on the constantly evolving, seamless integration of people, processes and technology working together to deliver business goals, increase efficiency and minimise costs. SRE principles go one step further, which is harder to achieve unless you can relate to the customers you are working with, internally and externally.

For example, when working with developers, it means understanding their day-to-day work patterns, the technology stack they work with, and the technical challenges around code delivery from pipelines, testing and QA all the way to a production release; in essence, understanding the SDLC framework adopted by the team or the organisation.

When working with Product Owners and Scrum Masters, being part of the ceremonies, understanding the needs of the sprint, communicating on commitment, estimation and delivery of stories, and setting priorities based on major milestones are all important skills.

Conclusion

I hope this post has given you an insight into how different skillsets come together under the SRE and DevOps paradigms, what a diverse, challenging yet valuable endeavour that is, and the impact it has on the delivery of software projects and the modern world.


sam raza

London - Rackspace — Infrastructure, Design, Code, Philosophy & Random Ramblings. https://www.linkedin.com/in/samraza/