How a Flawed CI/CD Will Blow You Off

While many engineering managers are aware of the benefits of CI and CD, few are prepared to define the processes that ensure a smooth workflow.

Ramalingam S

Published: 2nd Mar, 2018 at 3:39 PM

Updated: 13th Mar, 2018 at 9:46 AM

We are in the era of Agile deliveries. Many engineering managers are aware of the benefits that Continuous Delivery (CD) and Continuous Integration (CI) can bring to any organisation. However, not many talk about the day-to-day obstacles that can turn out to be major roadblocks if the CI/CD process is not followed meticulously.

QA and DevOps

If you work in software QA and are not yet sure about the connection, please read on.

Quality is not only QA’s problem any more
Build promotions, CI and CD are not only DevOps’ problem any more
More and more automation on the DevOps front
The success of DevOps relies on a well-disciplined QA practice
DevOps is not killing QA

What could go wrong?

A lot of agile teams today share a common working model:

Multiple streams of work
DoD (Definition of Done) enforces automation (DevOps, Testing, Processes etc)
Continuous integration
Automated Functional tests
Continuous Deployment

Every item in the above list is supposed to add value at multiple levels and facilitate quick and stable product delivery.

However, the path through release engineering and production deployment becomes the most cumbersome part of the journey in many places. Many symptoms can indicate CI/CD flaws:

Deployments become flaky/unreliable over time
Deployments often exceed the allowed downtime
Production releases become no less than a gamble (yes, you heard it right)
Lots of quality issues/defect slippages
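
The first two symptoms can be turned into numbers that a team tracks from release to release. The following is a minimal sketch, assuming a hypothetical `Deployment` record and an allowed downtime of 30 minutes; the sample history is illustrative and not real measurement data.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    succeeded: bool
    downtime_minutes: float


def deployment_symptoms(deployments, allowed_downtime_minutes):
    """Return the share of failed deployments and of downtime overruns."""
    if not deployments:
        raise ValueError("no deployments to analyse")
    total = len(deployments)
    failures = sum(1 for d in deployments if not d.succeeded)
    overruns = sum(
        1 for d in deployments if d.downtime_minutes > allowed_downtime_minutes
    )
    return {"failure_rate": failures / total, "overrun_rate": overruns / total}


# Illustrative records only (assumed values).
history = [
    Deployment(True, 12),
    Deployment(False, 45),
    Deployment(True, 31),
    Deployment(True, 9),
]
report = deployment_symptoms(history, allowed_downtime_minutes=30)
print(report)  # {'failure_rate': 0.25, 'overrun_rate': 0.5}
```

A rising failure rate or overrun rate across sprints is one possible early signal that the pipeline needs attention before releases start to feel like a gamble.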

The success of release engineering and related practices depends heavily on the nature of the team (including Devs, QAs and DevOps). It depends on too many factors, and it is hard to propose a single ready-made solution to set them right.

I have tried to compile a list of the issues we have faced in Agile delivery teams, expressed as symptoms together with the possible reasons for each.

I am not sure the table can be used as a rule of thumb, but it can definitely be used as a checklist for optimizing release engineering practices.

Area	Symptoms	Possible reasons
End-to-end pipeline setup	Promoting builds takes more time; deployment failures; build monitor is always red; "Who broke the build?" blame game	Semi-automated pipeline (manual selection of builds, data migrations, etc.); flaky IaC modules; unstable connections (data, network)
Deployment frequency	Significant amount of time in a sprint spent on deployments; issues around merging, cherry-picking, etc.; issues around verification; business/client is not happy with the deployment frequency	The chosen deployment frequency is not right (too early or too late); the frequency was not a conscious decision by business/engineering/clients
Branching strategy - trunk-based development	Incomplete features in production; frequent defect leakages; longer test cycle; lack of clarity around which features are to be enabled/not enabled/should not be enabled	Defects around feature flags; poor discipline around clearing unused flags; feature factories; poor visibility/documentation of available feature flags
Branching strategy - multiple branches	Merging often takes a long time; tests are repeated to catch merge misses	Unstable components; missed merges causing issues; no automated means of finding merge misses
Discipline around test automation	"I can’t wait for the tests to complete!"; decent test coverage but still a lot of defects found in production; people are hesitant to sign up for UI test failures; dev complete, tests are not ready :)	Tests with long feedback cycles; unrealistic data; UI tests bloated over time; catch-up game? (effort estimation for test automation is often ignored/overlooked)
Too many stories running across sprints	Adverse impact on test efficiency and testing effort; complicated merges; multiple test cycles	Stories are not sliced properly
Migrations	Deployment overshooting the planned downtime; point of no return: no reverse migrations :)	Non-additive migrations; no two-way compatibility (code and schema); no blue-green deployments
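
To make the Migrations row concrete, here is a minimal sketch of an additive migration that keeps code and schema compatible in both directions. It assumes an in-memory SQLite database and a hypothetical `users` table; the function names and sample values are illustrative, not taken from any real project.

```python
import sqlite3


def old_code_insert(conn, name):
    # Previous release: knows nothing about the new column.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def new_code_insert(conn, name, email):
    # New release: writes the new column as well.
    conn.execute(
        "INSERT INTO users (name, email) VALUES (?, ?)", (name, email)
    )


def additive_migration(conn):
    # Additive change: a new nullable column; nothing is dropped or renamed.
    conn.execute("ALTER TABLE users ADD COLUMN email TEXT")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
old_code_insert(conn, "asha")

additive_migration(conn)

new_code_insert(conn, "ravi", "ravi@example.com")
old_code_insert(conn, "meena")  # the old release still works after the migration

rows = conn.execute("SELECT name, email FROM users ORDER BY id").fetchall()
print(rows)
# [('asha', None), ('ravi', 'ravi@example.com'), ('meena', None)]
```

Because the old code path keeps working against the migrated schema, rolling back the application does not require a reverse migration, and old and new versions can run side by side during a blue-green switch.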

There could be many other patterns; the above list is only indicative. It would really help to review release engineering practices continuously and adopt the most relevant ones.
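
Parts of such a review can be automated. As one example, the feature-flag discipline from the trunk-based development row (clearing unused flags, documenting the available ones) could be checked by a small script. This is a minimal sketch, assuming a hypothetical in-code flag registry and a 90-day age limit; the flag names, owners and dates are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FeatureFlag:
    """Hypothetical registry entry for one feature flag."""
    name: str
    description: str
    owner: str
    created: date
    enabled_in_production: bool


def stale_flags(flags, today, max_age_days=90):
    """Names of flags older than max_age_days: candidates for clean-up."""
    return [f.name for f in flags if (today - f.created).days > max_age_days]


def undocumented_flags(flags):
    """Names of flags that have no description."""
    return [f.name for f in flags if not f.description.strip()]


# Illustrative registry (assumed values).
registry = [
    FeatureFlag("new_checkout", "New checkout flow", "payments-team",
                date(2018, 1, 5), False),
    FeatureFlag("legacy_search", "", "search-team",
                date(2017, 6, 1), True),
]
today = date(2018, 3, 2)

print(stale_flags(registry, today))   # ['legacy_search']
print(undocumented_flags(registry))   # ['legacy_search']
```

Run as part of the pipeline, a check like this gives the whole team a shared view of which flags exist and which ones are overdue for removal.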