How to make software supply chains resilient to cyber attacks

This article first appeared in VentureBeat

Imagine if someone asked you to drink a glass of liquid without telling you what was inside or what the ingredients might do. Would you drink it? Maybe, if it was given to you by someone you trusted, but what if that person said they couldn’t be sure what was inside? You probably wouldn’t partake.

Consuming the unknown is exactly what IT departments do every day. They install software and updates on critical systems without knowing what’s inside or what it does. They trust their suppliers, but what software suppliers don’t tell IT departments is that they can’t vouch for all of their own upstream suppliers. Protecting every part of a software supply chain, including the parts outside of IT’s control, is nearly impossible. Unfortunately, bad actors are taking full advantage of this large “attack surface” and scoring big wins in cyber breaches.

A big problem getting bigger

The most famous example was the hack of Austin, Texas-based business software developer SolarWinds in 2020. Attackers inserted malicious code into software that was widely used by industry and the federal government. IT departments installed an update containing the malware, and large volumes of sensitive and classified data were stolen.

Other software supply chain attacks have hit companies like Kaseya, an IT management software company where hackers added code to install ransomware, and Codecov, a tool provider whose software was used to steal data. Compromised versions of the “coa” and “rc” open-source packages have also been used to steal passwords. These names may not be familiar outside of IT, but they have large user bases to exploit. Coa and rc have tens of millions of downloads.

Quite obviously, attackers have figured out it’s far easier to hack software that people willingly install on thousands of systems than to hack each system individually. Software supply chain attacks increased by 300% from 2020 to 2021, according to an Argon Security report. This problem isn’t going away.

How could this happen?

There are two ways hackers attack software supply chains: They compromise software build tools, or they compromise third-party components.

A lot of focus has been placed on securing the source code repositories of build tools. Google’s proposed SLSA (Supply Chain Levels for Software Artifacts) framework allows organizations to benchmark how well they have “locked down” these systems. That’s important because there are now hundreds of commonly used build tools, many of which are easily accessible in the cloud. Just this month, the open-source tool Argo CD was found to have a significant vulnerability, allowing access to the secrets that unlock build and release systems. Argo CD is used by thousands of organizations and has been downloaded more than half a million times.

At SolarWinds, attackers were able to access where source code was stored, and they added extra code that was ultimately used to steal data from SolarWinds users. SolarWinds built its software without realizing that malware was being included. This was like giving an untrusted person access to the ingredients in that glass of liquid.

Even if companies control their own build environments, the use of third-party components creates massive blind spots in software. Gone are the days when companies wrote a complete software package from scratch. Modern software is assembled from components built by others. Some of those third parties use components from fourth and fifth parties. All it takes is for one sub-sub-subcomponent to include malware and the final package now includes that malware. 
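To make the blind spot concrete, here is a minimal Python sketch of how a single compromised component several layers down ends up inside the final package. The package names and the dependency graph are invented for illustration and do not refer to real projects.

```python
from typing import Dict, List, Set

# Hypothetical dependency graph: each package maps to the components it pulls in.
DEPENDENCIES: Dict[str, List[str]] = {
    "final-package": ["web-framework", "logging-lib"],
    "web-framework": ["http-parser", "template-engine"],
    "template-engine": ["string-utils"],
    "logging-lib": [],
    "http-parser": [],
    "string-utils": [],
}


def transitive_components(package: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return every component reachable from `package`, at any depth."""
    seen: Set[str] = set()
    stack = list(graph.get(package, []))
    while stack:
        component = stack.pop()
        if component in seen:
            continue  # already visited; also guards against cycles
        seen.add(component)
        stack.extend(graph.get(component, []))
    return seen


def is_tainted(package: str, graph: Dict[str, List[str]], compromised: Set[str]) -> bool:
    """True if any direct or indirect component is in the compromised set."""
    return bool(transitive_components(package, graph) & compromised)


if __name__ == "__main__":
    # "string-utils" is a third-level component, yet it taints the final package.
    print(is_tainted("final-package", DEPENDENCIES, {"string-utils"}))
```

The developer of "final-package" never chose "string-utils" directly, which is why this kind of exposure is easy to miss.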

Examples of compromised components are staggeringly common, especially in the open-source world. “Namespace confusion attacks” are cases where someone uploads a package and simply claims it to be a newer version of something legitimate. Alternatively, hackers submit malicious code to be added to legitimate packages, since open source allows anyone to contribute updates. When a developer adds a compromised component to their code, they inherit all current and future vulnerabilities.

The solution: A permissions framework

Industry groups and government agencies like the Commerce Department’s National Telecommunications and Information Administration (NTIA) are developing a standard for a software bill of materials (SBoM) and plan to use an executive order to mandate it for government-purchased software. An SBoM is a software ingredients list: it identifies all of the components, but unfortunately it won’t indicate whether any of them were hacked or will misbehave. Hackers won’t list their code in the ingredients.

Developers can improve the security of the build tools they control and list third-party ingredients from their suppliers, but that won’t be enough for them or their users to be sure that none of the ingredients were compromised. IT needs more than an ingredients list. It needs software developers to describe how code and components are expected to behave. IT teams can check those declarations and ensure they are consistent with the software’s purpose. If a program is supposed to be a calculator, for example, it shouldn’t include a behavior that says it will send data to China. Calculators don’t need to do that.
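As a rough illustration of that review step, the Python sketch below compares a program's declared behaviors against what its stated purpose would reasonably need. The manifest shape, the behavior labels, and the per-purpose policy are all assumptions made up for this example; no existing standard defines them.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Manifest:
    """Hypothetical declaration a developer ships alongside the software."""
    name: str
    purpose: str
    declared_behaviors: FrozenSet[str]


# Hypothetical policy: behaviors an IT team considers consistent with each purpose.
EXPECTED_BEHAVIORS: Dict[str, FrozenSet[str]] = {
    "calculator": frozenset({"read_user_input", "display_output"}),
    "backup-agent": frozenset({"read_files", "send_data_external"}),
}


def unexpected_behaviors(manifest: Manifest) -> List[str]:
    """List declared behaviors that do not fit the software's stated purpose."""
    expected = EXPECTED_BEHAVIORS.get(manifest.purpose, frozenset())
    return sorted(manifest.declared_behaviors - expected)


if __name__ == "__main__":
    calc = Manifest(
        name="desk-calc",
        purpose="calculator",
        declared_behaviors=frozenset({"read_user_input", "send_data_external"}),
    )
    # A calculator that declares external data transfer deserves a closer look.
    print(unexpected_behaviors(calc))
```

The same behavior can be fine for one purpose and suspicious for another, which is why the check is keyed on purpose and not on a single global blocklist.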

Of course, the compromised calculator might not say that it intends to send data overseas because hackers won’t publicize that software was compromised. A second step is necessary. When the software runs, it should be blocked from doing things it didn’t declare. If the software didn’t say it intended to send data to a foreign country, it wouldn’t be allowed to.
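A minimal Python sketch of that second step might look like the following. The `PermissionGate` class, the behavior labels, and the `send_report` function are hypothetical names invented here. A real enforcement layer would sit in the operating system or runtime, not inside the application's own code, where an attacker could simply skip the check.

```python
from typing import Iterable, List


class UndeclaredBehaviorError(PermissionError):
    """Raised when software attempts something it never declared."""


class PermissionGate:
    """Hypothetical runtime gate that only allows declared behaviors."""

    def __init__(self, declared: Iterable[str]) -> None:
        self._declared = frozenset(declared)
        self.blocked: List[str] = []  # record of denied attempts, for auditing

    def require(self, behavior: str) -> None:
        if behavior not in self._declared:
            self.blocked.append(behavior)
            raise UndeclaredBehaviorError(f"undeclared behavior blocked: {behavior}")


def send_report(gate: PermissionGate, destination: str, payload: bytes) -> str:
    """Stand-in for code that would transmit data outside the company."""
    gate.require("send_data_external")
    # A real implementation would transmit the payload here; omitted in this sketch.
    return f"sent {len(payload)} bytes to {destination}"


if __name__ == "__main__":
    calculator_gate = PermissionGate({"read_user_input", "display_output"})
    try:
        send_report(calculator_gate, "example.test", b"secret")
    except UndeclaredBehaviorError as err:
        print(err)
```

Recording blocked attempts matters as much as blocking them: a calculator that tries to send data is a signal that something upstream was compromised.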

That sounds complicated, but examples already exist with mobile phone apps. When installed, apps ask for permission to access your camera, contacts, or microphone. Any unrequested access is blocked. We need a framework to apply the concept of mobile app-like permissions to data center software. And that’s what companies like mine and many others in our industry are working on. Here are two of the challenges.

One, if a human approves “sending data outside of my company,” do they mean all data? To anywhere? Listing all types of data and all destinations is too much detail to review, so this becomes a linguistic and taxonomy challenge as much as a technical one. How do we describe risky behaviors in a high-level way that makes sense to a human without losing important distinctions or the specific details that a computer needs?

Two, developers won’t use tools that slow them down. That’s a fact. Accordingly, much of the work in declaring how software is expected to behave can, and should, be automated. That means scanning code to discover the behaviors it contains and presenting the findings to developers for review. Then, of course, the next challenge for everyone involved is to determine how accurate that scanning and assessment is.
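As one simplified illustration of automated discovery, the Python sketch below uses the standard library's `ast` module to infer coarse behaviors from the modules a program imports. The mapping from modules to behavior labels is an assumption made up for this example, and imports are only a rough proxy for what code actually does.

```python
import ast
from typing import Dict, Set

# Hypothetical mapping from top-level module names to high-level behavior labels.
MODULE_BEHAVIORS: Dict[str, str] = {
    "socket": "network_access",
    "urllib": "network_access",
    "http": "network_access",
    "subprocess": "run_other_programs",
    "shutil": "modify_files",
}


def discover_behaviors(source: str) -> Set[str]:
    """Infer behavior labels from the import statements in Python source code."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]  # module is None for "from . import x"
        else:
            continue
        for module in modules:
            top_level = module.split(".")[0]
            if top_level in MODULE_BEHAVIORS:
                found.add(MODULE_BEHAVIORS[top_level])
    return found


if __name__ == "__main__":
    sample = "import math\nimport urllib.request\nfrom subprocess import run\n"
    # Findings like these would be shown to the developer for review.
    print(sorted(discover_behaviors(sample)))
```

A scan this simple misses dynamically loaded modules and flags imports that are never used, which is exactly the accuracy question raised above.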

These challenges are not insurmountable. It’s in everyone’s best interests to develop a permissions framework for data center software. Only then will we know it’s safe to take that drink.