Giving away Ransomware IP
I will be the first to say that what follows is an unusual blog. A year ago, CTM filed a provisional patent on a new way to detect and mitigate ransomware. Today, I am not only letting the application expire, I’m publishing the provisional document so that neither I nor anyone else can patent the concepts.
Why? Because this needs to exist, and the best way to enable that is to make the idea freely available. To anyone. It’s called a defensive publication; by placing the idea in the public domain, nobody can claim it. Including me.
A provisional patent simply stakes a claim at a point in time. It doesn’t necessarily fully flesh out the concept. It doesn’t get reviewed. It’s a way to say that you are claiming an idea, and gives the author a year to work out details and apply for a full patent. Today is 1 year.
The problem I noticed with ransomware was that most people focus on preventative controls like patching and firewalls to stop lateral movement before an attacker gets a foothold. They also (correctly) focus on controls like backups to help recover after an incident. What was missing was a way to limit the damage during an active attack. Kind of a preventative control in the moment.
Ransomware often goes through 3 distinct phases. In phase 1, an initial system is compromised, often because a user clicked on a malicious link or file, or because of an unpatched vulnerability in an internet-facing system. In phase 2, the ransomware attempts to quietly spread to as many other machines as possible. It moves stealthily so as to preposition itself on as many systems as possible before it activates. In this phase, it may exploit very different vulnerabilities than the one it used in phase 1. In phase 3, all deployed instances of ransomware activate simultaneously. At that point, it’s often too late to contain (assuming it spread in phase 2) and it’s now a race to shut off infected systems while they are busy being encrypted.
Phase 3 is generally “noisy”, with the malware trying to encrypt as many files as possible before being stopped. Modern ransomware prioritizes the files most likely to be valuable, so even if interrupted it’s likely too late. The number of systems plus the sheer speed of data loss overwhelms the ability to stop it. Networks are unplugged to limit any further phase 2 spread, and infected machines shut down, but generally too late.
In its need to go quickly lies an opportunity. Ransomware should be detectable by its noisy behavior. We can certainly monitor for large numbers of files being opened, encrypted, and written back to the disk. We can look for changes in entropy as well. Why not use that as a trigger to limit the damage? Most people say the issue is false alarms; automatically shutting the machine down just because it’s behaving unusually risks real damage if we guess wrong. Not worth the risk of stopping something like a server that isn’t infected.
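To make the entropy signal concrete, here is a minimal sketch in Python. It is only an illustration: the function name `shannon_entropy` and the sample data are assumptions of mine, and random bytes stand in for encrypted output. Well-encrypted data looks close to random, so its entropy sits near the 8 bits per byte maximum, while ordinary text scores much lower.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical sample data: repetitive text versus random bytes
# (random bytes are a stand-in for what encrypted output looks like).
plain = b"quarterly report: revenue was flat, costs were flat. " * 200
random_like = os.urandom(len(plain))

print(f"plaintext:   {shannon_entropy(plain):.2f} bits/byte")
print(f"random-like: {shannon_entropy(random_like):.2f} bits/byte")
```

A jump in entropy between what is read and what is written back is only a hint, not proof. Compressed archives and media files are also high entropy, which is one reason to combine this with other signals.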
What if we don’t stop it? Our first big idea was to slow writes to the disk when we think ransomware might be active in phase 3. We can implement a delay in the storage device driver. We never stop the system, and reads are unaffected, but incrementally slowing writes while notifying an administrator limits the damage and gives an administrator time to check things out. If we trigger in error, the machine runs slowly for a little bit, but it keeps running.
Implementing this in the storage device driver has several other benefits. It’s a natural place to watch for read and write behavior system-wide, so it doesn’t rely on individual process monitoring (which is easily defeated by running a larger number of malware processes, none of which individually trip a threshold). It’s also hard for malware to disable the storage driver without rebooting the machine, which we would notice.
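The slow-down idea can be sketched in a few lines. This is user-space Python for readability, not driver code, and every name in it (`WriteThrottle`, `escalate`, `clear`, the step and cap values) is hypothetical. The point is the shape of the behavior: reads pass straight through, writes pick up a delay that grows each time suspicion is re-confirmed, and an administrator can reset it.

```python
import time


class WriteThrottle:
    """Delay writes by an amount that grows while suspicion persists."""

    def __init__(self, step_s=0.25, max_delay_s=2.0, sleep=time.sleep):
        self.step_s = step_s            # added delay per escalation (assumed value)
        self.max_delay_s = max_delay_s  # cap so the system keeps making progress
        self.delay_s = 0.0
        self._sleep = sleep             # injectable so the demo need not really sleep

    def escalate(self):
        """Call each time the detector re-confirms suspicious activity."""
        self.delay_s = min(self.delay_s + self.step_s, self.max_delay_s)

    def clear(self):
        """Call when an administrator confirms the system is healthy."""
        self.delay_s = 0.0

    def read(self, do_read):
        return do_read()                # reads are never slowed

    def write(self, do_write):
        if self.delay_s > 0:
            self._sleep(self.delay_s)   # slow the write, never block it
        return do_write()


# Demo with a fake sleep that records the delays instead of waiting.
slept = []
throttle = WriteThrottle(sleep=slept.append)
throttle.write(lambda: "written")       # no suspicion yet, no delay
for _ in range(3):
    throttle.escalate()                 # detector fires three times
throttle.write(lambda: "written")       # delayed
throttle.read(lambda: "data")           # unaffected
print(slept)
```

If the trigger was a false alarm, the only cost is a few slow writes until `clear()` is called.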
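A toy comparison shows why the system-wide view matters. The thresholds and process counts below are made-up numbers chosen purely for illustration: forty processes that each stay under a per-process limit slip past process-level monitoring, but their combined write volume is obvious at the storage layer.

```python
PER_PROCESS_LIMIT = 100  # hypothetical: writes per window allowed per process
SYSTEM_LIMIT = 500       # hypothetical: writes per window allowed system-wide


def per_process_alarm(writes_by_pid):
    """Fire only if some single process exceeds its own limit."""
    return any(n > PER_PROCESS_LIMIT for n in writes_by_pid.values())


def system_wide_alarm(writes_by_pid):
    """Fire on total write volume, regardless of which process wrote."""
    return sum(writes_by_pid.values()) > SYSTEM_LIMIT


# A swarm of 40 small malware processes, each doing only 50 writes.
swarm = {pid: 50 for pid in range(1000, 1040)}

print("per-process alarm:", per_process_alarm(swarm))
print("system-wide alarm:", system_wide_alarm(swarm))
```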
The second big idea was to use the order of operations to help fine tune ransomware detection. At the file level, the behavior is deterministic in that an encrypted file can’t be written or deleted before an unencrypted version is read. Leveraging this understanding further reduces the probability of a false trigger.
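Here is one simple way the ordering could be checked, again as a hedged sketch rather than a finished detector. The event format and the function name `count_read_then_modify` are my assumptions. It counts files that were written or deleted after having been read, which is the order ransomware is forced into, and ignores files that are written first (new downloads, logs, exports).

```python
def count_read_then_modify(events):
    """Count files modified (written or deleted) after they were read.

    events: iterable of (op, path) tuples in time order,
            with op being "read", "write", or "delete".
    """
    already_read = set()
    flagged = set()
    for op, path in events:
        if op == "read":
            already_read.add(path)
        elif op in ("write", "delete") and path in already_read:
            flagged.add(path)  # modification followed a read of the same file
    return len(flagged)


# Hypothetical traces. The first overwrites one file in place and replaces
# another with an encrypted copy; the second only creates new files.
ransomware_like = [
    ("read", "a.docx"), ("write", "a.docx"),
    ("read", "b.xlsx"), ("write", "b.xlsx.locked"), ("delete", "b.xlsx"),
]
benign = [
    ("write", "out.log"), ("write", "new.csv"), ("read", "new.csv"),
]

print(count_read_then_modify(ransomware_like))
print(count_read_then_modify(benign))
```

Legitimate software also reads and then rewrites files (editors, for example), so this count is meant to feed a score alongside volume and entropy, not to act as a trigger on its own.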
So that’s it. What follows is the original provisional that was filed. The form may be something only a patent examiner would love, but I’m replicating it here without change to ensure zero ambiguity.
Anyone who wants to implement these ideas can do so knowing that I have explicitly relinquished any claims to them. Improving our ability to stop ransomware is far more important than my having another patent in my portfolio. If you are a developer, feel free to take it from here.
/lou
System and Methods for Establishing a Sequence-Based Approach to Ransomware Detection
Background
As noted in US Patent Application Number 20190228153 by Scaife, et al. (Scaife), it is desirable to detect ransomware by monitoring its behavior. All ransomware must by its nature read files, process them through means like encryption, and write the results. The results must either overwrite the original file or delete it to deny the legitimate owner access to unencrypted data. This fundamental behavior of ransomware, necessary to achieve its objectives of denying legitimate access to files, can be adjusted but not materially changed which creates an opportunity for real time detection and blocking before substantial damage is done.
As the use of ransomware has increased in popularity by bad actors, the need for such defenses is apparent.
The approach referenced in the Scaife Application has several shortcomings, which permit ransomware opportunities to avoid detection. The existing approach taught and disclosed focuses on the behavior of each process running on a computer. Scaife defines a process as an instance of computer code. Those creating ransomware software can seek to avoid detection by simply creating many small processes, none of which behave in a manner to sufficiently trigger detection by exceeding a preset threshold or malware score. In the aggregate, however, they collectively achieve their purpose.
Additionally, the existing approach scores activity attributes (such as reading files, increasing system entropy, and writing files) without regard to their sequence. As a ransomware file cannot be written before it is modified (through encryption or other means), and cannot be encrypted before it is read, a better method of detection with reduced false alarms would be to score the sequence of events instead of or in addition to their discrete activities.
Malware creators, including those creating ransomware, have demonstrated that they will evolve their software to evade detection. It is therefore reasonable to assume that process-based attribute scoring will be rendered less effective by simple changes to ransomware. This creates a need for an improved solution.
Further, US Patent application 20180157834 of Continella et al (Continella) combines detection with transparent file recovery by replacing potentially compromised files. Unfortunately, this may result in additional storage requirements for "shadow copies" and/or legitimate changes being replaced and lost. In doing so, Continella may increase costs and/or harm the correct operation of the system. A method which reduces potential harm but does not create new issues or increase storage costs by copying and/or replacing legitimately modified files is therefore needed.
Finally, it is always desirable to reduce "false alarms" in any detection system. False alarms distract users from other activities and can, when followed by an action to remediate, create new issues. Therefore, methods to better score and detect legitimate issues while reducing false alarms are needed.
Detailed Description
As noted above, the Scaife approach includes detecting an instance of a malware process by scoring the behavior of said process using a combination of attributes. An easy way to avoid detection would be to create many small processes that each, individually, limit their activity to a level that does not meet the threshold of a test. The present disclosure proposes a method of looking at the combined behavior of all processes simultaneously. Using this method, an infected system will still be detected even when many small malware processes are run. This might, for example, be done by inserting software at the storage "device driver" layer that observes all read and write activity regardless of process.
It is also important to note that reacting to a detected event is critical to minimize harm. The Scaife approach envisions simply stopping ("dropping") a suspect process. Such an action can cause issues with legitimate processes which are not permitted to run and can be even more harmful if full system activity is being stopped; many operating systems will fail if they are suddenly stopped from accessing storage. Instead, the present disclosure implements a system that slows all system access to its storage, whether directly attached or via a network. Slowing access, particularly write access, allows the system to continue to operate (at an intentionally degraded level). This minimizes the rate of damage and allows a legitimate administrator time to be notified of the detected issue and to intervene if necessary. In one embodiment, the added performance delays may be increased if no action is taken to affirm that the system is running as expected or if additional and repeated behavior is observed. This effectively slows the performance more, in preferred embodiments when writing, should an administrator not intervene.
Additionally, while the approaches in Scaife and Continella may include using a collection of behavioral attributes individually or in combination to detect ransomware, neither leverages the sequence in which those attributes are invoked as a part of developing a behavioral score to test. For example, a system that writes a number of files and then later reads them is unlikely to be ransomware, since ransomware read operations necessarily predate encrypting, which predates writing. Bulk operations and caching may appear to skew this to some extent, but the fundamental order of operations is dictated by the malicious behavior of the ransomware. As such, the sequence, or order, of operations may be used to score the likelihood of ransomware activity and reduce false alarms. As a result, the present disclosure provides a distinct difference from previous approaches and solutions to ransomware detection and mitigation by proposing to look at system behavior (e.g., a collection of processes) versus individual processes, potentially scoring behavior based on the sequence of operations, and slowing versus stopping the system behavior on suspect systems as well as mitigating or eliminating the need for a transparent file recovery and file replacement mechanism.
The following describes one or more of the elements that may be included in any embodiment based on the present disclosure. The embodiments may include, but are not limited to, computer systems, network devices, mobile and wireless devices, including phones, tablets, computers and the like.
1) Detection of ransomware based on the behavior of a computer system over a period of time, whether that behavior is the result of a single process or a collection of processes.
2) Monitoring the volumes of individual behaviors at the system level, with some or all of the behaviors being scored and thresholded. Examples of behaviors associated with ransomware that increase the score to be tested include large numbers of file reads and writes, file deletions or renames, high CPU usage, and changes in file type, magic number, or file entropy.
3) In some embodiments, the score may be further increased or affected by the sequence in which some or all of the individual behaviors are detected.
4) Incrementally slowing a system once ransomware is suspected, to minimize harm until the system can be checked.
5) In some embodiments, access to file writes may specifically be incrementally slowed.
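The five elements above could be combined roughly as follows. This sketch is an illustration added for developers and is not part of the filed provisional. The weights, the threshold, the delay schedule, and all names (`WindowStats`, `score`, `write_delay_ms`) are assumptions chosen only to show the shape of the approach: score system-wide behavior over a time window, let the sequence signal weigh most heavily, and translate a sustained high score into a growing write delay.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """System-wide counts observed in one monitoring window."""
    reads: int = 0
    writes: int = 0
    deletes_or_renames: int = 0
    entropy_jumps: int = 0      # writes with much higher entropy than the data read
    read_then_modify: int = 0   # files modified after being read (sequence signal)


# Hypothetical weights; the sequence signal counts the most.
WEIGHTS = {
    "reads": 0.01,
    "writes": 0.05,
    "deletes_or_renames": 0.2,
    "entropy_jumps": 0.5,
    "read_then_modify": 1.0,
}
THRESHOLD = 100.0  # hypothetical score above which ransomware is suspected


def score(stats: WindowStats) -> float:
    """Weighted sum of the behaviors seen in the window."""
    return sum(weight * getattr(stats, name) for name, weight in WEIGHTS.items())


def write_delay_ms(score_value: float, consecutive_windows: int) -> int:
    """Delay per write: zero below threshold, doubling while suspicion persists."""
    if score_value < THRESHOLD:
        return 0
    return min(50 * 2 ** (max(consecutive_windows, 1) - 1), 5000)


idle = WindowStats(reads=200, writes=50)
burst = WindowStats(reads=1000, writes=1000, deletes_or_renames=500,
                    entropy_jumps=400, read_then_modify=400)

for name, stats in (("idle", idle), ("burst", burst)):
    s = score(stats)
    print(name, round(s, 1), [write_delay_ms(s, n) for n in (1, 2, 3)])
```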