Once upon a time, the Internet was a friendly, happy, flourishing place populated by academics and scientists. Like all idylls, this attracted predators and parasites, and soon the Internet was overrun by the confederates of our appetite-satisfying service economy: commercial advertisers, pornographers, and money. And this attracted criminals, which in turn attracted venture capitalists, who lent money to the displaced academics and scientists (and some mediocre criminals) to create a new industry: network security. Below is a quick recap of the past 25 years of the network traffic inspection subset of that industry, and its major (and interim) stages of development, but before we dive in, a quick riddle: What do ants, cars, and neurons have in common? If you answered, “they all have some form of antenna,” you score points for good abstract association, but it’s not the answer we’re looking for. We’ll come back to this.
1 – The Packet Age
Packet Filters – Started in the late 1980s as an outgrowth of access-control lists (ACLs) on routers. Inspection and policy focused entirely on L2-L4, and treated each packet discretely, lacking any notion of a connection or a flow.
2 – The Flow Age
Stateful/Flow Inspection – Stateful firewalls and flow tracking technologies (such as NetFlow) were introduced in the early to mid 1990s. These advanced the inspection capabilities of packet filters by adding state tables or connection caches for tracking flows, moving the primary unit of inspection from the packet to the L4 flow. The result was improved throughput and defense against an emerging class of TCP-level attacks that exploited the naïve header-focus of packet filters.
3 – The Application Age
Proxies and Deep Packet Inspection (DPI) – In the early 2000s, attackers, thwarted by stateful firewalls, moved to the next layer in the network stack, the application layer. In response, network security vendors developed two approaches to application-layer inspection: the proxy and DPI. Proxies were an early effort to inspect what was happening within common applications such as HTTP, FTP, SMTP, POP3, and DNS. They worked by brokering the client/server connection with their own “secure” version of the application, so an HTTP connection went from “Client<->Server” to “Client<->[Proxy(Server)—Proxy(Client)]<->Server”. This approach worked acceptably well for a while, but then network speeds jumped from megabits to gigabits, and network applications grew from the tens to the thousands.
To solve the limitations of early proxies came Deep Packet Inspection. DPI is a stream (or flow) based scanning technology with the ability to statefully inspect the content of every byte in every packet in every flow. Its reassembly and transformation engines allow it to process fragments, out-of-order packets, and all common encoding/presentation formats so that it can always reliably match patterns (or signatures) within flows.
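The core idea, reassembling a flow into a contiguous stream so that signatures split across packet boundaries are still visible, can be sketched as follows. This is a toy illustration, not any real DPI engine; the signatures and segment data are invented for the example.

```python
# Minimal sketch of DPI-style inspection: order TCP segments by sequence
# number into a contiguous stream, then match signatures against the
# stream rather than against individual packets. Signatures here are
# illustrative examples, not production detection rules.
import re

SIGNATURES = {
    "shell-upload": re.compile(rb"<\?php\s+system\("),
    "exe-download": re.compile(rb"MZ\x90\x00"),  # PE executable magic bytes
}

def reassemble(segments):
    """Order (sequence, payload) segments and join their payloads, so a
    pattern split across two packets is still matchable."""
    return b"".join(payload for _, payload in sorted(segments))

def inspect(stream):
    """Return the names of all signatures that match the stream."""
    return [name for name, sig in SIGNATURES.items() if sig.search(stream)]

# Out-of-order segments that split a signature across packet boundaries:
segments = [(200, b"ystem('id');"), (100, b"GET /x HTTP/1.1\r\n\r\n<?php s")]
print(inspect(reassemble(segments)))  # -> ['shell-upload']
```

A per-packet filter would miss this match entirely, since neither segment alone contains the full pattern; that gap is precisely what DPI's reassembly engines close.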
With DPI-powered next-generation firewalls (NGFW) and a new breed of application proxies (in the form of Web-Application and Database Firewalls) fairly effectively blocking known application attacks and exploits, attackers and data thieves again had to change their tactics. And so began the epidemic of polymorphic multi-platform malware (PMPM).
Whereas novel malware and exploits capable of evading detection by signature-based engines such as DPI or application proxies used to be rare, requiring the manual effort of someone highly skilled in the art, creating them is now trivially simple thanks to automation. Exploit kits, packers, and obfuscators allow anyone with little more than malicious intent and a few hundred dollars to create and distribute PMPM that is virtually undetectable by signature- or norms-based inspection or proxy engines. As the chart above indicates, 2010 saw about 17.5 million new pieces of malware (AVTest), but 286 million unique variants (Source: Symantec ISTR 2011). One might, if one were so inclined, interpret this as “16 detected variations of each strain”. 2011 saw 18 million new pieces with 403 million variants, or 22 variations of each strain, for a 37.5% increase in variation over 2010. Following that trend, and considering that we are at 12.5 million new pieces of malware at the start of July, we might expect to see 25 million new pieces in 2012, with 30 variations each (a flat 37.5% increase on 22), for a projected total of 750 million variants. And these numbers barely account for alternative/mobile computing platforms such as smartphones, tablets, and Internet-enabled consumer electronics which—unlike PCs that have been under attack for decades—have somewhere between “alarmingly immature” and “totally non-existent” host-based security options.
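The back-of-envelope projection above can be reproduced directly from the cited figures (using the rounded per-strain ratios the argument works with):

```python
# Reproducing the paragraph's projection from the cited AVTest / Symantec
# ISTR figures. Values are the rounded ratios the text reasons with.
new_2010, variants_2010 = 17.5e6, 286e6
new_2011, variants_2011 = 18e6, 403e6

per_strain_2010 = round(variants_2010 / new_2010)   # 286M / 17.5M -> 16
per_strain_2011 = round(variants_2011 / new_2011)   # 403M / 18M   -> 22
growth = per_strain_2011 / per_strain_2010 - 1      # 22/16 - 1 = 37.5%

new_2012 = 2 * 12.5e6                               # 12.5M by July, doubled
per_strain_2012 = round(per_strain_2011 * (1 + growth))  # 22 * 1.375 -> 30
projected = new_2012 * per_strain_2012              # 25M * 30 = 750M
print(f"{growth:.1%}", int(projected / 1e6), "million variants")  # 37.5% 750
```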
So what do ants, cars, and neurons have in common? They are all often-cited examples of emergence. The concept of emergence is broadly multidisciplinary, but in virtually all forms it states, at its simplest, that the whole is greater than the sum of its parts, or, more specifically, that the whole is a novel entity that emerges irreducibly from assemblages of its component parts. Ants form colonies with complex social organizations that cannot be understood simply by studying the seemingly programmatic behavior of ants. Cars form traffic whose dynamics remain frustratingly unintuitive (particularly on the Beltway in D.C.) despite the intuitive “follow-the-leader” model that most individual drivers employ. Neurons collect to form brains from which emerges consciousness, yet despite our comprehensive microscopic-level understanding of neuroscience, we still often find each other’s behavior puzzlingly mercurial or irrational. In other words, macro-level behaviors of complex systems cannot be predicted through micro-level analysis.
Files, created by adaptive authors and processed by applications or run by operating systems, constitute such complex systems. Understanding their emergent qualities and effects, such as process, file-system, network, and API activity, is impossible through even the deepest packet or flow inspection. Understanding what emerges from collections of packets and flows can only happen through inspection at the macro-level: the interaction between the files they convey and their diverse operating environments.
Recognizing this, a small but growing number of security vendors such as FireEye, Norman, and GFI have recently started offering virtualization and sandboxing platforms capable of “detonating” or dynamically analyzing executables and other files typically associated with PMPM. Their approach to date has the following characteristics:
- Either a stand-alone analyzer that must have files delivered to it, or a mediating gateway model with:
  - Limited transport application support, typically only HTTP and SMTP, and often without DPI classification for non-standard ports of operation.
  - <1Gbps throughput, and with no simple model for deployment scalability.
- A focus on Windows malware (in particular Win32) with limited or no ability to analyze files for or on other platforms, e.g. an Android .apk file, or a Java archive or PDF run on a Mac.
- Detonation is a very computationally expensive operation, yet it is largely undiscriminating:
  - It is commonly employed such that all payloads are detonated and evaluated irrespective of their likelihood (i.e. reputation, point of origin, signature characteristics, etc.) of being malicious.
  - A payload that is determined to be malicious by a Windows sandbox will likely generate false-positives when downloaded by a non-vulnerable, non-Windows client. The inverse (attack against a non-vulnerable sandbox, vulnerable client) will result in a false-negative.
  - A payload that can successfully exploit an older version of an application (e.g. Java) will likely generate false-positives when downloaded by a non-vulnerable client that has updated the application. The inverse (newer version on the sandbox, older version on the client) will result in a false-negative.
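The environment-mismatch problem behind those false-positives and false-negatives reduces to a simple comparison: a detonation verdict is only meaningful relative to the environment it was produced in. The sketch below illustrates this with hypothetical platform/version labels (not taken from any vendor's product):

```python
# Illustrative sketch: a sandbox verdict mismatched to the client's
# actual environment yields FPs and FNs. Environment tuples below
# (platform, app version) are hypothetical examples.
def classify(sandbox_env, client_env, exploit_works_in):
    """Compare what the sandbox reports against what the client actually
    risks, given the set of environments the exploit works in."""
    sandbox_verdict = sandbox_env in exploit_works_in
    client_at_risk = client_env in exploit_works_in
    if sandbox_verdict and not client_at_risk:
        return "false positive"   # blocked, but client was never vulnerable
    if client_at_risk and not sandbox_verdict:
        return "false negative"   # passed, but client is vulnerable
    return "correct"

exploit = {("windows", "java6")}  # payload exploits Java 6 on Windows only
print(classify(("windows", "java6"), ("mac", "java7"), exploit))  # false positive
print(classify(("mac", "java7"), ("windows", "java6"), exploit))  # false negative
```

A verdict is only "correct" when sandbox and client environments agree on vulnerability, which is exactly why a single-platform sandbox cannot cover a diverse client population.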
While these detonation platforms offer clear advantages over signature- or norms-based security devices, they are too limited in their scalability and in their support for the necessary diversity of file-formats, operating-systems, and transport/application protocols to effectively defend against the full scope of PMPM. They have advanced beyond the application age but, given their shortcomings, haven’t fully made it to the next stage. History might remember them as the missing link. The time has come for the fourth stage.
4 – The File Age
Having briefly recounted the industry’s evolution from packets, to flows, to applications, to an intermediate stage of limited detonation, the trajectory should be clear: in order to deal more competently with the PMPM threat, we need a platform that can efficiently and adaptably handle the current and future diversity of protocols, platforms, and file-formats.
RTE (Real-Time Extractor) is a system within the Solera DeepSee Threat Profiler platform that is designed to provide generalized file-level analysis across a broad and extensible set of protocols, platforms, and file-formats. It effectively does for files what DPI does for packets and flows. RTE builds on the DeepSee Extractor platform, advancing Extractor’s capability to reconstruct packets and flows into files from an operator-driven, on-demand process to a fully programmable part of an automated analysis workflow. To accomplish this, RTE encompasses the following components:
- Detection – Rules identifying files within flows that warrant further analysis. Detection policies (Rules) can incorporate any Filters or Favorites that can be used in the DeepSee path bar, including network attributes, application context, application metadata, user context, geo-IP information, or any combination thereof. It’s simple, for example, to set a detection policy that triggers extraction on “all executables, PDFs, and Java Archives coming from servers other than ‘Trusted Servers’ delivered over HTTP or SMTP” and to submit them for analysis via a configurable Action.
- Extraction – The process of extracting the target file from a network flow across a set of common network applications and transport protocols, and across a set of common file types.
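The kind of predicate the Detection example describes can be sketched generically. To be clear, this is a hypothetical illustration in Python, not DeepSee/RTE's actual rule syntax; the trusted-server addresses and field names are invented for the example:

```python
# Hypothetical sketch of a detection policy: trigger extraction on
# executables, PDFs, and Java archives from untrusted servers over
# HTTP or SMTP. Field names and addresses are illustrative only.
TRUSTED_SERVERS = {"10.0.0.5", "10.0.0.6"}   # assumed 'Trusted Servers' set
TARGET_TYPES = {"exe", "pdf", "jar"}          # executables, PDFs, Java archives
TARGET_APPS = {"http", "smtp"}                # transport applications in scope

def should_extract(flow):
    """Return True when a flow carries a file that warrants extraction
    and submission for further analysis."""
    return (flow["file_type"] in TARGET_TYPES
            and flow["app"] in TARGET_APPS
            and flow["server_ip"] not in TRUSTED_SERVERS)

flow = {"file_type": "pdf", "app": "http", "server_ip": "203.0.113.9"}
print(should_extract(flow))  # -> True: submit for analysis
```

The point of the sketch is the shape of the policy: file type, application context, and network attributes combine into one predicate that gates the expensive extraction and analysis step, rather than detonating every payload indiscriminately.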
The result today is a platform (above) that projects network analysis into the file age while simultaneously providing some of the industry’s most robust packet, flow, and application analyzers, arming security professionals with highly capable network visibility, analysis, and reporting. In the near future the RTE platform will be the engine that powers automated workflows that will:
- Dynamically discover novel malware transported on any known protocol, and targeting any replicable operating environment
- Update file, IP, URL, and domain reputation databases
- Assess relatedness to past and future file activity to more efficiently discover polymorphism
- Inform control points on networks for automatic rule updates and enforcement
- Fingerprint discoveries to create defensive signatures for deployment to a site, an enterprise, or an entire subscriber-base
* – with apologies to the Tyrell Corporation and White Zombie