Intercept X 2.0 is about to be released. The beta phase has been running for a while, and the product is expected to arrive this month (January 2018) as a free update. This article is not yet about Intercept X 2.0 itself, but takes a closer look at the technology behind it.
Machine learning is currently the talk of the town, just as the cloud, VR (virtual reality) and AR (augmented reality) were before it. Marketing often sells it as AI (artificial intelligence), or KI in German. Whether it’s chatbots, virtual assistants, self-driving cars, translation tools, smartphones, or photo software, manufacturers say AI is in everything, making our lives easier and the product smarter.
AI chips (neural processing units, or NPUs) for pattern recognition and analysis are currently being built into the top smartphones. If you imagine where this is going, soon everyone will have a little supercomputer in their pocket that’s almost always online. Richard Hendricks’ decentralized Internet from the TV series “Silicon Valley” will soon become reality. Projects for decentralized storage or decentralized computing power already exist on the blockchain.
At this point, a very clear recommendation for the TV series Silicon Valley! 😉
But back to the topic. Everyone is currently advertising with AI, but not every product labeled AI actually has AI inside. The differences are huge. In most cases, the term AI refers to nothing more than machine learning, which is also what Sophos relies on with Intercept X 2.0.
What exactly is this “machine learning”?
Machine learning is briefly and aptly described as follows on Wikipedia:
“Machine learning is an umbrella term for the ‘artificial’ generation of knowledge from experience: an artificial system learns from examples and can generalize them after the learning phase is over. This means that it does not simply learn the examples by heart, but ‘recognizes’ patterns and regularities in the learning data. Thus, the system can also assess unknown data (learning transfer) or fail to learn unknown data (overfitting).”
More precisely, Sophos relies on Deep Learning, an advanced form of machine learning.
From the Wikipedia explanation above, you can roughly imagine what a great impact machine learning can have on a product like Intercept X. It has been clear since around 2005 that signature-based virus scanners are no longer decisive for detecting viruses, since they can only fight malware that is already known and detected. So it’s always a race between malware programmers and signature writers. Logically, the malware programmers always have a short head start, which means that new malware is always unknown for a certain time. And as soon as the new malware is known, even slight adjustments to the program are enough to make it “unknown” to virus scanners again.
Sophos has already developed a great many alternative methods for detecting malware and has long since stopped relying on signatures. Still, it was a good move to acquire Invincea at the beginning of 2017 and thereby bring a technology into the products that can also protect against future threats, and thus against unknown malware.
Machine learning is not completely new, however. The algorithms have been around since the 80s and not much has changed. But until recently, neither the big data nor the processing power needed for it was available. That is why machine learning experienced its revival around 2012. The same goes for genetic algorithms, which I believe malware writers will make use of in the future.
How does machine learning work in theory?
Simply put, you feed the machine a lot of data. The algorithm takes the files apart and analyzes their characteristics. This can be the file size, for example, but also more complex features such as whole components of the code. So after this process, you don’t just have a hash value, as with signature-based detection, but a great many clues. A small adjustment to the code is therefore no longer enough for malware to disguise itself as completely new, since the other features remain the same.
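To make the difference between a single hash and a set of features a bit more tangible, here is a minimal Python sketch. It is only an illustration and says nothing about how Intercept X works internally: the three features (size, byte entropy, share of printable characters), the function names and the sample bytes are all my own assumptions.

```python
import hashlib
import math
from collections import Counter


def sha256_hex(data: bytes) -> str:
    """The kind of fingerprint a purely hash-based signature would compare."""
    return hashlib.sha256(data).hexdigest()


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_features(data: bytes) -> dict:
    """A few illustrative features; real products use far more of them."""
    printable = sum(32 <= b < 127 for b in data)
    return {
        "size": len(data),
        "entropy": round(byte_entropy(data), 3),
        "printable_ratio": round(printable / max(len(data), 1), 3),
    }


# Invented stand-in for a malicious file, plus a variant with one changed byte.
original = b"MZ" + b"\x90" * 64 + b"connect_to_server; encrypt_files; " * 20
variant = original[:-1] + b"!"

print(sha256_hex(original))
print(sha256_hex(variant))
print(extract_features(original))
print(extract_features(variant))
```

The two hashes have nothing in common, so a hash-based signature for the original no longer matches the variant. The feature values, on the other hand, stay almost identical, which is exactly the kind of clue a trained model can still work with.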
Once you have the characteristics, you start working on so-called “models”. A lot of data is needed for this. It comes in handy that more than 390,000 new malware programs appear every day, i.e. more than 16,000 per hour. Sophos Sandstorm and Intercept X, which transfer data to Sophos Labs, also help collect data and train the model. Malicious URLs and spam provide learning material as well. Training takes not only malware but also benign files, so that the model does not produce false positives later.
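For anyone who wants to check the numbers, the daily figure quoted above can be broken down with a few lines of Python. The only input is the 390,000 from the text; the variable names are mine.

```python
NEW_SAMPLES_PER_DAY = 390_000  # daily figure quoted in the article

per_hour = NEW_SAMPLES_PER_DAY / 24
per_minute = per_hour / 60
per_second = per_minute / 60

print(f"{per_hour:,.0f} per hour")      # 16,250 per hour
print(f"{per_minute:,.1f} per minute")  # 270.8 per minute
print(f"{per_second:.1f} per second")   # 4.5 per second
```

So that is roughly four to five new samples every second, around the clock.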
You test several different models at the same time and take the one that gives the best results. The model and the features together form a pattern of what malware looks like and how it differs from a good file. These patterns then allow files to be assessed and the probability that they are malware to be calculated. This all happens within milliseconds and requires far less CPU and RAM than other analysis methods. An update only improves the pattern recognition; new signatures do not have to be loaded every x seconds, as is the case with signature-based detection.
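The idea of testing several models and keeping the best one can be sketched in a few lines of Python. This is a toy under heavy assumptions, not the Sophos approach: the sample values are invented, the “model” is a simple nearest-centroid classifier (Sophos uses deep learning), and the candidates only differ in which features they look at. The score it returns is a rough value between 0 and 1, not a calibrated probability.

```python
import math

# Invented toy samples: (byte_entropy, suspicious_api_calls, size_kb), label.
# Label 1 = malware, 0 = benign.
TRAIN = [
    ((7.8, 12, 300), 1),
    ((7.5, 9, 120), 1),
    ((7.9, 15, 450), 1),
    ((4.2, 1, 310), 0),
    ((5.0, 0, 130), 0),
    ((4.6, 2, 440), 0),
]
VALIDATION = [
    ((7.6, 11, 200), 1),
    ((7.7, 10, 400), 1),
    ((4.8, 1, 210), 0),
    ((4.4, 2, 390), 0),
    ((7.4, 0, 250), 0),  # benign but high entropy, e.g. a packed installer
]

# Candidate models: the same classifier, looking at different feature columns.
CANDIDATES = {
    "size_only": (2,),
    "entropy_only": (0,),
    "entropy_and_api": (0, 1),
}


def centroid(rows, idx):
    """Mean of the selected feature columns over all rows."""
    return [sum(r[i] for r in rows) / len(rows) for i in idx]


def fit(train, idx):
    """'Training' here just means storing one centroid per class."""
    malware = [x for x, y in train if y == 1]
    benign = [x for x, y in train if y == 0]
    return {"idx": idx, "mal": centroid(malware, idx), "ben": centroid(benign, idx)}


def malware_probability(model, x):
    """Score in [0, 1]; the closer x is to the malware centroid, the higher."""
    v = [x[i] for i in model["idx"]]
    d_mal = math.dist(v, model["mal"])
    d_ben = math.dist(v, model["ben"])
    if d_mal + d_ben == 0:
        return 0.5
    return d_ben / (d_mal + d_ben)


def accuracy(model, samples):
    """Share of samples whose predicted class matches the label."""
    hits = sum((malware_probability(model, x) >= 0.5) == (y == 1) for x, y in samples)
    return hits / len(samples)


models = {name: fit(TRAIN, idx) for name, idx in CANDIDATES.items()}
results = {name: accuracy(m, VALIDATION) for name, m in models.items()}
best_name = max(results, key=results.get)
best_model = models[best_name]

print(results)
print("best:", best_name)
print("score for a new file:", round(malware_probability(best_model, (7.9, 14, 100)), 2))
```

In this toy data set, the candidate that only looks at entropy flags the high-entropy benign file as malware, i.e. it produces a false positive. The candidate that also counts suspicious API calls gets it right and is therefore the one that is kept.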
If you’d like to delve a little deeper, check out Sophos’s technical article on the subject: Sophos Machine Learning how to build a better threat detection Model
The PDF is in English, but can be translated into many other languages using machine learning from DeepL: https://www.deepl.com/translator The well-known Google Translate also uses machine learning, of course, but DeepL has been fed with better data and the machine has been trained noticeably better.
Machine learning alone is not enough
Machine learning can already achieve incredibly good detection rates, and the advantages over signature-based detection are obvious. However, Sophos does not rely on these new patterns alone, but uses machine learning as one more technology to achieve the most complete malware detection possible.
Intercept X 2.0 will thus be able to help even more with ransomware and exploit detection thanks to machine learning, complementing other technologies such as Exploit Prevention, Malicious Traffic Detection, CryptoGuard and the Synchronized Security Heartbeat. It is precisely these additional technologies that separate the wheat from the chaff, or in other words, the standard antivirus from a professional solution.
Is Intercept X sufficient as the only protection?
One might now ask whether the normal antivirus has had its day once you have installed Intercept X with these great technologies, and in the future with machine learning as well. If you are using the Sophos Endpoint Client, it should definitely continue to run in parallel with Intercept X. The reason is that the Sophos Endpoint Client is much more than just a normal antivirus that detects malware based on signatures. For example, it also provides web security, web control (category-based URL filtering), device control, and application control, to name a few. A complete overview of the differences between Sophos Endpoint Protection and Intercept X can be found in this datasheet.
For all other, “classic” antivirus programs, I see no future use. At the moment, however, nothing prevents such an antivirus from running in parallel with Intercept X.
More on the topic of machine learning
- Defining the truth: how Sophos overcomes uncertain labels in machine learning
- Man vs machine: comparing artificial and biological neural networks
- 5 questions to ask about machine learning
- Demystifying deep learning: how Sophos builds machine learning models
The Sophos Labs site has recently added great real-time statistics on daily spam and malware activity, generated from a lot of data.
For those who like such real-time data, we have gathered a few links. We ourselves always find it impressive to see how many attacks really happen out there. It’s crazy what goes on behind the scenes:
Norse Attack Map
Here, cyberattacks are displayed in real time. For this purpose, 8 million sensors emulating more than 6,000 applications run on servers in 40 countries. These are so-called honeypots, i.e. virtual traps. This all adds up to over 7 petabytes of collected attack data.
Norse maintains the world’s largest dedicated threat intelligence network. With over eight million sensors that emulate over six thousand applications – from Apple laptops, to ATM machines, to critical infrastructure systems, to closed-circuit TV cameras – the Norse Intelligence Network gathers data on who the attackers are and what they’re after. Norse delivers that data through the Norse Appliance, which pre-emptively blocks attacks and improves your overall security ROI, and the Norse Intelligence Service, which provides professional continuous threat monitoring for large networks.
FireEye – Cyber Threat Map
The FireEye Cyber Threat Map shows a daily summary of all global DDoS attacks.
- Top 5 reported industries
- Top attackers by country
- FireEye Cyber Threat Map
Kaspersky – Cyber Map
The Cyberthreat Real-Time Map from Kaspersky shows real-time attacks detected by their various source systems.
- On-access scanner
- On-demand scanner
- Web Antivirus
- Mail Antivirus
- Intrusion detection system
- Vulnerability scan
- Kaspersky Anti-Spam
- Botnet activity detection
- Kaspersky Cyberthreat real-time map
Akamai – Real-Time Web Monitor
Akamai monitors global Internet conditions around the clock. Using this real-time data, they identify the global regions where the greatest web attack traffic occurs, cities with the slowest web connections (latency), and geographic areas with the highest web traffic (traffic density).
Check Point – Live Cyber Attack Threat Map
ThreatCloud from Check Point also shows attack data. There is also a ranking of the top target countries.
Deutsche Telekom – Security speedometer
The security speedometer shows the global cyber attacks on the honeypot infrastructure of DTAG and its partners.
Digital Attack Map
Visualized live data from global DDoS attacks. This was developed in collaboration between Google Ideas and Arbor Networks. The tool provides anonymous attack data that allows users to explore historical trends and retrieve reports of outages on a given day.