Machine Learning: Sophos relies on artificial intelligence
Intercept X 2.0 is about to be released. The beta phase has been running for a long time, and the product is expected to be released this month (January 2018) as a free update. This article is not yet about Intercept X 2.0 itself, but offers a deeper look at the technology it contains.
At present, machine learning is on everyone’s lips, as VR (Virtual Reality) and AR (Augmented Reality) were before. Marketing often sells it as AI, short for Artificial Intelligence. Whether it is chat bots, virtual assistants, self-driving cars, translation tools or smartphones: according to the manufacturers, AI is in all of them, simplifying our lives and making the products more intelligent.
The top smartphones are currently equipped with AI chips (Neural Processing Units, NPUs) for pattern recognition and analysis. Think about where this is heading: soon everyone will have a small supercomputer in their pocket that is almost always online. The decentralized Internet of Richard Hendricks from the TV series “Silicon Valley” will soon become reality. Such projects already exist on the blockchain for decentralized storage or distributed computing power.
At this point a very clear recommendation for the TV series Silicon Valley! 😉
But back to the subject. Everybody advertises with AI, but not everything labeled AI really is AI. The differences are enormous. What stands behind the term AI is nothing other than machine learning, which Sophos also relies on with Intercept X 2.0.
What exactly is “Machine Learning”?
Machine Learning is briefly and accurately described on Wikipedia as follows:
“Machine Learning is a generic term for the ‘artificial’ generation of knowledge from experience: an artificial system learns from examples and can generalise from them once the learning phase is over. This means that the examples are not simply memorized, but that the system ‘recognizes’ patterns and regularities in the learning data. In this way, the system can also assess unknown data (learning transfer) or fail to learn from unknown data (overfitting).”
In fact, Sophos relies on Deep Learning, an advanced form of machine learning.
After reading the Wikipedia explanation above, you can probably imagine the great impact machine learning can have on a product like Intercept X. We should all know that signature-based virus scanners have not been decisive for the detection of viruses since 2005, as they can only fight malware that is already known and detected. So it is always a race between malware programmers and signature writers. Malware programmers naturally always have a short lead, which means that new malware always remains unknown for a certain amount of time. And as soon as the new malware is known, a slight adjustment of the program is sufficient to make it “unknown” to virus scanners again.
Sophos has already developed many alternative methods for detecting malware and no longer relies on signatures alone. Nevertheless, it was not a bad move to take over Invincea at the beginning of 2017 and to bring a technology into the products that can protect against future threats and thus also against unknown malware.
Machine Learning is not completely new. The algorithms have been around since the 1980s and have not changed much since. Until recently, however, the large amounts of data and the processing power needed to use them were not available. Machine Learning therefore experienced its revival around 2012. The same applies to Genetic Algorithms, which I believe will be used by malware writers in the future.
How does machine learning work in theory?
Put quite simply, you feed the machine a lot of data. The algorithm takes the files apart and analyzes their characteristics. This can be, for example, the file size, but also more complex features such as whole parts of the code. So after this process, you don’t just have a hash value as in signature-based recognition, but a lot of clues. A small adjustment of the code is therefore no longer sufficient for malware to disguise itself as completely new, since the other features remain the same.
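To make the difference between a single hash and “a lot of clues” more tangible, here is a minimal Python sketch. It is not how Sophos extracts features; the function names, the three toy features and the stand-in “sample” are all assumptions made for this illustration. It changes one byte of a file and compares what happens to the hash and to the features.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Signature-style fingerprint: a single hash over the whole file."""
    return hashlib.sha256(data).hexdigest()


def extract_features(data: bytes) -> dict:
    """Toy feature set (an assumption for this sketch, not a real product's features)."""
    return {
        "size": len(data),
        "distinct_bytes": len(set(data)),
        # Every 4-byte chunk that occurs in the file, as a crude stand-in for "parts of the code".
        "chunks": {data[i:i + 4] for i in range(len(data) - 3)},
    }


def chunk_similarity(a: dict, b: dict) -> float:
    """Share of 4-byte chunks that two files have in common (Jaccard similarity)."""
    union = a["chunks"] | b["chunks"]
    return len(a["chunks"] & b["chunks"]) / len(union) if union else 1.0


# Hypothetical stand-in for a malware sample, and a variant with one flipped byte.
original = bytes(range(256)) * 8
patched = bytearray(original)
patched[100] ^= 0xFF
variant = bytes(patched)

hash_match = sha256_hex(original) == sha256_hex(variant)
feature_similarity = chunk_similarity(extract_features(original), extract_features(variant))

print("hashes identical:", hash_match)
print("chunk similarity:", round(feature_similarity, 3))
```

The hash no longer matches after the one-byte change, while the size stays the same and almost all chunks are still shared. A real feature extractor looks at far more than this, but the principle is the same.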
Once you have the characteristics, you start to work out so-called “models”. This requires a lot of data. So it is almost convenient that more than 390,000 new malware programs appear every day, i.e. more than 16,000 per hour. Sophos Sandstorm or Intercept X, which transfer data to Sophos Labs, can also help to collect data and train the model. Malicious URLs or spam provide learning material as well. Not only malware but also good files are needed, in order to prevent false positives.
You test several different models at the same time and take the one that delivers the best results. The model and the properties create a pattern of what malware looks like and how it differs from a good file. These patterns then allow you to evaluate files and calculate the probability that they are malware. This all happens within milliseconds and requires much less power (CPU and RAM) than other analytical methods. An update only improves the pattern recognition; there is no need to load new signatures every few seconds, as with signature-based recognition.
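The idea of testing several models, keeping the best one and then calculating a probability can be sketched as follows. This is a deliberately tiny stand-in and not Sophos’s deep learning model: it uses a simple nearest-centroid classifier, and all feature names, numbers and the `scale` parameter are invented for this illustration. Each candidate “model” looks at a different subset of features, is scored on files it has not seen during training, and the winner is used to rate new files.

```python
import math

# Hypothetical toy samples: (normalized size, byte entropy, share of suspicious API calls).
# Label 1 = malware, 0 = good file. All values are invented for illustration.
TRAIN = [
    ((0.20, 0.30, 0.00), 0), ((0.80, 0.40, 0.10), 0),
    ((0.50, 0.20, 0.00), 0), ((0.90, 0.35, 0.20), 0),
    ((0.30, 0.90, 0.80), 1), ((0.70, 0.80, 0.90), 1),
    ((0.40, 0.85, 0.70), 1), ((0.85, 0.95, 0.60), 1),
]
VALID = [
    ((0.60, 0.25, 0.10), 0), ((0.10, 0.40, 0.00), 0),
    ((0.90, 0.90, 0.90), 1), ((0.20, 0.80, 0.70), 1),
]


def pick(features, idx):
    """Keep only the features a given model is allowed to look at."""
    return [features[i] for i in idx]


def centroid(rows):
    """Average feature vector of a group of files."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def train(samples, idx):
    """'Training' here just means remembering the average good and bad file."""
    good = [pick(f, idx) for f, label in samples if label == 0]
    bad = [pick(f, idx) for f, label in samples if label == 1]
    return {"idx": idx, "good": centroid(good), "bad": centroid(bad)}


def malware_probability(model, features, scale=5.0):
    """Closer to the malware centroid than to the good one means a higher probability."""
    x = pick(features, model["idx"])
    score = math.dist(x, model["good"]) - math.dist(x, model["bad"])
    return 1.0 / (1.0 + math.exp(-scale * score))


def accuracy(model, samples):
    """Share of files classified correctly at a 50 % threshold."""
    hits = sum((malware_probability(model, f) >= 0.5) == (label == 1)
               for f, label in samples)
    return hits / len(samples)


# Several candidate models, each using a different subset of the features.
CANDIDATES = {"size only": (0,), "entropy + API calls": (1, 2), "all features": (0, 1, 2)}
models = {name: train(TRAIN, idx) for name, idx in CANDIDATES.items()}
scores = {name: accuracy(model, VALID) for name, model in models.items()}

# Keep the model with the best result on the held-back files (first one wins a tie).
best_name = max(scores, key=scores.get)
best_model = models[best_name]

p_suspicious = malware_probability(best_model, (0.50, 0.90, 0.85))
p_harmless = malware_probability(best_model, (0.50, 0.30, 0.05))

print("validation accuracy:", scores)
print("selected model:", best_name)
print("probabilities:", round(p_suspicious, 3), round(p_harmless, 3))
```

In this toy data the file size alone says almost nothing about a file, so the size-only model performs no better than guessing, while the models that include the more meaningful features classify all held-back files correctly. The result for a new file is not a yes/no match as with a signature, but a probability.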
If you’d like to delve a little deeper into the subject, you can read Sophos’s article on it: “Sophos Machine Learning: How to Build a Better Threat Detection Model”
The PDF is in English, but can be translated into many other languages using DeepL, which is itself based on machine learning: https://www.deepl.com/translator Of course, the well-known Google Translate also uses machine learning, but DeepL was fed with better data and is noticeably better trained.
Machine Learning alone is not enough
Machine Learning can already achieve remarkably good recognition rates, and the advantages over signature-based recognition are obvious. However, Sophos does not rely on these new patterns alone; it uses machine learning as one more technology to achieve the most complete malware detection possible.
Thanks to Machine Learning, Intercept X 2.0 will be able to detect ransomware and exploits even more effectively, complementing other technologies such as Exploit Prevention, Malicious Traffic Detection, CryptoGuard and Synchronized Security Heartbeat. It is exactly these additional technologies that separate the wheat from the chaff or, to put it another way, a professional solution from a standard antivirus.
Is Intercept X sufficient as the only protection?
One might wonder whether conventional anti-virus has had its day once you have installed Intercept X with these great technologies, soon including machine learning. If you are using the Sophos Endpoint Client, it is essential that it continues to run in parallel with Intercept X. The reason is that the Sophos Endpoint Client is much more than an ordinary anti-virus solution that detects malware using signatures. It also provides web security, web control / category-based URL filtering, device control and application control, to name only a few. A complete overview of the differences between Sophos Endpoint Protection and Intercept X can be found in this data sheet.
For all other “classic” anti-virus programs, I don’t see any use in the future. At the moment, however, there is no reason not to run the antivirus in parallel with Intercept X.
More about Machine Learning
- Defining the truth: how Sophos overcomes uncertain labels in machine learning
- Man vs machine: comparing artificial and biological neural networks
- 5 questions to ask about machine learning
- Demystifying deep learning: how Sophos builds machine learning models
On the Sophos Labs site, there are some great real-time statistics on daily spam and malware activity generated from a lot of data.
For all those who like such real-time data, we have collected a few links. We ourselves also find it impressive to see how many attacks really happen out there. It’s crazy what goes on behind the scenes:
Norse Attack Map
Here, cyber attacks are displayed in real time. Behind the map are 8 million sensors and more than 6000 applications on servers in 40 countries, so-called honeypots, which are virtual traps. This all adds up to over 7 petabytes of attack data.
Norse maintains the world’s largest dedicated threat intelligence network. With over eight million sensors that emulate over six thousand applications – from Apple laptops, to ATM machines, to critical infrastructure systems, to closed-circuit TV cameras - the Norse Intelligence Network gathers data on who the attackers are and what they’re after. Norse delivers that data through the Norse Appliance, which pre-emptively blocks attacks and improves your overall security ROI, and the Norse Intelligence Service, which provides professional continuous threat monitoring for large networks.
FireEye - Cyber Threat Map
The FireEye Cyber Threat Map shows a daily summary of all global DDoS attacks.
- Top 5 of the reported industries
- Top attackers by country
- FireEye Cyber Threat Map
Kaspersky - Cyber Map
The Cyberthreat real-time map from Kaspersky shows real-time attacks detected by its various source systems.
- On-access scanner
- On-demand scanner
- Web Antivirus
- Mail Antivirus
- Intrusion detection system
- Vulnerability scan
- Kaspersky Anti-Spam
- Botnet activity detection
- Kaspersky Cyberthreat real-time map
Akamai - Real-Time Web Monitor
Akamai monitors global Internet conditions around the clock. Using this real-time data, they identify the global regions where the largest web attack traffic occurs, cities with the slowest web connections (latency) and geographical areas with the highest web traffic (traffic density).
Check Point - Live Cyber Attack Threat Map
Check Point’s ThreatCloud also shows attack data. There is also a ranking of the top target countries.
Trend Micro - Global Botnet Threat Activity Map
This map also shows malicious network activity around the world.
Deutsche Telekom - Security Tachometer (Sicherheitstacho)
The security tachometer shows the worldwide cyber attacks on the honeypot infrastructure of DTAG and its partners.
Digital Attack Map
Visualized live data of global DDoS attacks. This was developed in collaboration between Google Ideas and Arbor Networks. The tool provides anonymous attack data that allows users to explore historical trends and retrieve reports of outages on a particular day.