CASE STUDY
Needles in the Haystack
A Cyber-Intelligence capability for a global bank.
The Client
The client was a retail and investment bank with operations spanning the globe. As of 2022, it was one of the 20 largest banks in the world by total assets. Unai was commissioned by the Global Information Security Directorate, headquartered in London, UK.
In the wake of a string of high-profile breaches among large global businesses - many of which had not been detected until well after the event - the bank had launched a major programme of work to develop a next-generation cyber-intelligence capability.
The idea was to direct all relevant data sources to a single data lake and use machine learning to identify suspicious events which could be forwarded to the bank's Cyber-Forensics teams for detailed investigation.
The Challenge
The client had deployed a large Hadoop cluster and developed an initial data ingestion capability. Events from email gateways, antivirus systems, ID card "swipe-stations", firewalls and web gateways were all now landing in the cluster.
The programme was now scaling, with multiple client and supplier teams collaborating to develop the platform.
Unai was asked to help design and develop the data pipelines and machine learning that would identify "signals" (suspicious events) amidst the "noise" (data from normal operations).
Three challenges faced the Unai team:
With very few examples of known malicious activity available in the data, supervised learning approaches were not feasible.
With limited human cyber-forensics capacity, how could algorithms be developed that would not overwhelm the analysts with "false positives"?
With 500 million events received daily, how could algorithms be designed to run efficiently?
What Unai Delivered
Data Transformation Pipelines. Unai used Apache Spark to develop transformations that could lift raw data from "bronze" into "silver" tables within the Hadoop data lake. The data in the "silver" tables had been modelled and prepared for specialist machine learning tasks.
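To illustrate what lifting a record from "bronze" to "silver" can involve, here is a minimal sketch in plain Python rather than Spark. The field names (source, epoch, user, host), the SilverEvent schema and the cleaning rules are all hypothetical, chosen for illustration; the actual transformations and table layouts are not described in this case study.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SilverEvent:
    """A cleaned, typed event ready for modelling (hypothetical schema)."""
    source: str
    timestamp: datetime
    user: str
    host: str


def to_silver(raw: dict) -> Optional[SilverEvent]:
    """Normalise one raw "bronze" record; return None if it cannot be parsed."""
    try:
        # Raw feeds often carry timestamps as strings; convert to UTC datetimes.
        ts = datetime.fromtimestamp(int(raw["epoch"]), tz=timezone.utc)
        return SilverEvent(
            source=raw["source"].strip().lower(),
            timestamp=ts,
            user=raw["user"].strip().lower(),
            host=raw["host"].strip().lower(),
        )
    except (KeyError, ValueError, TypeError, AttributeError):
        # Malformed records are dropped here; a real pipeline might quarantine them.
        return None


# Two made-up bronze records: one well-formed, one with a corrupt timestamp.
bronze = [
    {"source": "WebGateway", "epoch": "1514764800", "user": " Alice ", "host": "Example.COM"},
    {"source": "firewall", "epoch": "not-a-number", "user": "bob", "host": "10.0.0.5"},
]
silver = [event for event in map(to_silver, bronze) if event is not None]
```

In a Spark job the same per-record logic would typically be expressed as column operations over a DataFrame, so that it can be distributed across the cluster.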
Machine Learning Algorithms. Unai developed two machine learning models: one detected "phishing" attempts via email, the other detected malware operating in "command & control" mode. Both incorporated heuristics and prior knowledge which enabled them to be trained in an unsupervised manner. Both could be run in distributed mode via Apache Spark, enabling execution over "big data".
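One way a heuristic can stand in for labelled training data is sketched below. It is an assumption for illustration, not a description of Unai's models: malware in "command & control" mode often contacts its server at near-regular intervals, so the sketch scores each host and destination pair by the coefficient of variation of the gaps between connections. The function name beacon_scores, the min_events parameter and the 0.1 threshold are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def beacon_scores(events, min_events=5):
    """Score each (host, destination) pair by how regular its connections are.

    events: iterable of (host, destination, unix_timestamp) tuples.
    Returns {(host, destination): coefficient of variation of the gaps}.
    Lower scores mean more clock-like traffic.
    """
    times = defaultdict(list)
    for host, dest, ts in events:
        times[(host, dest)].append(ts)

    scores = {}
    for pair, stamps in times.items():
        if len(stamps) < min_events:
            continue  # too few observations to judge regularity
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        avg = mean(gaps)
        if avg > 0:
            scores[pair] = pstdev(gaps) / avg
    return scores


# Made-up traffic: pc-1 connects every 60 seconds, pc-2 connects irregularly.
events = [("pc-1", "203.0.113.7", 60 * i) for i in range(10)]
events += [("pc-2", "198.51.100.9", t) for t in (0, 5, 7, 300, 310, 2000, 2600)]

scores = beacon_scores(events)
# Hypothetical threshold: flag only very regular pairs to limit false positives.
suspicious = [pair for pair, cv in scores.items() if cv < 0.1]
```

Because the score is computed independently per pair, the grouping step maps naturally onto a distributed group-by, which is one reason heuristics of this kind suit large event volumes.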
Graph Database. Finally, the Unai team worked with a client team on a graph database, implemented in HBase. "Signals" from the models were linked to affected entities in a knowledge graph. An API for cyber-forensics experts connected this system to their analysis tools. In this way clusters of suspicious alerts could be easily explored without overwhelming the analysts.
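The idea of exploring clusters of alerts rather than individual ones can be sketched as follows. This is an in-memory illustration only, not the HBase implementation: signals that share an entity (a user, host or domain) are grouped into connected components, so an analyst sees one cluster instead of several separate alerts. The signal and entity identifiers and the function cluster_signals are hypothetical.

```python
from collections import defaultdict


def cluster_signals(signals):
    """Group signals that share at least one entity (connected components).

    signals: {signal_id: set of entity ids such as users, hosts, domains}.
    Returns a list of sets of signal ids, one set per cluster.
    """
    # Index: entity -> all signals that touch it.
    by_entity = defaultdict(set)
    for sid, entities in signals.items():
        for entity in entities:
            by_entity[entity].add(sid)

    seen, clusters = set(), []
    for start in signals:
        if start in seen:
            continue
        # Depth-first walk from this signal through shared entities.
        cluster, stack = set(), [start]
        while stack:
            sid = stack.pop()
            if sid in seen:
                continue
            seen.add(sid)
            cluster.add(sid)
            for entity in signals[sid]:
                stack.extend(by_entity[entity] - seen)
        clusters.append(cluster)
    return clusters


# Made-up signals: the phishing alert and one C2 alert both involve user alice.
signals = {
    "phish-001": {"user:alice", "domain:bad.example"},
    "c2-017": {"host:pc-1", "user:alice"},
    "c2-018": {"host:pc-9"},
}
clusters = cluster_signals(signals)
```

Here the three signals collapse into two clusters, because "phish-001" and "c2-017" are linked through the same user.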
The Result
The platform was launched as the centrepiece of the bank's "Fusion Cell" capability in 2018. A leading cyber-security expert, presenting at an international conference, summed up the innovation as follows:
This capability enables a truly strategic view for the bank’s operations and has already led to new and enhanced functions, including cyberintelligence, insider threats, red teaming, threat hunting, cyberinnovation, and outreach.