MIT CSAIL's AI detects possible IP address hijacking

Border Gateway Protocol (BGP), the routing protocol that networks use to tell one another which blocks of IP addresses they can reach, is fundamental to the internet's design. Unfortunately, it's flawed in two respects: It lacks route authentication and basic origin validation. That makes BGP prone to connectivity problems when networks are misconfigured and, more worryingly, opens the door to malicious spammers, traffic interceptors, and cryptocurrency thieves.

That's why researchers at MIT's Computer Science and Artificial Intelligence Lab recently conducted a study of BGP activity over the course of five years, with the goal of identifying the dominant characteristics of hijackers and how they differ from legitimate systems. The study yielded a set of metrics, and the team then applied an AI algorithm to test how accurately those metrics identify hijackers' patterns.

"Our preliminary results suggest that [certain] patterns can be leveraged in automated applications, potentially revealing undetected behavior or generating a novel category of reputation scores," wrote the coauthors of the paper detailing the research. "Our findings have thus relevance for the operator community, since they can ... potentially [allow] for preventive defense. Our findings are also of relevance to the broader research community, since they provide viable input for new ... hijacking detection systems, as well as for the development of ... reputation metrics and scoring systems."

During a typical BGP hijack, a malicious actor fools nearby networks into sending data bound for specific IP addresses (i.e., numbers that identify devices and allow them to exchange information with one another) through a compromised system (or interconnected systems). A recent attack rerouted nearly 1,300 addresses to a facsimile of a popular cryptocurrency website, enabling the orchestrators to steal about $150,000 in digital coins from customers. In another high-profile incident, Google lost control of several million IP addresses, making its search and other services unavailable for over an hour.
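To make the gap concrete, here is a minimal Python sketch, not taken from the paper, of the kind of origin check that BGP itself does not perform: comparing the network announcing a block of addresses against a table of expected owners. The table, the function name, and the prefixes and AS numbers (drawn from ranges reserved for documentation) are all hypothetical.

```python
import ipaddress

# Hypothetical table of which autonomous system (AS) is expected to
# originate each address block. Prefixes and AS numbers come from
# ranges reserved for documentation, not from real networks.
EXPECTED_ORIGINS = {
    ipaddress.ip_network("192.0.2.0/24"): 64500,
    ipaddress.ip_network("198.51.100.0/24"): 64501,
}

def check_announcement(prefix: str, origin_as: int) -> str:
    """Classify an announcement as 'valid', 'suspicious', or 'unknown'."""
    announced = ipaddress.ip_network(prefix)
    for known, expected_as in EXPECTED_ORIGINS.items():
        # Compare only within the same IP version; subnet_of also matches
        # more-specific slices of a known block.
        if announced.version == known.version and announced.subnet_of(known):
            return "valid" if origin_as == expected_as else "suspicious"
    # No expectation recorded for this block, so nothing can be concluded.
    return "unknown"
```

An announcement for a slice of a known block from an unexpected AS comes back as "suspicious," while a block missing from the table is only "unknown," which is one reason a check like this cannot settle the question on its own.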

Few automatic systems designed to detect illegitimate BGP route announcements exist, as the researchers explain in the paper. Most network operators rely on mailing lists or event-based schemes that track only single ongoing hijackings. By contrast, the team's approach leverages the fact that a network's BGP behavior tends to be consistent over time to suss out potential hijackers. (Indeed, some malicious autonomous systems show malicious activity for multiple years.)

The researchers extracted data from network operator mailing lists going back years, as well as historical BGP data recorded every five minutes from the global routing table. From this, they trained a machine learning model to identify key characteristics like volatile changes in activity and the advertising of multiple address blocks. Hijackers' blocks usually disappear faster than those of legitimate networks, the model found, and malicious networks tend to advertise many more blocks of IP addresses (or network prefixes). Additionally, the model determined that the blocks those networks advertised were much more likely to be registered in different countries and continents.
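The characteristics described above can be sketched as simple per-network features. The Python below is an illustration under stated assumptions, not the researchers' code: the `Observation` record, its `country` field, and the `features` function are hypothetical, and a prefix's lifetime is approximated as the number of five-minute snapshots in which it appears.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median

SNAPSHOT_MINUTES = 5  # routing data was recorded every five minutes

@dataclass(frozen=True)
class Observation:
    snapshot: int   # index of the five-minute snapshot
    origin_as: int  # AS announcing the prefix
    prefix: str     # announced block of IP addresses
    country: str    # registration country of the block (hypothetical field)

def features(observations):
    """Summarize each AS by prefix count, prefix lifetime, and country spread."""
    seen = defaultdict(lambda: defaultdict(set))  # AS -> prefix -> snapshots
    countries = defaultdict(set)                  # AS -> registration countries
    for o in observations:
        seen[o.origin_as][o.prefix].add(o.snapshot)
        countries[o.origin_as].add(o.country)

    result = {}
    for asn, prefixes in seen.items():
        # Total visible time per prefix, in minutes.
        lifetimes = [len(s) * SNAPSHOT_MINUTES for s in prefixes.values()]
        result[asn] = {
            "prefix_count": len(prefixes),
            "median_lifetime_minutes": median(lifetimes),
            "country_count": len(countries[asn]),
        }
    return result
```

Under these assumptions, a network matching the profile in the article would show a high prefix count, a short median lifetime, and a high country count. A classifier would then be trained on features like these; that step is left out here.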

Compiling the model's training corpus wasn't exactly a walk in the park; identifying and discarding false positives posed a particular challenge. For example, network operators use BGP to defend against distributed denial-of-service attacks by modifying the route, which looks virtually identical to an actual hijack.

The team manually removed these and other false positives, which accounted for roughly 20% of cases spotted by the model. They say the model managed to identify about 800 suspicious networks in all, including some that had been hijacking IP addresses for years. One actor, which the researchers refer to as "AS134190," began originating different prefixes for very short time periods (about a day) starting in early 2017, and it carried out as many as 11 attacks (one of which was identified by the widely known BGPmon hijack detection system) between November 2018 and early 2019.

"Network operators normally have to handle such incidents reactively and on a case-by-case basis, making it easy for cybercriminals to continue to thrive ... It's like a game of Telephone, where you know who your nearest neighbor is, but you don't know the neighbors 5 or 10 nodes away," said MIT graduate student and lead author Cecilia Testart, who leaves to future work models that require less human supervision and that could be deployed in production environments. "This is a key first step in being able to shed light on serial hijackers' behavior and proactively defend against their attacks."

Testart and colleagues will present the paper at the ACM Internet Measurement Conference in Amsterdam later this month. A dataset containing the list of suspicious networks flagged by the algorithm is publicly available on GitHub.
