Automating Domain Squatting Detection Using Representation Learning

Pablo Loyola; Kugamoorthy Gajananan; Hirokuni Kitahara; Yuji Watanabe; Fumiko Satoh

doi:10.1109/BigData50022.2020.9377875

Big Data 2020

Conference paper

10 Dec 2020

Abstract

Registering altered domain names to confuse users and conduct malicious activities is one of the most widespread types of attack on the Web, and forms a family of techniques known as domain squatting. Detecting these domains is a difficult task, given the large number of possible combinations and the massive, heterogeneous nature of the Web. In this work, we propose a set of models that first learn the distributional regularities of detected squatted domains and then use them to automatically generate realistic modified domains. Our goal is to proactively guide the generation of squatted domains towards malicious domains that exist but have not yet been detected. We conducted an empirical study of both typo-squatting and combo-squatting generation approaches against strong baselines on real-world data, showing their feasibility and providing insights to support proactive defense in the context of cloud security.
