Baredroid: Large-Scale analysis of android apps on real devices
Abstract
To protect Android users, researchers have been analyzing unknown, potentially-malicious applications by using systems based on emulators, such as the Google's Bouncer and Andrubis. Emulators are the go-to choice because of their convenience: They can scale horizontally over multiple hosts, and can be reverted to a known, clean state in a matter of seconds. Emulators, however, are fundamentally different from real devices, and previous research has shown how it is possible to automatically develop heuristics to identify an emulated environment, ranging from simple flag checks and unrealistic sensor input, to fingerprinting the hypervisor's handling of basic blocks of instructions. Aware of this aspect, malware authors are starting to exploit this fundamental weakness to evade current detection systems. Unfortunately, analyzing apps directly on bare metal at scale has been so far unfeasible, because the time to restore a device to a clean snapshot is prohibitive: with the same budget, one can analyze an order of magnitude less apps on a physical device than on an emulator. In this paper, we propose BareDroid, a system that makes bare-metal analysis of Android apps feasible by quickly restoring real devices to a clean snapshot. We show how BareDroid is not detected as an emulated analysis environment by emulator-aware malware or by heuristics from prior research, allowing BareDroid to observe more potentially malicious activity generated by apps. Moreover, we provide a cost analysis, which shows that replacing emulators with BareDroid requires a financial investment of less than twice the cost of the servers that would be running the emulators. Finally, we release BareDroid as an open source project, in the hope it can be useful to other researchers to strengthen their analysis systems.