Web-based malware is a growing threat to today’s Internet security. Such attacks are prevalent and lead to serious security consequences. Millions of malicious URLs are used as distribution channels to propagate malware all over the Web. Once infected, victim systems fall under the control of attackers, who can use them for various cyber crimes such as stealing credentials, spamming, and launching distributed denial-of-service attacks. Moreover, it has been observed that traditional security technologies such as firewalls and intrusion detection systems have only limited capabilities to mitigate this new problem.
In this paper, we survey the state-of-the-art research on the analysis of, and defense against, web-based malware attacks. First, we study the attack model, the root cause, and the vulnerabilities that enable these attacks. Second, we analyze the status quo of the web-based malware problem. Third, we discuss three categories of defense mechanisms in detail: (1) building honeypots with virtual machines or signature-based detection systems to discover existing threats; (2) using code analysis and testing techniques to identify the vulnerabilities of web applications; and (3) constructing reputation-based blacklists or smart sandbox systems to protect end users from attacks. We show that these three categories of approaches form an extensive solution space to the web-based malware problem. Finally, we compare the surveyed approaches and discuss possible future research directions.