Social networks have become a very popular way for internet users to communicate and interact online. Users spend plenty of time on well-known social networks (e.g., Facebook, Twitter, Sina Weibo), reading news, discussing events and posting messages. Unfortunately, this popularity also attracts a significant number of spammers who continuously exhibit malicious behavior (e.g., posting messages containing commercial URLs, following a large number of users), causing great misunderstanding and inconvenience in users' social activities. In this paper, a supervised machine learning based solution is proposed for effective spammer detection. The work proceeds as follows: first, a dataset is collected from Sina Weibo, comprising 30,116 users and more than 16 million messages. Next, a labeled dataset is constructed by manually classifying users as spammers or non-spammers. Then, a set of features is extracted from message content and users' social behavior and fed into an SVM (Support Vector Machine) based spammer detection algorithm. The experiments show that the proposed solution achieves excellent performance, with the true positive rates for spammers and non-spammers reaching 99.1% and 99.9%, respectively.
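The feature extraction and evaluation steps described above can be illustrated with a minimal sketch. The abstract does not list the actual feature set, so the two features here (share of messages containing a URL, and following-to-follower ratio) are assumptions loosely based on the spammer behaviors it mentions, and the function names `extract_features` and `per_class_tpr` are hypothetical. The SVM training step itself is omitted; the resulting feature vectors could be passed to any SVM implementation.

```python
import re

# Assumption: a simple pattern for spotting URLs in message text.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_features(messages, following, followers):
    """Return [url_ratio, follow_ratio] for one user (hypothetical features).

    url_ratio    : fraction of the user's messages that contain a URL
    follow_ratio : accounts followed divided by followers (at least 1)
    """
    with_url = sum(1 for m in messages if URL_PATTERN.search(m))
    url_ratio = with_url / len(messages) if messages else 0.0
    follow_ratio = following / max(followers, 1)
    return [url_ratio, follow_ratio]


def per_class_tpr(y_true, y_pred):
    """True positive rate for each class: correctly predicted / actual members."""
    rates = {}
    for label in set(y_true):
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        rates[label] = hits / total
    return rates


# Toy usage with made-up data, not figures from the paper.
features = extract_features(["buy now http://shop.example/a", "hello"], 100, 50)
rates = per_class_tpr(
    ["spammer", "spammer", "non-spammer", "non-spammer"],
    ["spammer", "non-spammer", "non-spammer", "non-spammer"],
)
```

Reporting the true positive rate separately for spammers and non-spammers, as the abstract does, shows whether a classifier performs well on both classes rather than only on the majority class.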
Title: Hunting for Spammers: Detecting Evolved Spammers on Twitter
Authors: Nour El-Mawass, Saad Alaboodi
(Submitted on 8 Dec 2015 (v1), last revised 15 Dec 2015 (this version, v2))
Abstract: Once an email problem, spam has nowadays branched into new territories with disruptive effects. In particular, spam has established itself over recent years as a ubiquitous, annoying, and sometimes threatening aspect of online social networks. Due to its prevalence, many works have tackled spam on Twitter from different angles. Spam is, however, a moving target. The new generation of spammers on Twitter has evolved into online creatures that are not easily recognizable by old detection systems. With the strongly tangled spamming community, automatic tweeting scripts, and the ability to create Twitter accounts en masse at negligible cost, spam on Twitter is becoming smarter, fuzzier and harder to detect. Our own analysis of Arabic trending hashtags in Saudi Arabia estimates that spam accounts for about three quarters of the total generated content. This alarming rate makes the development of adaptive spam detection techniques a very real and pressing need. In this paper, we analyze the spam content of trending hashtags on Saudi Twitter, and assess the performance of previous spam detection systems on our recently gathered dataset. Due to the escalating manipulation that characterizes newer spamming accounts, simple manual labeling currently leads to inaccurate results. In order to get reliable ground-truth data, we propose an updated manual classification algorithm that avoids the deficiencies of older manual approaches. We also adapt the previously proposed features to respond to spammers' evasion techniques, and use these features to build a new data-driven detection system.
Submission history
From: Nour El-Mawass
[v1] Tue, 8 Dec 2015 18:21:31 GMT (987kb,D)
[v2] Tue, 15 Dec 2015 21:53:18 GMT (987kb,D)