Scalable distributed first story detection using storm for twitter data

Mahesh G. Huddar, Manjula M. Ramannavar, Nandini S. Sidnal

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

2 Citations (Scopus)

Abstract

Twitter is an online service that enables users to read and post tweets, thereby providing a wealth of information about breaking news stories. The problem of First Story Detection is to identify the first stories about different events in a stream of documents. Locality-sensitive hashing is the traditional approach to First Story Detection. The documents have a high degree of lexical variation, which makes First Story Detection a very difficult task. This work uses Twitter as the data source to address the problem of real-time First Story Detection. As Twitter data contains a lot of spam, we built a dictionary of words to remove spam from the tweets. Further, since the Twitter streaming data rate is high, we cannot use the traditional locality-sensitive hashing algorithm to detect first stories. We modify the locality-sensitive hashing algorithm to overcome this limitation, improving performance while maintaining reasonable accuracy. We also use the Storm distributed platform, so that the system benefits from the robustness, scalability and efficiency that this framework offers.
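As a rough illustration of the traditional approach the abstract refers to, the following minimal sketch flags a document as a first story when no sufficiently similar document shares its locality-sensitive hash bucket. It is not the modified algorithm of the paper, and it omits the spam dictionary and the Storm topology. The names `FirstStoryDetector`, `NUM_BITS` and `THRESHOLD`, the whitespace tokenizer, and the hash-derived hyperplane signs are all assumptions made for this example.

```python
import hashlib
import math
from collections import Counter, defaultdict

NUM_BITS = 8      # hypothetical signature length (number of hyperplanes)
THRESHOLD = 0.5   # hypothetical cosine-similarity cutoff for novelty


def tokenize(text):
    """Turn a document into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def plane_sign(token, bit):
    """Deterministic +1/-1 per (token, bit), standing in for one
    component of a random hyperplane."""
    digest = hashlib.md5(f"{bit}:{token}".encode("utf-8")).digest()
    return 1 if digest[0] & 1 else -1


def signature(vec):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    bits = []
    for bit in range(NUM_BITS):
        dot = sum(weight * plane_sign(tok, bit) for tok, weight in vec.items())
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b.get(tok, 0) for tok, weight in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


class FirstStoryDetector:
    """Keeps earlier documents in hash buckets and compares each new
    document only against the documents in its own bucket."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def process(self, text):
        vec = tokenize(text)
        key = signature(vec)
        # Highest similarity to any earlier document in the same bucket.
        best = max((cosine(vec, other) for other in self.buckets[key]),
                   default=0.0)
        self.buckets[key].append(vec)
        # A first story has no close neighbour among the colliding documents.
        return best < THRESHOLD
```

Calling `process` on each incoming tweet in arrival order returns `True` for tweets judged to be first stories. The buckets in this sketch grow without bound, which is one reason an unmodified scheme like this struggles at high streaming rates.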

Original language: English
Title of host publication: 2014 International Conference on Advances in Engineering and Technology Research, ICAETR 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781479963935
Publication status: Published - 2014
Externally published: Yes
Event: 2014 International Conference on Advances in Engineering and Technology Research, ICAETR 2014 - Unnao, India
Duration: 1 Aug 2014 to 2 Aug 2014

Publication series

Name: 2014 International Conference on Advances in Engineering and Technology Research, ICAETR 2014

Conference

Conference: 2014 International Conference on Advances in Engineering and Technology Research, ICAETR 2014
Country/Territory: India
City: Unnao
Period: 1/08/14 to 2/08/14

Keywords

  • Distributed platform
  • Efficiency
  • First Story Detection (FSD)
  • Lexical variation
  • Robustness
  • Scalability
  • Storm
