DS0707 - Human-machine interaction, connected objects, digital content, massive data and knowledge

Fields: Finding Interesting Events in Large Dynamic Social Networks – FIELDS

Submission summary

A large amount of data is posted and shared incessantly by users of social media such as Twitter, Facebook and Google+. This presents unprecedented opportunities, as it allows researchers to study and understand complex systems, as well as to be informed in a timely manner about recent events such as earthquakes, virus outbreaks or a new jazz concert in town. However, it poses non-trivial challenges due to the sheer size of such data, its rapid evolution over time and the fact that relevant information is often intertwined with noisy or uninteresting data. When events such as the recent shooting at the offices of the Charlie Hebdo satirical magazine unfold, social media users engage in intense social activity, discussing and sharing relevant content in the form of text, pictures, etc. On Twitter, tweets containing terms such as "Charlie Hebdo", "shooting" and "Paris" become relatively frequent; moreover, interesting events diffuse widely across the network and spread much more quickly than less interesting facts. On Flickr (a photo-sharing website), users post pictures tagged with terms such as "Place de la Republique", "Je suis Charlie" and "marche pour Charlie". On Tumblr (a microblogging platform), bloggers add links to other blogs related to the event.
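The idea that event-related terms suddenly become "relatively frequent" can be illustrated with a toy burst-detection heuristic. This is not the project's actual method, only a minimal sketch assuming tweets arrive as plain strings: it compares each term's frequency in a recent window against a baseline window and flags terms whose rate has jumped. The function name, thresholds and add-one smoothing are all illustrative choices.

```python
from collections import Counter

def bursty_terms(baseline_docs, current_docs, min_count=3, ratio=5.0):
    """Flag terms whose frequency in the current window greatly
    exceeds their baseline frequency (a simple burst heuristic)."""
    base = Counter(t for doc in baseline_docs for t in doc.lower().split())
    cur = Counter(t for doc in current_docs for t in doc.lower().split())
    base_total = max(sum(base.values()), 1)
    cur_total = max(sum(cur.values()), 1)
    bursts = {}
    for term, count in cur.items():
        if count < min_count:
            continue  # ignore rare terms: too noisy to call a burst
        base_rate = (base.get(term, 0) + 1) / base_total  # add-one smoothing
        cur_rate = count / cur_total
        if cur_rate / base_rate >= ratio:
            bursts[term] = cur_rate / base_rate
    return bursts
```

On a baseline of everyday chatter, a window dominated by tweets like "charlie hebdo shooting paris" would flag "charlie", "hebdo", "shooting" and "paris" as bursty, while common words stay below the threshold.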

This project proposal aims at finding interesting events automatically by analyzing the huge amount of data produced in social networks. Twitter, Flickr and Tumblr are inherently dynamic, with new tweets, pictures and links frequently being posted or removed. Therefore, besides the challenge of devising algorithms that find relevant events while filtering out uninteresting information, we face non-trivial computational challenges. Traditional data mining and machine learning algorithms, which have been successfully employed to analyze static medium-size datasets, need to be reinvented so as to process large amounts of information evolving over time. We therefore need to carefully design dynamic algorithms that work incrementally or decrementally, that is, algorithms that do not recompute a solution from scratch every time the input changes, and that can be efficiently implemented in systems such as Spark, Storm, MapReduce or Pregel.
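The incremental/decremental idea above can be sketched with a toy example: a sliding-window term counter that updates its state as items arrive and expire, rather than recounting the whole window on every change. This is only an illustrative data structure, not any algorithm from the project; the class name and window semantics (a fixed number of most recent posts) are assumptions.

```python
from collections import Counter, deque

class SlidingWindowCounter:
    """Maintain term counts over the last `window` posts incrementally:
    each arrival (and each expiry) touches only the terms of one post,
    instead of recomputing counts over the entire window."""

    def __init__(self, window):
        self.window = window
        self.items = deque()      # posts currently inside the window
        self.counts = Counter()   # aggregate term counts for the window

    def add(self, doc):
        terms = doc.lower().split()
        self.items.append(terms)
        self.counts.update(terms)           # incremental step
        if len(self.items) > self.window:
            expired = self.items.popleft()  # decremental step: oldest post
            self.counts.subtract(expired)
            self.counts += Counter()        # drop zero-count entries

    def top(self, k=5):
        return self.counts.most_common(k)
```

A streaming system would call `add` once per incoming post; `top` then reflects the current window at any moment without a from-scratch pass over the data.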

Project coordination

Mauro Sozio (Institut Mines Telecom)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partner

LTCI Telecom ParisTech Institut Mines Telecom

ANR grant: 207,569 euros
Start and duration of the scientific project: September 2015, 36 months

Useful links

Explore our database of funded projects

ANR makes its datasets on funded projects available on its website.
