As agent-based social simulation gains ground as a candidate of choice for building decision-support tools for the management of complex socio-environmental systems, the resulting models are being pushed to produce more realistic outcomes. Integrating large data corpora (demographic, environmental, geographical…) as input to these models is therefore becoming more critical.
In such models, the evolution of simulations is partly driven by social agents (which may represent individuals, households or institutions), whose behaviors are strongly determined by their attributes, their connections with other agents, and their location in the artificial worlds they populate. As more data becomes available (partly thanks to Big Data and Open Data initiatives), generating synthetic populations of agents that conform to the data available on real populations is becoming a necessity and a concern for most social modelers. Although several approaches have been explored in recent work, it remains a significant scientific and methodological challenge, which we plan to address in this proposal.
The first challenge concerns the conceptual and operational handling of scales: both the scale at which populations need to be generated and the scales at which data is available (or not). The ratio between these two scales will determine which robust and innovative up-scaling and down-scaling methods are used. Moreover, generating millions of agents or just a few hundred is likely to involve completely different operational methods.
The second challenge follows from the first and requires building an adaptive tool. Coupling several conceptual and operational methods has to rely on a complete understanding, description and documentation of their aims and means, so that users can choose the method best suited to their needs. Users should also remain free to decide what is most appropriate for a given generation scenario (objectives, scales, available data, available computational power, time required, etc.).
The Gen* project aims at combining applied mathematics and computer science approaches in order to incorporate arbitrary data and to generate statistically valid populations of artificial agents. Generic methods applicable to different use cases (e.g. urban/rural populations, downscaling needs…) will be provided and implemented in R and Java so that they can be integrated as open-source libraries in existing agent-based simulation platforms. This offering will be complemented by a methodological guide and several tutorials so that end-users can master the methods and their combinations.
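To make the idea of a population that conforms to aggregate data more concrete, here is a minimal sketch of one well-known technique, iterative proportional fitting (IPF), followed by a weighted draw of agents. It is not taken from the Gen* libraries: the function names `ipf` and `draw_agents`, the attribute names and the marginal totals are all hypothetical, and Python is used only for brevity, whereas the project itself targets R and Java.

```python
import random


def ipf(seed, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Rescale a 2D seed table until its row and column sums match the targets.

    Illustrative only: `seed` is a list of lists of floats, and the two
    target lists are assumed to have the same grand total.
    """
    table = [row[:] for row in seed]  # work on a copy of the seed
    for _ in range(max_iter):
        # Step 1: scale each row to its target marginal.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Step 2: scale each column to its target marginal.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns now match exactly; stop once the rows match as well.
        row_error = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if row_error < tol:
            break
    return table


def draw_agents(table, row_labels, col_labels, n, rng):
    """Draw n agents whose attribute pairs follow the fitted table."""
    cells = [(r, c) for r in row_labels for c in col_labels]
    weights = [table[i][j]
               for i in range(len(row_labels))
               for j in range(len(col_labels))]
    picks = rng.choices(cells, weights=weights, k=n)
    # Hypothetical attribute names, chosen only for this illustration.
    return [{"age_group": a, "household": h} for a, h in picks]


if __name__ == "__main__":
    # Made-up marginals: 60 young / 40 old, 30 single / 70 family.
    fitted = ipf([[1.0, 1.0], [1.0, 1.0]], [60.0, 40.0], [30.0, 70.0])
    agents = draw_agents(fitted, ["young", "old"], ["single", "family"],
                         n=10, rng=random.Random(42))
    for agent in agents:
        print(agent)
```

With a uniform seed table, IPF simply reproduces the independence assumption between the two attributes; a seed taken from a survey sample would instead preserve the correlations observed in that sample while still matching the aggregate totals.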
Finally, a standalone, industrial-grade application will also be developed independently of existing platforms. It will provide a dedicated graphical user interface that enables modelers to design, save and reuse workflows linking the different data sources, methods from the library and data converters (akin to what the Kepler platform offers) so as to generate artificial populations.
All these developments will be undertaken by the project consortium, which already has solid experience in managing large-scale open-source methodological projects dedicated to agent-based simulation. They will be primarily validated on several case studies provided by the partners, sufficiently diverse to enable the design of generic methods.
In addition, our goal is to build a strong community of users and developers during the 42 months of the project. Outcomes (libraries, tools, documentation) will be delivered at regular intervals under an open-source license; feedback from users (and validation on their case studies) will be sought during dedicated workshops; and we will use all means of dissemination to encourage the international agent-based social simulation community to adopt, develop and improve the Gen* libraries.
Project coordinator: Mr. Alexis DROGOUL (Unité de Modélisation Mathématique et Informatique des Systèmes Complexes)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
IDEES Identités et Différentiations de l'Environnement des Espaces et des Sociétés
IRD Unité de Modélisation Mathématique et Informatique des Systèmes Complexes
IRIT Institut de Recherche en Informatique de Toulouse
ANR funding: 561,393 euros
Beginning and duration of the scientific project: August 2013 - 42 Months