The objective of the DARLING project is to propose new adaptive, distributed and collaborative learning methods on high-dimensional dynamic graphs in order to extract structured information from the data streams acquired at, or transiting through, the nodes of these graphs. These methods are evaluated on two state-of-the-art observation techniques operating at extreme scales: radio astronomy with the SKA instrument and brain imaging by magnetoencephalography.
Over the last five years, there has been a major and persistent interest in big data processing, echoing a radical transformation of our information societies. Many big data applications are structured by a network and require real-time action because they are time-sensitive. The monitoring and management of transport, telecommunication, and energy production and distribution networks are typical examples. Many scientific disciplines are also concerned, from the Universe Sciences to the Neurosciences. These systems consist of a large number of agents (sensors, processors, actuators, neurons) linked by a connection topology. These agents may interact dynamically in order to accomplish the task assigned to them. The data flows are massive and their properties are likely to evolve over time. The graphs themselves are dynamic. The agility required to analyze this information calls for a solution that is distributed over the nodes of the graph, rather than centralized, and adaptive, so that it can track the temporal evolution of the monitored system, rather than static.
The DARLING project contributes to the elaboration of a theoretical framework for processing temporal signals structured by dynamic graphs, and to the development of adaptive, online learning algorithms that can be deployed at large scale because they can be distributed over the nodes of the graphs.
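As a rough illustration of what "adaptive and distributed" can mean in practice, the following is a minimal sketch of adapt-then-combine diffusion LMS, a classical strategy from the distributed estimation literature. It is not presented as the project's own algorithm. The ring topology, the uniform combination weights, the step size and the synthetic data are all assumptions chosen for the example.

```python
import numpy as np

def diffusion_lms(A, U, D, mu=0.05):
    """Adapt-then-combine diffusion LMS (illustrative sketch).

    A : (N, N) combination matrix; A[l, k] is the weight node k gives to
        node l, and each column sums to 1.
    U : (T, N, M) regressors observed at each time step and node.
    D : (T, N) scalar measurements observed at each time step and node.
    Returns the (N, M) array of final per-node estimates.
    """
    T, N, M = U.shape
    W = np.zeros((N, M))
    for t in range(T):
        # Adapt: every node takes one local LMS step on its own data.
        err = D[t] - np.einsum("nm,nm->n", U[t], W)
        Psi = W + mu * err[:, None] * U[t]
        # Combine: every node averages the intermediate estimates
        # of its neighbours (and its own).
        W = A.T @ Psi
    return W

def ring_weights(N):
    """Hypothetical topology: a ring where each node averages itself
    and its two neighbours with equal weights."""
    A = np.zeros((N, N))
    for k in range(N):
        for l in (k - 1, k, k + 1):
            A[l % N, k] = 1.0 / 3.0
    return A

# Synthetic streaming data: all nodes observe the same unknown vector.
rng = np.random.default_rng(0)
N, M, T = 10, 4, 2000
w_true = rng.standard_normal(M)
U = rng.standard_normal((T, N, M))
D = U @ w_true + 0.1 * rng.standard_normal((T, N))

W = diffusion_lms(ring_weights(N), U, D)
max_err = np.abs(W - w_true).max()
print("largest coefficient error across nodes:", max_err)
```

No node ever sees the full dataset: each one only uses its own measurements and the estimates of its immediate neighbours, and the constant step size lets the estimates keep tracking the target if it drifts over time.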
The DARLING project addresses three methodological issues.
The first one concerns data modeling. Graphs are versatile representations for capturing the geometry of a natively structured dataset. The samples, possibly multidimensional, supported by the nodes of a graph are collectively called a graph signal. Examples of graph signals abound. In functional brain imaging, they make it possible to characterize the anatomical connectivity of distinct functional regions of the cortex. In radio astronomy, they make it possible to represent interferometric data acquired by antennas spread over a continent in order to reconstruct images of the sky.
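To make the notion of a graph signal concrete, here is a minimal sketch: the graph is stored as an adjacency matrix, the signal holds one value per node, and the Laplacian quadratic form measures how smoothly the signal varies along the edges. The four-node path graph and the signal values are invented for the example.

```python
import numpy as np

# Hypothetical 4-node path graph 0-1-2-3 with unit edge weights.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Combinatorial graph Laplacian L = D - A, with D the diagonal degree matrix.
L = np.diag(A.sum(axis=1)) - A

def smoothness(x, L):
    """Laplacian quadratic form x^T L x, equal to the sum over edges (i, j)
    of w_ij * (x_i - x_j)^2. Small values mean the signal varies slowly
    along the edges of the graph."""
    return float(x @ L @ x)

# Two graph signals: one sample per node.
smooth = np.array([1.0, 1.1, 1.2, 1.3])   # neighbouring nodes carry similar values
rough = np.array([1.0, -1.0, 1.0, -1.0])  # neighbouring nodes disagree

print(smoothness(smooth, L))  # about 0.03 (three edges, each contributing 0.1^2)
print(smoothness(rough, L))   # 12.0 (three edges, each contributing 2^2)
```

A multidimensional graph signal would simply replace the vector with a matrix holding one row per node.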
The second problem concerns the size of the graphs. It can indeed be a major obstacle to centralized data processing. For example, Wikipedia is an excellent source of information because of its exceptional scale and the tens of millions of users who leave their mark on it every day, creating a vast dynamic graph of interconnected visited pages. The appearance of anomalies in Wikipedia's page editing and visiting activity reveals interesting facts about how users react in response to news. This graph of nearly 6 million nodes still eludes comprehensive monitoring and requires narrowing down to specific topics of interest. The example of the SKA radio telescope is even more emblematic since it should total 2.5 million antennas spread over an area of 5000 kilometers in diameter. Fully operational in 2025, it should generate 1 exabyte of data per day and mobilize 1000 times the current internet traffic.
The third challenge concerns the temporality of the data. In addition to their volume, analysts are faced with new difficulties due to the time dimension. On the one hand, while interactions on graphs could already be analyzed from multiple perspectives thanks to graph theory, a novelty lies in the presence of agents that interact dynamically with each other and influence each other's behavior. On the other hand, some structured data streams require online and/or real-time analysis in order to adapt to time-varying dynamics and to meet the constraints of time-sensitive processes.
The project team has carried out work and obtained significant results in the development of new adaptive, distributed and collaborative learning methods on graphs.
So far, it has focused mainly on online graph topology inference, change detection on graphs, classification and clustering on graphs, and learning for attributed data on graphs.
Over the next few months, efforts will focus as much as possible on applications, in order to complement the already well-advanced methodological component.
Large Dimensional Analysis and Improvement of Multi Task Learning. M. Tiomoko, et al. Journal of Machine Learning Research, 2020.
A unified framework for spectral clustering in sparse graphs. L. Dall'Amico, et al. Journal of Machine Learning Research, vol. 22, no. 187, pp. 1-56, 2022.
Nishimori meets Bethe: a spectral method for node classification in sparse weighted graphs. L. Dall'Amico, et al. Journal of Statistical Mechanics: theory and experiment, 2021.
Consistent Semi-Supervised Graph Regularization for High Dimensional Data. X. Mai et al. Journal of Machine Learning Research, vol. 22, no. 84, pp. 1-48, 2021.
Emergence of β and γ networks following multisensory training. D. La Rocca, et al. Neuroimage. 2020 Feb 1;206:116313.
Transient performance analysis of the L1-RLS algorithm. W. Gao, et al. Signal Processing Letters, IEEE. 2021. Early Access
From time-frequency to vertex-frequency and back. L. Stankovic, et al. Mathematics, 9(12). 2021.
Graph topology inference with derivative-reproducing property in RKHS: algorithm and convergence analysis.
Transient theoretical analysis of diffusion RLS algorithm for cyclostationary colored inputs. W. Gao, et al. Signal Processing Letters, IEEE, 28: 1160-1164. 2021.
Convex combination of diffusion strategies over networks. D. Jin, et al. Signal and Information Processing over Networks, IEEE Transactions on, 6: 714-731. 2020.
Online proximal learning over jointly sparse multitask networks with ℓ∞,1 regularization. D. Jin, et al. Signal Processing, IEEE Transactions on, 68: 2087-2104. 2020.
Diffusion LMS with communication delays: Stability and performance analysis. F. Hua, et al. Signal Processing Letters, IEEE, 27: 730-734. 2020.
Learning over multitask graphs – Part I: Stability analysis. R. Nassif, et al. Signal Processing, IEEE Open Journal on, 1: 28-45. 2020.
Learning over multitask graphs – Part II: Performance analysis. R. Nassif, et al. Signal Processing, IEEE Open Journal on, 1: 46-63. 2020.
Online distributed learning over graphs with multitask graph-filter models. F. Hua, et al. Signal and Information Processing over Networks, IEEE Transactions on, 6(1): 63-77. 2020.
Multitask learning over graphs: an approach for distributed, streaming machine learning. R. Nassif, et al. Signal Processing Magazine, IEEE, 37(3): 14-25. 2020.
Affine combination of diffusion strategies over networks. D. Jin, et al. Signal Processing, IEEE Transactions on, 68(1): 2087-2104. 2020.
Semi-automatic extraction of functional dynamic networks describing patient's epileptic seizures. G. Frusque, et al. Frontiers in Neurology, 11. 2020.
Multiplex network inference with sparse tensor decomposition for functional connectivity. G. Frusque, et al. Signal and Information Processing over Networks, IEEE Transactions on, 6: 316-328. 2020.
Variational Graph Autoencoders for Multiview Canonical Correlation Analysis. Y. Kaloga, et al. Signal Processing, 108182. 2021.
For the past five years, there has been a major and persistent interest in big data processing, in response to a radical change in our information societies. Many big data applications are structured by a network and require real-time action because of their time-sensitive constraints. The monitoring of telecommunication networks and power grids is a typical example. Many scientific disciplines are also involved, from the Sciences of the Universe to Neuroscience. These systems consist of a large number of agents linked by a connection topology. These agents can potentially interact dynamically to accomplish a task. Data flows are massive and their properties are likely to evolve over time. The graphs themselves are dynamic.
The aim of the DARLING project is to propose new adaptive, distributed and collaborative learning methods on large dynamic graphs in order to extract structured information from data flows acquired at, and/or transiting through, the nodes of these graphs. To achieve these objectives, DARLING must overcome three methodological obstacles. The first concerns data modeling. Although graph signal processing has recently provided a complete set of analysis tools, its scope remains limited to generally static signal models, in which the temporal dimension has been neglected in favor of the spatial dimension. The second concerns the size of the graphs. The example of the SKA radio telescope is emblematic, since it should total 2.5 million antennas spread over an area stretching from South Africa to Australia. In such situations, it is essential to develop processing and learning methods that scale by being natively distributed over the nodes. The third concerns the temporality of the data. Some data flows require online analysis in order to adapt to time-varying dynamics and to meet time-critical process constraints.
At the end of the project, the DARLING team plans to deliver a family of learning methods operating on temporal signals structured by dynamic graphs. These methods will be natively distributed over the nodes of the graphs, will operate online, and will be adaptive in order to meet temporal constraints. To obtain performance guarantees, these methods will be systematically accompanied by a thorough analysis based on random matrix theory. This powerful tool, never used in this context although well suited to inference on random graphs, will offer new perspectives. Finally, these methods will be evaluated on two state-of-the-art observation techniques in which two of the partners are involved and for which they hold data: radio astronomy with the giant SKA instrument (Observatoire de la Côte d'Azur), for image reconstruction and calibration, and magnetoencephalography neuroimaging (NeuroSpin, CEA Saclay), for the characterization of the anatomical connectivity of distinct functional regions of the cortex. Some of this data will be provided together with the Python routines delivered at the end of the project.
Mr. Cédric Richard (Laboratoire J-L. Lagrange)
The author of this summary is the project coordinator, who is responsible for its content. The ANR accepts no responsibility for its contents.
LAGRANGE Laboratoire J-L. Lagrange
LPENSL LABORATOIRE DE PHYSIQUE DE L'ENS DE LYON
DRF / JOLIOT / NeuroSpin Institut des sciences du vivant FRÉDÉRIC-JOLIOT
GIPSA-lab Grenoble Images Parole Signal Automatique
ANR funding: 427,471 euros
Beginning and duration of the scientific project: January 2020 - 48 Months