Algorithms for Multi-Dimensional Data via Sketches – ADDS
ADDS: Algorithms for Multi-Dimensional Data via Sketches
Massive data presents particularly vexing challenges for algorithmic processing: not only are most of the commonly encountered problems NP-hard, one cannot afford to spend too much running time or space. Even worse, for approximation algorithms, the dependency on the dimension is often exponential. Overcoming these challenges require the development of a new generation of algorithms and techniques.
Effectively addressing challenges of processing massive data through the notion of sketches: computing a constant-sized subset of the input data that captures key aspects of the entire data.
One key in effectively addressing these challenges is through the notion of sketches: extract a small subset---ideally constant-sized subset---of the input data that captures, approximately with respect to a given parameter epsilon, key aspects of the entire data. Given a family of optimization problems, the goal is to construct sketches whose size is independent of the size of the input data, while minimizing dependence on the dimension and the approximation parameter epsilon.<br /><br />Sketches are related to more general succinct approximations of data, such as epsilon-nets and epsilon-approximations. An example of sketches are coresets, which are sketches such that solving the given problem on a coreset gives an approximate solution to the problem on the entire data. <br /><br />While great progress has been achieved for non-geometric problems, <br />for many fundamental problems on geometric data, the construction and existence of near-optimal sketches remain open. Our research is divided into three parts, requiring expertise in statistics, computational geometry, learning, combinatorics, and algorithms. First, we consider the combinatorial properties of geometric data that are relevant to build compact sketches. Second, we consider the time and space complexities of constructing accurate sketches of data in high dimensions, based on the combinatorial and geometric understanding. Finally, we show how to use the small sketches in order to improve the accuracy and running time of optimization algorithms.
The project can be divided into three tasks:
Task 1: The study of samples and succinct approximations of geometric data---including a better understanding of the discrete structure of geometric data in high dimensions; notions of sampling complexity; their statistical, geometric and combinatorial properties; dimension-independent structures in geometry; construction and algorithms for succinct approximations; role and limits of random sampling.
Task 2: The use of samples and succinct approximations to design improved sketches of geometric data---including core-sets for shape-fitting problems; kernels for extent-based problems; depth-estimation for shape-approximation; polytope approximation for approximate nearest-neighbors.
Task 3: The use of sketches to design effective approximation algorithms to optimization problems---including algorithms for data-depth in high dimensions, chance-based optimization, nearest-neighbour in non-Euclidean metrics; the efficient implementations of these algorithms; testing and evaluation; development of an open-source library.
Our work so far has resolved or improved upon a number of open problems:
-The work of Guilherme Dias da Fonseca and his colleagues achieves a major breakthrough by resolving the decades-old problem of approximating a closed, convex set K in Euclidean d-dimensional space by a convex polytope of low combinatorial complexity.
-The solutions to instances of hard problems of coordinated motion planning won the international challenge CG:SHOP2021 associated to the conference SoCG.
-In collaboration with Imre Bárány, we presented improved bounds for the depth of Radon points in Euclidean spaces.
-The multiplicative weights update technique was used to improve the running time of constructing low crossing matchings, improving the previous-best general bound from 1989.
Improvements in sketches could potentially lead to breakthroughs in many geometric optimization problems. Each one of these domains of geometric optimization is already interesting by itself but as these algorithms are the tools of scientific computations in many different areas, improving one of them implies impacting many different domains: Linear, bilinear or integer programming are, for instance, the computational core of combinatorial optimization and operations research; visibility computations are the computational core of rendering in computer graphics; nearest neighbor search has applications in DNA sequencing, chemical similarity, pattern recognition, and countless others.
Work on this project has already been published in the top venues for machine learning (JMLR), computational geometry (SCG, DCG), algorithms (SODA, TOC), and combinatorics (AMS: Contemporary Mathematics).
The algorithms have been implemented and made available publicly. The solutions to instances of hard problems of coordinated motion planning won the international challenge CG:SHOP2021 associated to the conference SCG.
Massive geometric data is increasingly common thanks to the proliferation of ubiquitous data-collecting devices, presenting vexing challenges for algorithmic processing. Our approach to deal with this amount of data is to, given an approximation parameter eps, construct a small-sized sketch S of the input data, then solve the problem on S, and finally extend this solution to a (1+eps)-approximation to the original problem. Our research is divided into three parts, requiring expertise in statistics, computational geometry, learning, combinatorics, and algorithms. First, we consider the combinatorial properties of geometric data that are relevant to build compact sketches. Second, we consider the time and space complexities of constructing accurate sketches of data in high dimensions, based on the combinatorial and geometric understanding. Finally, we show how to use the small sketches in order to improve the accuracy and running time of optimization algorithms.
Project coordination
Nabil Hassan Mustafa (Chambre de Commerce et d'Industrie régionale de Paris Ile-de-France - ESIEE Paris)
The author of this summary is the project coordinator, who is responsible for the content of this summary. The ANR declines any responsibility as for its contents.
Partner
LIMOS Laboratoire d'Informatique, de Modélisation et d'Optimisation des Systèmes
LIPN - USPN LABORATOIRE D'INFORMATIQUE PARIS NORD
CREST Centre de Recherche en Economie et Stastistique - CREST
ESIEE Paris Chambre de Commerce et d'Industrie régionale de Paris Ile-de-France - ESIEE Paris
Help of the ANR 420,119 euros
Beginning and duration of the scientific project:
December 2019
- 48 Months