JCJC SIMI 3 - Hardware and software for systems and communications (Matériels et logiciels pour les systèmes et les communications)

Hierarchical Object-based Unsupervised Learning for Computational Auditory Scene Analysis – Houle

Hierarchical Object-based Unsupervised Learning

This project studies a new hierarchical learning approach to address several aspects of the so-called Computational Auditory Scene Analysis (CASA) problem.

Tackling the CASA problem in an unsupervised way

Despite recent advances in machine perception, largely due to progress in machine learning techniques, much remains to be done in the field of Computational Auditory Scene Analysis (CASA), which aims at bringing mechanisms of the human auditory system to machine perception by automatically understanding an auditory scene through identification and description of sound sources. CASA is today an active research field at the intersection of audio processing and machine learning, with practical applications. The current roadblocks are that: (1) few assumptions can be made regarding the potential objects of interest, so building a model for each is not an option; (2) objects cannot be observed in isolation; and (3) they are structured by many relationships whose priority is difficult to define a priori.

In this project, we aim to propose novel unsupervised learning techniques to tackle real-world CASA problems. Our approach tries to take advantage of the very properties that are problematic for existing CASA approaches: first, the intrinsic hierarchical structure of audio scenes (atoms gather into sound objects that are instances of classes such as "Piano C4 note", itself a special case of "Piano note"); and second, the redundancy present at many levels of the hierarchy. This redundancy lets us identify templates upon which to build a robust and meaningful representation.

We propose a specifically tailored unsupervised learning system
to handle data with intrinsic hierarchy and redundant relational
information. It is structured into two components: Multi-Level Clustering
(MLC) performs the actual data analysis, and a "Supervisor" (reflexive
adaptation module) implements the learning aspect of the system by
tuning the operation of MLC and managing a memory of previous runs.

MLC is the combination of (1) a series of basic clustering modules
such as kernel k-means running "on top" of one another to produce
increasingly high-level abstractions from the base atoms, and (2) a
feedback mechanism through which knowledge gained at higher
abstraction levels is used to guide the clustering at lower
levels. Thus, high-level data redundancy helps gather atoms into
intelligible structures. MLC operates on a differential representation
of data using several matrices (kernels) each reflecting a particular
predefined relationship between elementary units. Each level of the
MLC runs upon a distinct data representation, obtained by a weighted
average of the available kernels. The choice of those weights defines
the semantic nature of the objects produced by clustering at that
level.
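The role of the kernel weights at a single MLC level can be illustrated with a minimal sketch. It is not the project's implementation: the Gaussian kernels, the descriptors `pitch` and `onset`, the toy values, the farthest-first initialisation and all function names are assumptions made for illustration. It shows one kernel k-means module run on a weighted average of two kernels, with two different weightings.

```python
import math

def rbf_kernel(values, sigma=1.0):
    """Gaussian similarity matrix for one scalar descriptor (illustrative choice)."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in values]
            for a in values]

def combine_kernels(kernels, weights):
    """Weighted average of same-sized kernel matrices."""
    total = float(sum(weights))
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) / total
             for j in range(n)] for i in range(n)]

def kernel_kmeans(K, n_clusters, n_iter=50):
    """Kernel k-means on a precomputed kernel matrix; returns one label per atom."""
    n = len(K)

    def d2(i, j):
        # Squared distance between atoms i and j in the kernel-induced space.
        return K[i][i] + K[j][j] - 2.0 * K[i][j]

    # Deterministic farthest-first initialisation (an assumption of this sketch).
    centers = [0]
    while len(centers) < n_clusters:
        centers.append(max(range(n), key=lambda i: min(d2(i, c) for c in centers)))
    labels = [min(range(n_clusters), key=lambda c: d2(i, centers[c]))
              for i in range(n)]

    for _ in range(n_iter):
        clusters = [[i for i in range(n) if labels[i] == c]
                    for c in range(n_clusters)]

        def dist(i, members):
            # Squared distance from atom i to the mean of `members` in feature space.
            if not members:
                return math.inf
            m = len(members)
            cross = sum(K[i][j] for j in members) / m
            within = sum(K[j][l] for j in members for l in members) / m ** 2
            return K[i][i] - 2.0 * cross + within

        new_labels = [min(range(n_clusters), key=lambda c: dist(i, clusters[c]))
                      for i in range(n)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Hypothetical atoms described by a pitch (MIDI-like number) and an onset time (s).
pitch = [60.0, 60.2, 60.1, 60.3, 72.0, 72.2, 72.1, 72.3]
onset = [0.0, 0.1, 4.0, 4.1, 0.05, 0.15, 4.05, 4.15]
kernels = [rbf_kernel(pitch), rbf_kernel(onset)]

labels_pitch = kernel_kmeans(combine_kernels(kernels, [1.0, 0.0]), 2)
labels_onset = kernel_kmeans(combine_kernels(kernels, [0.0, 1.0]), 2)
print(labels_pitch)
print(labels_onset)
```

With these toy values, putting all the weight on the pitch kernel should group atoms by pitch, while putting it on the onset kernel should group the same atoms by time. In that sense the weights select what kind of object a level produces.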

It is the Supervisor's role to automatically adjust those weights to
obtain an optimal clustering based on (1) properties of the data
themselves, as measured by intrinsic clustering quality metrics, and (2)
the memory of previously seen data and executions. The produced objects
will be described in internal relational terms, that is, by the relationships
between the atoms composing them. The expected gains are greater expressivity
and easier comparison between objects, thanks to a lower dependency
on accidental instance characteristics.
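One way to picture the Supervisor's weight adjustment is a simple search over candidate weightings, each scored by an intrinsic quality metric and logged in a memory of runs. The sketch below is only an illustration under stated assumptions. The summary does not say which metrics or search strategy are used, so the silhouette coefficient, the grid of candidates, the `two_seed_clustering` stand-in for a clustering module, the toy descriptors and all function names are hypothetical.

```python
import math

def rbf_kernel(values, sigma=1.0):
    """Gaussian similarity matrix for one scalar descriptor (illustrative choice)."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in values]
            for a in values]

def combine_kernels(kernels, weights):
    """Weighted average of same-sized kernel matrices."""
    total = float(sum(weights))
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) / total
             for j in range(n)] for i in range(n)]

def kernel_distance(K, i, j):
    """Distance between atoms i and j in the space induced by kernel K."""
    return math.sqrt(max(K[i][i] + K[j][j] - 2.0 * K[i][j], 0.0))

def two_seed_clustering(K):
    """Stand-in for a clustering module: two clusters around two far-apart seeds."""
    n = len(K)
    far = max(range(n), key=lambda i: kernel_distance(K, 0, i))
    return [0 if kernel_distance(K, i, 0) <= kernel_distance(K, i, far) else 1
            for i in range(n)]

def silhouette(K, labels):
    """Mean silhouette coefficient computed from kernel-induced distances."""
    n = len(K)
    scores = []
    for i in range(n):
        mean_dist = {}
        for c in set(labels):
            others = [j for j in range(n) if labels[j] == c and j != i]
            if others:
                mean_dist[c] = sum(kernel_distance(K, i, j) for j in others) / len(others)
        a = mean_dist.get(labels[i])
        rivals = [d for c, d in mean_dist.items() if c != labels[i]]
        if a is None or not rivals:
            scores.append(0.0)  # singleton cluster or single cluster: neutral score
            continue
        b = min(rivals)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n

def supervise(kernels, candidate_weights, cluster_fn, memory):
    """Try each weighting, log (weights, score) in memory, return the best entry."""
    for weights in candidate_weights:
        K = combine_kernels(kernels, weights)
        memory.append((weights, silhouette(K, cluster_fn(K))))
    return max(memory, key=lambda run: run[1])

# Hypothetical atoms: two clear pitch groups, onsets spread evenly over time.
pitch = [60.0, 60.2, 60.1, 60.3, 72.0, 72.2, 72.1, 72.3]
onset = [0.0, 1.5, 3.0, 4.5, 0.5, 2.0, 3.5, 5.0]
memory = []
best_weights, best_score = supervise(
    [rbf_kernel(pitch), rbf_kernel(onset)],
    [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],
    two_seed_clustering, memory)
print(best_weights, round(best_score, 3))
```

On this toy data the pitch-only weighting should score highest, because pitch is the only descriptor with a clear two-group structure. Because `memory` persists across calls, later runs can be compared against earlier ones, a very reduced analogue of the memory of previous executions described above.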

In summary, the originality of our proposal lies in its departure from common CASA
approaches, starting with the paradigm of scene and object
representation. Its novelty lies in the original clustering algorithms
that we will refine during the project, whose applications go beyond
the scope of CASA. Our use of memory as a self-organizing guide
justifies the term "unsupervised learning". Evaluation will be
conducted on standard CASA test cases, as well as through the
application of the system to two concrete problems. The project brings together several
young researchers with complementary expertise in audio processing, machine
learning and music perception, and we hope to contribute to each field
in addition to delivering the novel CASA system that federates them.

Project coordination

Mathieu LAGRANGE (INSTITUT DE RECHERCHE ET DE COORDINATION ACOUSTIQUE-MUSIQUE (IRCAM)) – mathieu.lagrange@ls2n.fr

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partner

IRCAM STMS (UMR 9912) INSTITUT DE RECHERCHE ET DE COORDINATION ACOUSTIQUE-MUSIQUE (IRCAM)

ANR funding: 222,000 euros
Beginning and duration of the scientific project: September 2011 - 36 months
