record – LABS

Thesis Info

LABS ID: 00386

Thesis Title: Automatic Adaptation of Sound Analysis and Synthesis

Author: Marco Liuni

E-mail: leehooni AT gmail.com

2nd Author

3rd Author

Degree: Ph.D.

Year: 2012

Number of Pages: 110

University: Università di Firenze - UPMC Paris 6

Thesis Supervisor: Marco Romito - Xavier Rodet

Supervisor e-mail: romito@dm.unipi.it - xavier.rodet@ircam.fr

Other Supervisor(s): Axel Roebel

Language(s) of Thesis: English

Department / Discipline: Applied Mathematics - Sound Processing

Copyright Ownership: Author

Languages Familiar to Author: Italian, French, English

URL where full thesis can be found: articles.ircam.fr/textes/Liuni12a/index.pdf

Keywords: Frame theory, sparsity measures, sound processing, phase vocoder

Abstract: Sound processing techniques are employed across a wide range of research and industrial applications. Music first comes to mind, together with the community of composers, producers and multimedia artists as well as entertainment professionals; then there is speech, which is processed in many different ways in our everyday life. Smartphones, tablets and other mobile devices, as well as TVs and home theater set-ups, computers, and digital equipment for music and film studios all deal with sound in digital format and come with different and challenging needs, which raise many interesting research and technological issues. Several fields other than music or speech exploit sound analysis, transformation and synthesis: medical sciences, security instruments and communications, among others. Traditional sound analysis methods, based on single sets of atomic functions like Gabor windows or wavelets, offer limited flexibility in their time-frequency precision. Moreover, fundamental analysis parameters have to be set a priori, according to the signal characteristics and the quality of the representation required. Analyses with a non-optimal resolution lead to a blurring, or sometimes even a loss, of information about the original signal, which affects every kind of later treatment. This problem concerns a large part of the technical applications dealing with signals: visual representation, feature extraction and processing, among others. The community working on these issues is very broad, including telecommunications, sound and image processing as well as applied mathematics and physics. Our main interest is focused on sounds, and our questions arise principally from the musical and voice domains.
The mainstream industrial fields most closely related to this topic are signal transformation, music production, speech processing, source separation and music information retrieval, the latter covering a broad range of applications from classification to identification, feature extraction and handling of information about music. Many of the algorithms applied within these processes rely on a given time-frequency representation of the signal, inheriting its qualities and drawbacks, and would therefore benefit from adapted analyses with optimized resolutions. This motivates the research on adaptive methods, conducted at present in both the signal processing and the applied mathematics communities: they make possible analyses whose resolution changes locally according to the signal features. This thesis starts from the main idea that algorithms based on adaptive representations will help to generalize and simplify the application of signal processing methods that today still require expert knowledge. An automatic parameter selection would make it possible to achieve more robust methods with significantly less human effort. Our attention is focused in particular on advanced signal processing methods in applications designed for large communities, where the need to provide manual low-level configuration is indeed one of the main problems. Having an automatic time-frequency resolution available drastically reduces the number of parameters to set, without degrading, and even improving, the analysis quality: the result is a better user experience with advanced signal processing techniques that require, at present, a high level of expertise. The first and fundamental objective of our project (Chapter 2) is thus the formal definition of mathematical models whose interpretation leads to theoretical and algorithmic methods for adaptive analysis.
Gabor frame theory constitutes a very natural mathematical context: one of its main subjects is the definition of redundant sets of atoms in Hilbert spaces, generally larger than orthonormal bases, together with the associated decomposition operators and their inverses. In practice, using such a framework for sound processing requires the possibility of reconstructing a signal from its analysis coefficients: thus we need an efficient way to find an inverse of the adaptive decomposition operator, together with appropriate methods to manage adaptive analyses in order to preserve and improve the existing sound transformation techniques. The second objective (Chapter 3) is to make this adaptation automatic; we aim to establish criteria to define the optimal local time-frequency resolution, and we deduce such criteria from the optimization of given sparsity measures. We take into account both theoretical and application-oriented sparsity measures: entropies and other quantities borrowed from information theory and probability belong to the first class. When dealing with concrete sounds, information measures may not always be well suited, since some of their characteristics do not find a direct interpretation in the signal domain. Thus, it is often useful to give application-driven definitions of sparsity, depending on the particular features that the system should favor. The first chapter deals with the scientific and historical motivations of the work, while Chapters 4 and 5 present the algorithms that we have realized, together with a description of their properties, applications and results.
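
Illustrative sketch (not taken from the thesis): the idea of choosing a time-frequency resolution by optimizing a sparsity measure can be shown with a few lines of Python. The sketch below assumes NumPy, uses the Rényi entropy of the normalized spectrogram as one example of an information-theoretic measure, and picks a single window length for a whole signal rather than adapting it locally. The function names, the Hann window, the entropy order `alpha=3.0`, and the candidate window lengths are all assumptions made for illustration.

```python
import numpy as np

def power_spectrogram(signal, win_len, hop, n_fft):
    """Power spectrogram with a Hann window of even length win_len.

    Frames are centered on the same time instants for every win_len and
    zero-padded to n_fft (n_fft >= win_len), so that spectrograms computed
    with different windows have the same shape and can be compared.
    """
    half = win_len // 2
    padded = np.pad(signal, (half, half))          # zeros on both sides
    window = np.hanning(win_len)
    centers = np.arange(0, len(signal), hop)
    frames = np.stack([padded[c:c + win_len] * window for c in centers])
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2

def renyi_entropy(power, alpha=3.0):
    """Rényi entropy (in bits) of a spectrogram normalized to unit sum."""
    p = power / power.sum()
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def select_window(signal, candidates, hop=64, alpha=3.0):
    """Return the candidate window length with the lowest entropy.

    A lower entropy means the energy is concentrated in fewer
    time-frequency coefficients, i.e. a sparser representation.
    """
    n_fft = max(candidates)
    entropies = {
        n: renyi_entropy(power_spectrogram(signal, n, hop, n_fft), alpha)
        for n in candidates
    }
    return min(entropies, key=entropies.get), entropies

# Two test signals: a stationary tone and an isolated click.
sample_rate = 8000
t = np.arange(8192) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
click = np.zeros(8192)
click[4096] = 1.0

candidates = [128, 512, 2048]
tone_choice, _ = select_window(tone, candidates)
click_choice, _ = select_window(click, candidates)
print("tone:", tone_choice, "click:", click_choice)
```

In this sketch the stationary tone is expected to favor the longest window (fine frequency resolution) and the click the shortest one (fine time resolution), which is the trade-off an automatic adaptation has to resolve.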