Citation link:
https://doi.org/10.26092/elib/1439
Blind source separation in single-channel polyphonic music recordings
File | Description | Size | Format |
---|---|---|---|
DissertationSchulze.pdf | | 9.08 MB | Adobe PDF |
Authors: | Schulze, Sören |
Supervisor: | King, Emily J. |
1. Expert: | King, Emily J. |
Experts: | Dörfler, Monika |
Abstract: | We address the problem of unmixing the contributions of multiple different musical instruments from a single-channel audio recording without any specific prior information. Based on a model for the sounds of string and wind instruments, every tone is represented using a set of model parameters as well as a learned dictionary matrix that captures the relations between the amplitudes of the harmonics specific to each instrument. We propose two practical approaches that both operate on time-frequency representations derived from the short-time Fourier transform. The first approach is based on a specifically developed sparse pursuit algorithm. Since it needs to operate on a log-frequency spectrogram, we analyze the characteristics of such representations from a theoretical point of view and propose a log-frequency spectrogram that fulfills all the properties that we consider favorable. For use in the separation algorithm, it turns out that the best log-frequency spectrogram is obtained via the sparse pursuit algorithm itself. While discussing pursuit algorithms in general, we also sketch a potential application of Beurling LASSO to source separation. The second approach is an application of deep neural networks for the prediction of the model parameters. Since the problem is non-convex and possesses a large number of local minima, we combine conventional backpropagation with policy gradients, which stem from reinforcement learning. This method is distinguished by its ability to operate directly on the Gabor frame analysis coefficients (i.e., the sampled complex-valued output of the short-time Fourier transform). On each of the samples that we gathered for evaluation, at least one of the two approaches outperforms the state of the art. The second approach can generally be considered the better of the two, especially in suppressing interference between the sources. Unlike most traditional algorithms, neither of the methods is bound to any particular tuning of the instruments. They each possess different mechanisms to account for inconsistencies in the sounds of acoustic instruments, and they both incorporate inharmonicity in their parameter predictions. |
Keywords: | blind source separation; unmixing; time-frequency analysis; machine learning; dictionary learning; sparse pursuit; deep learning; neural networks; policy gradients; non-convex optimization |
Issue Date: | 3-Feb-2022 |
Type: | Dissertation |
Secondary publication: | no |
DOI: | 10.26092/elib/1439 |
URN: | urn:nbn:de:gbv:46-elib58165 |
Institution: | Universität Bremen |
Faculty: | Fachbereich 03: Mathematik/Informatik (FB 03) |
Appears in Collections: | Dissertationen |
Page view(s): 413 (checked on Jan 14, 2025)
Download(s): 227 (checked on Jan 14, 2025)
This item is licensed under a Creative Commons License