Citation link: https://doi.org/10.26092/elib/1439
Access: Open Access
License: CC BY 4.0

Blind source separation in single-channel polyphonic music recordings


File: DissertationSchulze.pdf
Size: 9.08 MB
Format: Adobe PDF
Authors: Schulze, Sören  
Supervisor: King, Emily J. 
First reviewer: King, Emily J.
Additional reviewer: Dörfler, Monika
Abstract: 
We address the problem of unmixing the contributions of multiple different musical instruments from a single-channel audio recording without any specific prior information. Based on a model for the sounds of string and wind instruments, every tone is represented using a set of model parameters as well as a learned dictionary matrix that captures relations of the amplitudes of the harmonics specific to each instrument.
We propose two practical approaches that both operate on time-frequency representations derived from the short-time Fourier transform. The first approach is based on a specifically developed sparse pursuit algorithm. Since it needs to operate on a log-frequency spectrogram, we analyze the characteristics of such representations from a theoretical point of view and propose a log-frequency spectrogram that fulfills all the properties that we consider favorable. For use in the separation algorithm, it turns out that the best log-frequency spectrogram is obtained via the sparse pursuit algorithm itself. While discussing pursuit algorithms in general, we also sketch a potential application of Beurling LASSO to source separation.
The second approach is an application of deep neural networks for the prediction of the model parameters. Since the problem is non-convex and possesses a large number of local minima, we combine conventional backpropagation with policy gradients, which stem from reinforcement learning. This method is distinguished by its ability to operate directly on the Gabor frame analysis coefficients (i.e., the sampled complex-valued output of the short-time Fourier transform).
On each of the samples that we gathered for evaluation, at least one of the two approaches outperforms the state of the art. The second algorithm can generally be considered the better one, especially in suppressing interference between the sources. Unlike most traditional algorithms, neither method is bound to any particular tuning of the instruments. Each has its own mechanism to account for inconsistencies in the sounds of acoustic instruments, and both incorporate inharmonicity in their parameter predictions.
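The tone model described in the abstract can be illustrated with a minimal sketch. It is not taken from the dissertation: the stiff-string relation f_h = h · f1 · sqrt(1 + b·h²) used for inharmonicity, the function names, and the example dictionary values are all assumptions chosen for illustration. Each tone is a sum of partials whose relative amplitudes come from an instrument-specific dictionary column, and the mixture is the sum of all tones.

```python
import math


def partial_frequencies(f1, b, n_harmonics):
    """Partial frequencies in Hz for fundamental f1 and inharmonicity b >= 0.

    Assumption: the common stiff-string relation f_h = h * f1 * sqrt(1 + b * h^2).
    With b = 0 the partials are exact integer multiples of f1.
    """
    return [h * f1 * math.sqrt(1.0 + b * h * h) for h in range(1, n_harmonics + 1)]


def synthesize_tone(f1, b, amplitude, harmonic_weights, sample_rate, n_samples):
    """Sum of sinusoidal partials weighted by one dictionary column."""
    freqs = partial_frequencies(f1, b, len(harmonic_weights))
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        value = sum(w * math.sin(2.0 * math.pi * f * t)
                    for w, f in zip(harmonic_weights, freqs))
        samples.append(amplitude * value)
    return samples


def mix(tones, dictionary, sample_rate, n_samples):
    """Single-channel mixture: sample-wise sum of all tones."""
    mixture = [0.0] * n_samples
    for tone in tones:
        weights = dictionary[tone["instrument"]]
        signal = synthesize_tone(tone["f1"], tone["b"], tone["amplitude"],
                                 weights, sample_rate, n_samples)
        mixture = [m + s for m, s in zip(mixture, signal)]
    return mixture


# Hypothetical dictionary: relative harmonic amplitudes per instrument.
dictionary = {
    "instrument_a": [1.0, 0.5, 0.25, 0.125],
    "instrument_b": [1.0, 0.1, 0.6, 0.05],
}

# Two simultaneous tones; f1 is continuous, so no fixed tuning is implied.
tones = [
    {"instrument": "instrument_a", "f1": 441.3, "b": 1e-4, "amplitude": 0.8},
    {"instrument": "instrument_b", "f1": 262.0, "b": 0.0, "amplitude": 0.5},
]

mixture = mix(tones, dictionary, sample_rate=16000, n_samples=1600)
```

Separation in this picture means recovering the per-tone parameters (`f1`, `b`, `amplitude`, instrument) and the dictionary from `mixture` alone. Both approaches in the abstract address this inverse problem, which the sketch does not attempt.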
Keywords: blind source separation; unmixing; time-frequency analysis; machine learning; dictionary learning; sparse pursuit; deep learning; neural networks; policy gradients; non-convex optimization
Issue Date: 3-Feb-2022
Type: Dissertation
DOI: 10.26092/elib/1439
URN: urn:nbn:de:gbv:46-elib58165
Institution: Universität Bremen 
Faculty: Fachbereich 03: Mathematik/Informatik (FB 03) 
Appears in Collections: Dissertationen

  


This item is licensed under a Creative Commons License.
