Professionally-produced music recordings
The purpose of this task is to evaluate source separation algorithms for estimating one or more sources from a set of mixtures in the context of professionally-produced music recordings.
The data set consists of a total of 100 full-track songs of different styles and includes the synthesized mixtures and the original sources, divided between a development subset and a test subset (see I.).
The participants are kindly asked to download the data set and an evaluation function, run the evaluation function (which calls their separation function and returns the estimated sources and the performance results), and send us back the performance results (see II.).
The evaluation will be performed along with the separation using a common set of performance measures, and the final results will be made available on the website after scaling them according to a lower-bound and/or an upper-bound criterion (see III.).
I. The Dataset
The Mixing Secret Dataset 100 (MSD100) consists of a total of 100 songs of different styles (see msd100.xlsx for more information).
MSD100 contains two folders, a folder with the mixture set, “Mixtures,” and a folder with the source set, “Sources.”
Each folder contains two subfolders, a folder with a development set, “Dev,” and a folder with a test set, “Test”;
supervised approaches should be trained on the former set and tested on both sets.
Each subfolder contains 50 sub-subfolders corresponding to 50 songs, for a total of 100 different songs.
Each sub-subfolder from “Mixtures” contains one file, “mixture.wav,” corresponding to the mixture, and each sub-subfolder from “Sources” contains 4 files, “bass.wav,” “drums.wav,” “other.wav” (i.e., the other instruments), and “vocals.wav,” corresponding to the sources.
For a given song, the mixture and the sources have the same length and the same sampling frequency (i.e., 44,100 Hz); however, while the mixture is always stereophonic, one or more sources can be monophonic (typically, the vocals).
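The properties stated above (same length, same sampling frequency, stereophonic mixture, possibly monophonic sources) can be checked before running a long evaluation. The following is a minimal Python sketch, not part of the official tools; the function names `wav_info` and `check_song` are hypothetical, and it assumes the files are plain PCM WAV files readable by the standard `wave` module.

```python
import wave

EXPECTED_FS = 44100  # sampling frequency stated in the dataset description


def wav_info(path):
    """Return (channels, sampling_rate, frames) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getframerate(), wav.getnframes()


def check_song(mixture_path, source_paths):
    """Return a list of problems found for one song; empty means consistent."""
    problems = []
    mix_channels, mix_fs, mix_frames = wav_info(mixture_path)
    if mix_channels != 2:
        problems.append("mixture is not stereophonic")
    if mix_fs != EXPECTED_FS:
        problems.append(f"mixture sampling frequency is {mix_fs} Hz")
    for path in source_paths:
        channels, fs, frames = wav_info(path)
        if fs != mix_fs:
            problems.append(f"{path}: sampling frequency {fs} != {mix_fs}")
        if frames != mix_frames:
            problems.append(f"{path}: {frames} frames != {mix_frames}")
        if channels not in (1, 2):  # sources may be mono or stereo
            problems.append(f"{path}: unexpected channel count {channels}")
    return problems
```

Calling `check_song` with the path to a “mixture.wav” and the paths to its four source files returns an empty list when the song matches the description.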
The sources for MSD100 were created from stems downloaded from The ‘Mixing Secrets’ Free Multitrack Download Library.
We would like to thank Mike Senior, not only for giving us the permission to use this multitrack material, but also for maintaining such resources for the audio community.
II. The Settings
The participants are kindly asked to first download the data set MSD100.zip (12 GB!) and unzip it in a root folder, so that there is a folder “MSD100” containing the subfolders “Mixtures” and “Sources,” along with the files “msd100.txt” and “msd100.xlsx” (make sure that you do not have a folder “MSD100” containing another folder “MSD100”).
The participants are also kindly asked to download the evaluation function sisec2015_mus.m and place it in the root folder, along with the folder “MSD100.”
We replaced the dataset MSD100.zip with MSD100_2.zip because some songs in the older version were corrupted (the songs stop playing before the end). Please download the data set again from the following links, then rename the dataset folder to “MSD100.”
The evaluation function will loop over the mixtures of the MSD100 data set, and call the participants’ separation function to estimate the sources.
To do so, the participants are kindly asked to name their separation function “myfunction.m” and place it in the root folder, along with the folder “MSD100” and the file “sisec2015_mus.m.”
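The expected root folder layout can be verified with a short script before launching the evaluation. This is only a Python sketch under the layout described above; the helper name `check_root` is hypothetical and is not provided by the organizers.

```python
import os


def check_root(root):
    """Return a list of layout problems found under `root`; empty means OK."""
    problems = []
    dataset = os.path.join(root, "MSD100")
    # The data set folder must directly contain "Mixtures" and "Sources".
    for name in ("Mixtures", "Sources"):
        if not os.path.isdir(os.path.join(dataset, name)):
            problems.append(f"missing folder MSD100/{name}")
    # Unzipping into an existing "MSD100" folder produces this nesting.
    if os.path.isdir(os.path.join(dataset, "MSD100")):
        problems.append("nested folder MSD100/MSD100 found")
    # Both MATLAB files must sit next to the data set folder.
    for name in ("sisec2015_mus.m", "myfunction.m"):
        if not os.path.isfile(os.path.join(root, name)):
            problems.append(f"missing file {name}")
    return problems
```

An empty list means the data set folder and both MATLAB files are where the evaluation function expects them.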
The separation function should have the following syntax:
[bass, drums, other, vocals, accompaniment] = myfunction(mixture, fs);
where “mixture” is a matrix of size [#samples, #channels] corresponding to the mixture, “fs” is the corresponding sampling frequency in Hz, and “bass,” “drums,” “other,” “vocals,” and “accompaniment” are matrices of the same size as the mixture corresponding to the estimates, i.e., the bass, the drums, the other instruments, the vocals, and the full accompaniment (i.e., bass+drums+other), respectively.
If one or more sources are not meant to be estimated, the separation function should return an empty matrix (i.e., []) for each of them.
Any other parameter of the algorithm should be defined internally.
For example, if you are intending to perform vocals/accompaniment separation only, please define the bass, the drums, and other as empty matrices in the separation function, as follows:
bass = [];
drums = [];
other = [];
On the other hand, if you are intending to estimate, for example, the bass only, please define the other outputs as empty matrices in the separation function, as follows:
drums = [];
other = [];
vocals = [];
accompaniment = [];
Additionally, if you are intending to estimate the drums, the bass, other, and the vocals, for example, please note that you can also define the accompaniment as the sum of the bass, the drums, and other in the separation function, as follows:
accompaniment = bass + drums + other;
Finally, if you are intending to estimate the vocals only, please note that you can also define the accompaniment as the difference between the mixture and the vocals in the separation function, with the bass, the drums, and other defined as empty matrices, as follows:
bass = [];
drums = [];
other = [];
accompaniment = mixture - vocals;
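The output contract described in this section (each estimate is either empty or has the same size as the mixture) can be illustrated outside MATLAB. The Python sketch below is only an illustration: it represents a signal as a list of per-sample rows, and the names `shape`, `check_outputs`, and `residual` are hypothetical. The actual separation function must still be written as “myfunction.m.”

```python
def shape(matrix):
    """Return (samples, channels) for a list of per-sample rows."""
    if not matrix:
        return (0, 0)
    return (len(matrix), len(matrix[0]))


def check_outputs(mixture, outputs):
    """Return the names of estimates that are neither empty nor mixture-sized.

    `outputs` maps source names to matrices; a missing name counts as empty.
    """
    expected = shape(mixture)
    bad = []
    for name in ("bass", "drums", "other", "vocals", "accompaniment"):
        estimate = outputs.get(name, [])
        if estimate and shape(estimate) != expected:
            bad.append(name)
    return bad


def residual(mixture, vocals):
    """Sample-wise mixture minus vocals (the accompaniment shortcut)."""
    return [[m - v for m, v in zip(m_row, v_row)]
            for m_row, v_row in zip(mixture, vocals)]
```

For a vocals-only method, `residual(mixture, vocals)` mirrors the MATLAB line that defines the accompaniment as the mixture minus the vocals, and `check_outputs` accepts the empty bass, drums, and other estimates.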
III. The Evaluation
The participants are then kindly asked to run the evaluation function “sisec2015_mus.m” from the root folder.
The evaluation function loops over all 100 songs of the MSD100 data set, covering both the development subset in the folder “Dev” and the test subset in the folder “Test.” For each song, it reads the mixture “mixture.wav” from the folder “Mixtures,” performs source separation on it using the separation function “myfunction.m,” and writes the estimates as “bass.wav,” “drums.wav,” “other.wav,” and “vocals.wav” (if estimated) to the folder “Estimates” (make sure that the estimates have the same size as the mixture).
Again, supervised approaches should be trained on the development set and will be tested on both the development and test sets.
The evaluation function then reads the corresponding sources from the folder “Sources,” measures separation performance using them and the BSS Eval toolbox 3.0 (included in the evaluation function), and saves the performance measures in the file “results.mat,” including the song name and the processing time, along with the estimates to the folder “Estimates.”
Please note that measuring the separation performance alone will take around a whole day, as there are 100 songs to process and 4 sources per song to evaluate.
The BSS Eval toolbox 3.0 consists of a set of measures intended to quantify the quality of the separation between a source and its estimate.
The principle is to decompose an estimate into contributions corresponding to the target source, interference from unwanted sources, and artifacts such as “musical noise.”
Based on this principle, the following measures are then defined (in dB):
- Signal to Distortion Ratio (SDR)
- Source Image to Spatial distortion Ratio (ISR)
- Source to Interference Ratio (SIR)
- Sources to Artifacts Ratio (SAR)
See also BSS Eval.
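As a sketch of how these four ratios are commonly written for spatial source images: the notation below is an assumption introduced here for illustration, not taken from this page. For source $j$ and channel $i$, the estimated image is decomposed into the true image plus spatial distortion, interference, and artifact terms, and each ratio compares energies summed over channels and time.

```latex
\hat{s}^{\mathrm{img}}_{ij}(t) = s^{\mathrm{img}}_{ij}(t)
  + e^{\mathrm{spat}}_{ij}(t) + e^{\mathrm{interf}}_{ij}(t) + e^{\mathrm{artif}}_{ij}(t)

\mathrm{SDR}_j = 10 \log_{10}
  \frac{\sum_{i,t} s^{\mathrm{img}}_{ij}(t)^2}
       {\sum_{i,t} \bigl( e^{\mathrm{spat}}_{ij}(t) + e^{\mathrm{interf}}_{ij}(t) + e^{\mathrm{artif}}_{ij}(t) \bigr)^2}

\mathrm{ISR}_j = 10 \log_{10}
  \frac{\sum_{i,t} s^{\mathrm{img}}_{ij}(t)^2}
       {\sum_{i,t} e^{\mathrm{spat}}_{ij}(t)^2}

\mathrm{SIR}_j = 10 \log_{10}
  \frac{\sum_{i,t} \bigl( s^{\mathrm{img}}_{ij}(t) + e^{\mathrm{spat}}_{ij}(t) \bigr)^2}
       {\sum_{i,t} e^{\mathrm{interf}}_{ij}(t)^2}

\mathrm{SAR}_j = 10 \log_{10}
  \frac{\sum_{i,t} \bigl( s^{\mathrm{img}}_{ij}(t) + e^{\mathrm{spat}}_{ij}(t) + e^{\mathrm{interf}}_{ij}(t) \bigr)^2}
       {\sum_{i,t} e^{\mathrm{artif}}_{ij}(t)^2}
```

Higher values indicate better separation for all four measures.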
We would like to thank Emmanuel Vincent for giving us the permission to use the BSS Eval toolbox 3.0.
- Emmanuel Vincent, Shoko Araki, Fabian J. Theis, Guido Nolte, Pau Bofill, Hiroshi Sawada, Alexey Ozerov, B. Vikrham Gowreesunker, Dominik Lutter, and Ngoc Q.K. Duong, “The Signal Separation Evaluation Campaign (2007-2010): Achievements and remaining challenges,” Signal Processing, 92, pp. 1928-1936, 2012.
- Emmanuel Vincent, Hiroshi Sawada, Pau Bofill, Shoji Makino, and Justinian P. Rosca, “First stereo audio source separation evaluation campaign: data, algorithms and results,” in Proc. Int. Conf. on Independent Component Analysis and Blind Source Separation (ICA), pp. 552-559, 2007.
The performance measures, SDR, ISR, SIR, and SAR, are actually computed on segments of 30 seconds with 50% overlap.
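To make the segmentation concrete, the Python sketch below lists the sample ranges of 30-second windows with 50% overlap. It is an illustration only: the function name `segment_bounds` is hypothetical, and keeping only full-length windows is an assumption, since this page does not say how the evaluation function treats a trailing partial segment.

```python
def segment_bounds(n_samples, fs, window_s=30.0, overlap=0.5):
    """Return (start, end) sample indices of full windows, end exclusive."""
    window = int(round(window_s * fs))
    hop = int(round(window * (1.0 - overlap)))
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    bounds = []
    start = 0
    # Assumption: a trailing segment shorter than the window is dropped.
    while start + window <= n_samples:
        bounds.append((start, start + window))
        start += hop
    return bounds
```

For a 60-second song sampled at 44,100 Hz, this yields three windows starting at 0, 15, and 30 seconds.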
The results are saved in a structure, as follows:
song_name = results.name
performance_measure = results.(“source name”).(“measure name”)
processing_time = results.time
For example, results.bass.sdr contains the SDRs for all the 30-second half-overlapping segments of a given song.
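Because each measure is stored as one value per segment, participants may want a single number per song for their own inspection. The Python sketch below takes the median of the finite values. This choice is an assumption made here for illustration, not the organizers' aggregation rule; the name `summarize` is hypothetical, and the skipping of non-finite values only guards against segments for which a measure might be undefined.

```python
import math
import statistics


def summarize(values):
    """Median of the finite per-segment values, or None if there are none."""
    finite = [v for v in values if math.isfinite(v)]
    return statistics.median(finite) if finite else None
```

With per-segment SDR values loaded into a plain list, `summarize` returns one figure in dB for that source and song.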
The evaluation function also saves the results for all the songs, for both the development and test subsets, in a single file “result.mat” to the root folder, along with the folder “MSD100,” and the files “sisec2015_mus.m” and “myfunction.m.”
The participants are finally kindly asked to send us back the file “result.mat.”
The results will eventually be scaled between 0 and 1, where 0 corresponds to a lower-bound on the separation performance measured comparing the actual mixture to the original sources, and 1 corresponds to an upper-bound on the separation performance measured comparing the ideal estimates (computed using ideal soft masks) to the original sources.
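One plausible reading of this scaling is a linear mapping between the two bounds. The Python sketch below assumes that reading; the function name `scale_result` is hypothetical, and whether values outside the bounds are clipped is not stated here, so the sketch leaves them unclipped.

```python
def scale_result(value_db, lower_db, upper_db):
    """Map a measure linearly so that lower_db -> 0 and upper_db -> 1."""
    if upper_db == lower_db:
        raise ValueError("upper and lower bounds must differ")
    # Assumption: linear scaling, with no clipping outside [0, 1].
    return (value_db - lower_db) / (upper_db - lower_db)
```

For instance, a measure of 4 dB with a lower bound of 2 dB and an upper bound of 10 dB maps to 0.25.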
Once all the participants have sent their results, the results will be made available on the website.
The participants are kindly asked not to delete their separation results, i.e., the full directory tree containing all the separated tracks.
It is indeed expected that a perceptual evaluation will be performed on all the MUS results, and this requires availability of the separated tracks.
If such a perceptual evaluation is performed, further instructions will be given to the participants concerning the server and the way they should transmit these results for evaluation.
Meanwhile, please keep these tracks, exactly as they were evaluated with BSS Eval, somewhere on your local hard drives.
If you have any problem, question, or comment, please feel free to contact us.
Let’s separate! 🙂
The organizers for MUS,
Zafar Rafii (zafarrafii[at]gmail.com) and Antoine Liutkus (antoine.liutkus[at]inria.fr)