Onset detection is a primary step for higher-level music analysis. However, comparatively little research focuses on onset detection for string instruments. Given the high demand for automatic systems, this project addresses string-instrument onset detection using a convolutional neural network (CNN). The project was programmed in Python with Keras (TensorFlow). The network was built on a dataset of more than 250 minutes of music files and 2,000 onset annotations, which served for training, validation, and testing.
The CNN used MFCCs and delta MFCCs as its two input channels. These fed two convolutional layers and one max-pooling layer, which extracted the feature maps. The feature maps were then passed to the fully connected layer. The output was represented as a two-class one-hot code: (1,0) is non-onset; (0,1) is onset.
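from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

# input_shape and num_classes (non-onset, onset) are defined elsewhere.
model = Sequential()
# Two convolutional layers; only the first layer needs input_shape.
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu',
                 input_shape=input_shape, data_format='channels_last'))
model.add(Conv2D(64, (3, 3), activation='relu', data_format='channels_last'))
# One max-pooling layer, then dropout for regularization.
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
# Fully connected layers ending in a softmax over the classes.
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))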
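The two-channel input and the two-class output described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses NumPy, approximates the delta with `np.gradient` along the time axis (the project may have used a different delta filter), and the names `stack_mfcc_channels` and `encode_labels` are hypothetical, not taken from the project code.

```python
import numpy as np


def stack_mfcc_channels(mfcc):
    """Stack MFCCs and their deltas into a (n_mfcc, n_frames, 2) array.

    Assumption: the delta is approximated by a finite difference over time;
    this stands in for whatever delta computation the project used.
    """
    mfcc = np.asarray(mfcc, dtype=np.float32)
    delta = np.gradient(mfcc, axis=1)  # change of each coefficient over time
    return np.stack([mfcc, delta], axis=-1)  # channels_last layout


def encode_labels(is_onset):
    """Map boolean labels to (1, 0) for non-onset and (0, 1) for onset."""
    idx = np.asarray(is_onset, dtype=int)
    return np.eye(2, dtype=np.float32)[idx]


# Toy data: 13 coefficients over 20 frames (placeholder sizes).
rng = np.random.default_rng(0)
features = stack_mfcc_channels(rng.normal(size=(13, 20)))
labels = encode_labels([False, True, False])
```

Here `features` has a trailing channel axis of size two, which matches the `channels_last` setting in the model, and each row of `labels` is one of the two codes.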
Although Keras has a built-in EarlyStopping callback for validation, I replaced the validation criterion with the F-measure for more accurate detection. Early stopping was implemented by saving and loading the model.
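A minimal, framework-agnostic sketch of this idea is given below. It is not the project's actual code: the tolerance-based matching in `f_measure` (a 0.05 s placeholder window), the `patience` rule, and the callables `train_epoch`, `validate`, `save`, and `load` are all assumptions introduced for illustration.

```python
def f_measure(detected, annotated, tolerance=0.05):
    """F-measure of detected onset times against annotated ones (seconds).

    Assumption: a detection is correct if it lies within `tolerance`
    seconds of an annotation that has not been matched yet.
    """
    detected, annotated = sorted(detected), sorted(annotated)
    tp, i, j = 0, 0, 0
    while i < len(detected) and j < len(annotated):
        diff = detected[i] - annotated[j]
        if abs(diff) <= tolerance:
            tp += 1  # matched pair
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # detection too early: false positive
        else:
            j += 1  # annotation passed without a match: miss
    if tp == 0:
        return 0.0
    precision = tp / len(detected)
    recall = tp / len(annotated)
    return 2 * precision * recall / (precision + recall)


def train_with_f_measure_stopping(train_epoch, validate, save, load,
                                  max_epochs=100, patience=5):
    """Train until the validation F-measure stops improving.

    The best model is saved whenever the score improves and is loaded
    back once `patience` epochs pass without improvement.
    """
    best_f, best_epoch = -1.0, -1
    for epoch in range(max_epochs):
        train_epoch()
        score = validate()
        if score > best_f:
            best_f, best_epoch = score, epoch
            save()  # keep a checkpoint of the best model so far
        elif epoch - best_epoch >= patience:
            break
    load()  # restore the best checkpoint
    return best_f, best_epoch


# Toy run with scripted validation scores instead of a real network.
scores = iter([0.5, 0.7, 0.6, 0.65, 0.9])
state = {"epoch": 0, "saved": None}
result = train_with_f_measure_stopping(
    train_epoch=lambda: state.update(epoch=state["epoch"] + 1),
    validate=lambda: next(scores),
    save=lambda: state.update(saved=state["epoch"]),
    load=lambda: state.update(epoch=state["saved"]),
    max_epochs=10, patience=2)
```

With Keras, `save` and `load` could plausibly wrap `model.save` and `keras.models.load_model`, and `train_epoch` could call `model.fit` for a single epoch.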
After optimizing the hyperparameters (MFCC frame size, number of MFCC filterbanks, number of kernels, batch size, and dropout), the highest F-measure was 0.86.
(The red line represents the onsets detected by the network; the blue line represents the annotated onsets.)