This page is supplementary material to our paper entitled “A musical similarity metric based on Symbolic Aggregate Approximation”, submitted for review to the 28th European Signal Processing Conference (EUSIPCO 2020). This project is a direct successor to our previous work, “Bach 2.0 – Generating Classical Music using Recurrent Neural Networks”, presented at the 23rd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES 2019) and available as an open access article in Procedia Computer Science.
We chose the topic of synthetic classical music generation mainly because it seemed intuitive to treat music, and classical music in particular, as a form of natural language. The existence of a standard musical notation system also comes to our aid. Through statistical analysis we found an even stronger resemblance to natural language, namely the existence of sentences and phrases. We did this by finding recurring patterns, or motifs, in the composer’s corpus of works. A listing of motifs and their frequency of occurrence can be found here (note that the motifs are expressed as the relative pitch of consecutive notes). We chose J. S. Bach for our experiments because the famous composer is very well documented, many of his works are freely available, and, last but not least, he wrote a considerable number of solo compositions. At the time of writing, we did not focus on polyphonic music.
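The motif search described above can be illustrated with a minimal sketch. It assumes each piece is already available as a list of MIDI pitch numbers and that a motif is a fixed-length run of consecutive intervals. The function names, the motif length of 3 and the toy corpus are hypothetical choices for illustration, not the settings used in the paper.

```python
from collections import Counter


def intervals(pitches):
    """Relative pitch (in semitones) between consecutive notes."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def count_motifs(pieces, length=3):
    """Count every run of `length` consecutive intervals across all pieces."""
    counts = Counter()
    for pitches in pieces:
        steps = intervals(pitches)
        for i in range(len(steps) - length + 1):
            counts[tuple(steps[i:i + length])] += 1
    return counts


# Hypothetical toy corpus (MIDI note numbers), not taken from the database.
corpus = [
    [60, 62, 64, 65, 67, 69, 71, 72],  # ascending C major scale
    [67, 69, 71, 72, 74],              # same contour starting on G
]
motifs = count_motifs(corpus, length=3)

# List motifs from most to least frequent.
for motif, frequency in motifs.most_common():
    print(motif, frequency)
```

Because the motifs are built from intervals rather than absolute pitches, the same melodic shape is counted as one motif regardless of the key in which it appears.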
We have improved upon the 3-layer GRU recurrent neural network by replacing the middle layer with a bidirectional LSTM layer (BD-LSTM), aiming to take into account not only the preceding sequence of musical notes but also the following ones. We obtain significantly higher validation accuracy (categorical prediction accuracy), a little over 90%, compared to the 75% previously reported. We employ four such networks, each responsible for a single attribute: type (note, chord, rest or bar), duration, position and optional modifiers. This greatly reduces the vocabulary size for each RNN and has the advantage that the final composition will be grammatically sound from NoteWorthy’s perspective. An instance of such a network looks similar to:
Next, we wanted a measure of how similar such a composition is to J. S. Bach’s style. We augmented the database of partita and sonata excerpts, which now contains 61 works. The entire database, in NWCTXT format, is available here. To obtain this similarity metric, we treat the sequence of notes as a signal composed of discrete values and employ techniques such as piecewise aggregate approximation (PAA) and symbolic aggregate approximation (SAX). This reduces dimensionality while preserving signal features. Finally, a synthetic piece is compared against the database and the longest common substring is computed. This in turn paves the way for a generative adversarial approach.
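The pipeline above (PAA, then SAX, then longest common substring) can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation from the paper: it assumes the note sequence is a list of numeric pitches, z-normalises it before PAA, uses a four-letter alphabet with the standard Gaussian breakpoints, and compares the resulting SAX words. The segment count, alphabet size and function names are all hypothetical.

```python
from statistics import mean, pstdev

# Assumed alphabet of size 4. The breakpoints split a standard normal
# distribution into four equiprobable regions (its quartiles).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def znorm(series):
    """Rescale the series to zero mean and unit variance."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Mean of each of `segments` equal-length chunks.

    Assumes len(series) is a multiple of `segments`; any trailing
    values beyond the last full chunk are ignored.
    """
    size = len(series) // segments
    return [mean(series[i * size:(i + 1) * size]) for i in range(segments)]


def sax(series, segments):
    """Convert a numeric series into a SAX word of `segments` letters."""
    word = []
    for value in paa(znorm(series), segments):
        index = sum(value > b for b in BREAKPOINTS)  # region the mean falls in
        word.append(ALPHABET[index])
    return "".join(word)


def longest_common_substring(a, b):
    """Longest contiguous run of symbols shared by strings a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


# Hypothetical pitch sequences standing in for a generated piece and a database entry.
synthetic = [60, 62, 64, 65, 67, 69, 71, 72]
reference = [55, 57, 59, 60, 62, 64, 66, 67]
shared = longest_common_substring(sax(synthetic, 4), sax(reference, 4))
print(len(shared))
```

One possible similarity score is the length of the shared substring divided by the length of the SAX word, maximised over all database entries. That normalisation is a suggestion rather than something stated in the text.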