Music retrieval algorithms for the lead sheet problem

Abstract Methods from string and pattern matching have recently been applied to many problems in music retrieval. We consider the so-called lead sheet problem, where symbolic representations of the harmony, melody, and, usually, bass line are presented separately. This is a common situation in some forms of popular music but is also significant for ‘classical’ music in many cases. A number of different but related musical situations are analysed and efficient algorithms are presented for music retrieval in each one.


Introduction
Recently a new approach to music analysis has been taken that attempts to take advantage of the tools of string and pattern matching from computer science. This has required new formalisations of many problems in music retrieval and in some cases a loss in the richness of the original musical source. As a result, arguably the most important innovations have been in increasing the sophistication of the way computational music analysis problems are represented. For example, where previously music was largely assumed to be monophonic, data structures and algorithms for polyphonic music are now commonplace. Furthermore, distinct representations for voiced and unvoiced polyphonic music allow more appropriate methods to be applied to each. For situations where pitch and rhythm alone are not sufficient, new methods for searching in sets of high dimensional data straddle the border between computational geometry and stringology. Perhaps even more importantly, formal translations of the concept of musical similarity have been extended away from their previous form which was largely inspired by text editing and bioinformatics (see , for recent surveys of string algorithmics applied to music analysis).
We focus here on music that can be considered in two separate but related parts. First, the melody is given as a monophonic sequence of notes. Accompanying it is the harmony presented as a sequence of chords; usually which note of each chord is the lowest (the bass line) is specified as well. The sequence of chords can either be given explicitly, as is usually the case in lead sheets for popular music, or it can be implicit, the usual case in a 'classical' piece, for example. The task of music retrieval using data of the form described is termed the lead sheet problem. We present four different variants on the lead sheet problem and give efficient algorithmic solutions for each.
Readers who are not familiar with basic concepts of music theory should consult a standard text such as Clough et al. (1999).

Definitions
A lead sheet generally accompanies the melody only with chord symbols that do not explicitly indicate the bass line; but in a lead sheet (as distinct from traditional music notation), a chord symbol is usually taken as implying root position by default, thereby fixing the bass line. In Figure 1 (the first few measures of 'Alice in Wonderland'), the labels written under the stave, such as 'Dmin7' and 'G7', are chord symbols. If the chords are in root position, the bass (lowest) note of the first chord is D and that of the second chord is G.
A lead sheet describes the accompaniment rather abstractly: the chord symbols can be realised in notes in many ways. Figure 2 shows, on the lower staff, a very simple realisation, one simpler, in fact, than any competent performer would be likely to play. But the differences between this realisation and a realistic one are irrelevant for our purposes.
The subject of symbolic musical representation for use in computer applications has been discussed and debated extensively over the past 30 years, if not longer (see, for example, Selfridge-Field, 1997; Brinkman, 1990; Howell et al., 1991; Marsden & Pople, 1992; Wiggins, 1993; Byrd & Isaacson, 2003). However, for the purposes of this paper, virtually any symbolic representation is adequate: all we need is (1) pitch as an attribute of notes, (2) ordering of notes, allowing simultaneous notes, and (3) a way to distinguish melody and harmony (one way being the expression of the latter in chord symbols).
For the rest of the paper, we consider almost exclusively traditional classical music notation; but all of the ideas apply to lead sheets, and many can readily be extended to other music representations.
We use a simple representation system for harmonic progression and encode all notes in terms of pitch classes (PCs). Each chord is written with its lowest note first, followed by a semicolon and the remaining pitch classes. We write 1; 5, 8, for example, for a chord in which the lowest note is PC 1; the chord also includes PC 5 and PC 8 in any order.
By using pitch classes we represent notes that are separated by an exact number of octaves in the same way. Therefore, there are a total of 12 possible pitch classes corresponding to the notes in the chromatic scale. Each pitch class is also represented only once per chord, whether or not the same note occurs in different octaves. Figure 3 shows the final cadence of Chopin's Ballade no. 4 in F minor, Op. 52. The cadence ends with a sequence of four chords, each accompanying one note of the melody. Our representation separates the melody from the harmony and is given in Table 2. For comparison, our representation of the first five notes and their corresponding chords for the Alice in Wonderland excerpt appears in Table 1; note that this describes the lead-sheet notation version in Figure 1 and the conventional-music notation version in Figure 2 equally well.
Some basic definitions are required to allow us to formalise the musical representation. Following standard string matching terminology, we call any search query a pattern and any data set that is to be searched by the query the text.
Definition 1. We call a sequence of sets of pitch classes a set string. Throughout, the number of sets in a pattern set string p is m and the number of sets in a text set string t is n. Let $p_i$ ($t_i$) signify the ith set of the pattern (text). As the number of pitch classes, s, is bounded by a constant, the total number of pitch classes in the pattern (text) is no more than a constant times larger than m (n).
Definition 2. We say that a set string p occurs in a set string t if $\exists j : 1 \le j \le n - m + 1$ such that $\forall i : 1 \le i \le m$, $p_i \subseteq t_{i+j-1}$. In other words, all the notes in chords in p occur in the corresponding chords of t, but there may be chords in t that have 'extra' notes which do not occur in p.
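As an illustration of Definition 2, the following minimal Python sketch tests every candidate position directly. The function name `occurrences` and the use of `frozenset` for each chord are assumptions made for this example; they are not part of the original paper.

```python
from typing import FrozenSet, List, Sequence

SetString = Sequence[FrozenSet[int]]


def occurrences(p: SetString, t: SetString) -> List[int]:
    """Return all 1-based positions j at which set string p occurs in t.

    p occurs at j when every set p_i is a subset of t_{i+j-1}.
    This is the naive O(nm) check, shown only to make the definition concrete.
    """
    m, n = len(p), len(t)
    hits = []
    for j in range(1, n - m + 2):          # j = 1 .. n - m + 1
        # With 0-based i, the text set aligned with p[i] is t[i + j - 1].
        if all(p[i] <= t[i + j - 1] for i in range(m)):
            hits.append(j)
    return hits


pattern = [frozenset({1, 5}), frozenset({0})]
text = [frozenset({1, 5, 8}), frozenset({0, 4}), frozenset({1, 5}), frozenset({0})]
print(occurrences(pattern, text))
```

Here the pattern occurs at positions 1 and 3: at position 1 the text chords contain the extra pitch classes 8 and 4, which Definition 2 permits.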
Definition 3. Consider a set string t with a subset of each set $t_j$ identified as the lead set. We call such a sequence a lead/non-lead set string, which we abbreviate to LN set string.
[Table 2. A representation of the last four chords of Figure 3.]
Definition 4. Consider an LN set string t. Assume there is some (possibly different) total ordering applied to the pitch classes of each of the lead sets of t. We call such a sequence an ordered lead/non-lead set string, which we abbreviate to OLN set string.
Definition 5. For any LN or OLN string t, consider the unique set string l with the property that $\forall j : 1 \le j \le n$, $l_j$ is the lead set of $t_j$. We call such a set string the (ordered) lead set string of t. Consider also the unique set string nl with the property that $\forall j : 1 \le j \le n$, $\mathit{nl}_j = t_j \setminus l_j$. We call such a set string the (ordered) non-lead set string of t. We write $t = (l, \mathit{nl})$ when we need to refer to the lead and non-lead set strings of t.
Definition 6. Consider a pattern p, a text t and an order-preserving function $f : \{1, \ldots, m\} \to \{j, \ldots, n\}$. If $p_i = t_{f(i)}$ for all $1 \le i \le m$ then we call f an alignment between p and $t[j, \ldots, n]$. If furthermore $\max_i (f(i+1) - f(i)) \le a$ then we say that f is an a-alignment. Where $p_i$ and $t_{f(i)}$ are ordered sets, we require both that $p_i = t_{f(i)}$ and that the ordering of their elements is the same.

Problems and solutions
Given a musical pattern and text expressed in terms of a melody and corresponding harmony, the general task is to find all positions in the text where there is a match.
The exact definition of a match and the type of data that is used as input will determine the algorithm that we propose. The patterns and texts described below are instances of the LN and OLN set strings introduced in Definitions 3 and 4. Although we do not give the detail, the dynamic programming solutions presented can be simply modified to allow 'wrong notes' both in the pattern and text without increasing the time complexity of their respective algorithms. We describe four main problems.

LN string matching
The input pattern and text are split into lead set strings, which correspond to the melody, and non-lead set strings, which correspond to the harmony. The task is to find all positions in the text where both of the following conditions are satisfied: 1. There is an exact match of the melody of the pattern and the melody of text; and 2. The harmony of the pattern is included in the harmony of the text.
For our purposes we define harmonic inclusion to require that the pitch classes at each position of the pattern are a subset of those in the corresponding position of the text. The problem is expressed more formally as follows.
Problem 3.1. Consider an LN pattern p = (pl, pnl) and an LN text t = (tl, tnl). Find all positions j that satisfy the following conditions:
1. $\forall i : 1 \le i \le m$, $\mathit{pl}[i] = \mathit{tl}[j+i-1]$;
2. $\forall i : 1 \le i \le m$, $\mathit{pnl}[i] \subseteq \mathit{tnl}[j+i-1]$.
The special case where the lead sets have only one member each is an instance of this problem. This corresponds to the situation where either the pattern or text (or both) is monophonic.

Solution
We first encode pl, pnl, tl and tnl as strings of bit-vectors called pl′, pnl′, tl′ and tnl′ respectively. Each bit-vector represents a set of pitch classes and is defined as follows: the kth bit of the bit-vector is set to 1 if pitch class k is in the set, and all remaining bits are set to 0.
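The bit-vector encoding can be sketched in a few lines of Python. The helper names `to_bitvector` and `is_subset` are hypothetical, and numbering the pitch classes 1 to s is an assumption taken from the range used later in the paper for the ordered encoding.

```python
def to_bitvector(pitch_classes, s=12):
    """Encode a set of pitch classes (assumed to lie in 1..s) as an s-bit integer.

    Bit k (counting from 1 at the least significant end) is set to 1
    exactly when pitch class k is in the set.
    """
    bits = 0
    for pc in pitch_classes:
        if not 1 <= pc <= s:
            raise ValueError(f"pitch class {pc} outside 1..{s}")
        bits |= 1 << (pc - 1)
    return bits


def is_subset(p_bits, t_bits):
    """True when every pitch class encoded in p_bits is also in t_bits."""
    return (p_bits & ~t_bits) == 0


chord = to_bitvector({1, 5, 8})
print(bin(chord))                               # bits 1, 5 and 8 are set
print(is_subset(to_bitvector({1, 5}), chord))   # inclusion holds
print(is_subset(to_bitvector({1, 4}), chord))   # PC 4 is missing from the chord
```

With this encoding the harmonic inclusion test for one position is a single bitwise operation when s fits in a machine word.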
It is clear that each bit-vector has length s bits. The following steps are sufficient to solve Problem 3.1:

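1. Construct an array A with the property that A[j] = 1 if pl′ occurs at position j of tl′, and A[j] = 0 otherwise.
2. Construct an array B with the property that B[j] = 1 if A[j] = 1 and pnl′[i] is a subset of tnl′[j + i − 1] for all 1 ≤ i ≤ m, and B[j] = 0 otherwise.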
Let a be the number of 1s in the array A. Step 1 takes O(sn) time using standard exact string matching techniques (for example, Boyer & Moore, 1977; Knuth et al., 1977; Galil, 1979). The time required in Step 2 for each j is O(sm), as we can check each set in pnl′ in turn (pnl′[i] has at most s elements). Because we need only check those positions j of tnl′ for which A[j] = 1, the total running time of Step 2 is O(sam). The overall method is shown in Algorithm 3.2.
The positions of the 1s in array B give the required solution. The overall time complexity is O(snm) (see Footnote 1), as in the worst case a = n. However, in our case s is a constant and in real musical data a is likely to be very small. This means that the running time will likely be closer to linear in practice (see Section 4).
Footnote 1. Recall that the total size of the input data is in fact O(s(n + m)), not O(n + m). This is because n and m are the number of sets in the input and each set can have s elements. Therefore, the complexity O(snm) is in fact better than the $O(s^2 nm)$ one would have expected if the sizes of the input set strings had simply been multiplied.
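The two-stage method for LN matching can be illustrated with the following Python sketch. It is a minimal version under stated assumptions: the lead strings are compared with a plain scan rather than the Boyer-Moore algorithm, the sets are already encoded as integer bit-vectors, and the names `bv` and `ln_match` are hypothetical.

```python
def bv(pitch_classes):
    """Encode a set of pitch classes (assumed in 1..12) as an integer bit-vector."""
    bits = 0
    for pc in pitch_classes:
        bits |= 1 << (pc - 1)
    return bits


def ln_match(pl, pnl, tl, tnl):
    """Return the 1-based text positions at which the LN pattern matches.

    pl, pnl: lead and non-lead bit-vectors of the pattern (length m).
    tl, tnl: lead and non-lead bit-vectors of the text (length n).
    """
    m, n = len(pl), len(tl)
    # Stage 1: exact match of the lead strings (plain scan, for brevity).
    lead_hits = [j for j in range(n - m + 1) if tl[j:j + m] == pl]
    # Stage 2: harmonic inclusion, checked only at the surviving positions.
    return [j + 1 for j in lead_hits
            if all((pnl[i] & ~tnl[j + i]) == 0 for i in range(m))]


pl = [bv({3}), bv({5})]
pnl = [bv({1}), bv({8})]
tl = [bv({3}), bv({5}), bv({3}), bv({5})]
tnl = [bv({1, 8}), bv({8, 10}), bv({2}), bv({8})]
print(ln_match(pl, pnl, tl, tnl))
```

The melody of the pattern occurs at positions 1 and 3 of the text, but only position 1 also satisfies harmonic inclusion, so only position 1 is reported.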

Algorithm 3.2. LN matching (p, t)
Input: pattern and text, both LN strings
Output: all locations where the pattern occurs in the text
Begin
    lmatch ← Boyer-Moore(pl, tl)
    for i in lmatch do
        match[i] ← ISSUBSET(pnl, tnl, i)
End

OLN string matching
The input pattern and text are split into lead and non-lead sets as before. However, in this case the order of the notes in the lead sets has to be preserved for there to be a match of the melody: in addition to the conditions of Problem 3.1, a match at position j requires that, for every i, the ordering of the elements of pl[i] and tl[j + i − 1] is the same.

Solution
As the lead sets are ordered, they must be encoded differently from the non-lead sets. We encode the data as follows:
- Encode pnl and tnl as strings of bit-vectors called pnl′ and tnl′ respectively, using the same encoding as in Problem 3.1.
- As the lead sets are ordered, each element in a particular set has a rank that corresponds to its position in the ordering. For example, in the ordered set 7, 1, 4, the rank of 7 is 1, the rank of 1 is 2 and the rank of 4 is 3. Let rank(e) be the rank of element e (with respect to a particular ordered set). We encode each element e in each ordered set as an integer e′ = s(rank(e) − 1) + e. For example, the ordered set 7, 1, 4 would be transformed to {7, 13, 28} if s = 12.
This encoding is always unique as $e \in \{1, \ldots, s\}$. Let pl′ be a string of integers consisting of the encoded elements of each set pl[i] in order. We define tl′ in the same way as a string of encoded elements from tl.
Steps 2 and 3 can be solved together by applying a linear time pattern matching algorithm to the expanded arrays pl′ and tl′. The total length of tl′ is at most sn, so the total time required for these two steps is O(sn). The running time of Step 1 is O(sam) if we restrict ourselves to checking only those positions j of tnl′ for which there is a match in Step 2. This is O(snm) in the worst case but will be closer to linear time in practice (see Section 4).
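The rank-based encoding of ordered lead sets can be checked with a short Python sketch. The function names `encode_ordered_set` and `encode_lead_string` are hypothetical, and ordered sets are represented here as Python lists whose order is the imposed ordering.

```python
def encode_ordered_set(ordered, s=12):
    """Encode each element e of an ordered set as s * (rank(e) - 1) + e.

    `ordered` lists the pitch classes (assumed in 1..s) in their imposed order,
    so enumerate() yields rank(e) - 1 directly.
    """
    return [s * rank_minus_one + e for rank_minus_one, e in enumerate(ordered)]


def encode_lead_string(lead_sets, s=12):
    """Concatenate the encodings of consecutive ordered lead sets."""
    encoded = []
    for ordered in lead_sets:
        encoded.extend(encode_ordered_set(ordered, s))
    return encoded


print(encode_ordered_set([7, 1, 4]))          # the example from the text
print(encode_lead_string([[7, 1, 4], [2]]))   # two consecutive lead sets
```

The first call reproduces the values 7, 13 and 28 given in the text for s = 12. Because the rank restarts at 1 for every set, an encoded value of at most s always marks the first element of a set.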

LN matching with a-bounded gaps
Matches of the melody and harmony in the pattern and text may be obscured by the presence of extra notes or chords in the text. In this formulation we allow gaps to be inserted into the pattern when attempting to find a match. In order to ensure that the match still has musical relevance the size of the gap is bounded by an integer a (see Definition 6). Gapped alignment algorithms are well studied, in particular for the alignment of DNA sequences in bioinformatics (see Gusfield, 1997, for example). The fastest known solutions run in O(nm) time but are designed for simple strings of characters. We extend these methods for our application to LN (and later OLN) set strings.
Problem 3.4. Consider an LN pattern p = (pl, pnl), an LN text t = (tl, tnl) and an integer bound a. Find all positions j that satisfy the following conditions:
1. There is an a-alignment f between pl and $\mathit{tl}[j, \ldots, n]$;
2. $\forall i : 1 \le i \le m$, $\mathit{pnl}[i] \subseteq \mathit{tnl}[f(i)]$.
In other words, we want to find matches between p and t allowing gaps of size up to a in the alignment.

Solution
The basic idea of the algorithm is to compute a-alignments of increasing prefixes of pattern p in text t. This is achieved by dynamic programming using a table D with m rows and n columns. The value at $D_{i,j}$ contains the last index in t that $p_1, \ldots, p_i$ has successfully been aligned with, or 0 if the gap to the last successful alignment is larger than a. We define $p_i \cong t_j$ to mean that $p_i$ matches $t_j$. The entries of D are computed as follows:

$$
D_{i,j} =
\begin{cases}
j & \text{if } p_i \cong t_j,\ j - D_{i-1,j-1} \le a + 1 \text{ and } D_{i-1,j-1} > 0 \\
D_{i,j-1} & \text{if } p_i \not\cong t_j \text{ and } j - D_{i-1,j-1} < a + 1 \\
D_{i,j-1} & \text{if } p_i \cong t_j,\ j - D_{i-1,j-1} > a + 1 \text{ and } j - D_{i,j-1} < a + 1 \\
0 & \text{otherwise.}
\end{cases}
$$

Boundary conditions for the matrix D are as follows: $D_{0,0} = 1$, $D_{i,0} = 0$ and $D_{0,j} = j$. The second and third boundary conditions reflect the notion that nothing aligns with the empty string but that the empty string aligns with everything. The first boundary condition is simply by definition. Locations in t for which there is an LN match with p with a-bounded gaps correspond to the entries in matrix D where $D_{m,j} = j$. These can be found by inspecting the final row of D once it is completed (see Footnote 2). Algorithm 3.5 gives an overview of the method.
At every entry in matrix D it may be necessary to check whether $p_i$ matches $t_j$ and also to inspect some previous value in D. The time required for this is O(s), so the total time required to construct D is O(snm). As we regard s as a constant, the overall computation time is O(nm).
Algorithm 3.5. LN matching with a-bounded gaps (p, t)
Input: pattern and text, both LN strings
Output: all locations where the pattern occurs in the text with gaps bounded by a
Begin
    Set boundary conditions for matrix D
    for j in {1, . . . , n} do
        for i in {1, . . . , m} do
            Update entry D i,j following rules above
        od
    od
    Find all entries such that D m,j = j
End
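The table-filling idea can be sketched in Python as follows. This is a simplified variant written for illustration, not a transcription of the paper's recurrence: it always carries forward the most recent end position of each pattern prefix and applies the a + 1 threshold only when extending an alignment. The name `gapped_ln_match`, the use of Python sets, and the match predicate (equal lead sets, non-lead inclusion) are assumptions.

```python
def gapped_ln_match(pl, pnl, tl, tnl, a):
    """Return the 1-based text positions at which a gapped alignment ends.

    last[i][j] holds the largest k <= j such that p_1..p_i can be aligned
    with p_i placed on t_k and all gaps bounded by a, or 0 if no such k exists.
    """
    m, n = len(pl), len(tl)

    def matches(i, j):
        # Assumed predicate: lead sets equal, pattern harmony included in text harmony.
        return pl[i - 1] == tl[j - 1] and pnl[i - 1] <= tnl[j - 1]

    last = [[0] * (n + 1) for _ in range(m + 1)]
    ends = []
    for j in range(1, n + 1):
        for i in range(1, m + 1):
            if i == 1:
                ok = matches(1, j)          # the first set may start anywhere
            else:
                prev = last[i - 1][j - 1]   # most recent end of p_1..p_{i-1}
                ok = matches(i, j) and prev > 0 and j - prev <= a + 1
            last[i][j] = j if ok else last[i][j - 1]
        if last[m][j] == j:
            ends.append(j)
    return ends


empty = [set()] * 4
pl = [{1}, {2}, {3}]
tl = [{1}, {2}, {9}, {3}]
print(gapped_ln_match(pl, empty[:3], tl, empty, a=1))   # one skipped text set allowed
print(gapped_ln_match(pl, empty[:3], tl, empty, a=0))   # no gaps allowed
```

In the example the pattern aligns with the text only by skipping the third text set, so an alignment ending at position 4 is found for a = 1 and none for a = 0. Checking only the most recent end position is sufficient here because any earlier end position implies a larger gap.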

OLN matching with a-bounded gaps
If the order of notes in the lead sets has to be preserved for there to be a match then we can formulate our final pattern matching problem. A match is required between the pattern and text with gap size bounded by a. A further requirement is that the order of notes in the corresponding matching lead sets must be the same.
Problem 3.6. Consider an OLN pattern p, an OLN text t and a bound a. Find all positions j that satisfy the following conditions:
1. There is an ordered a-alignment f between pl and $\mathit{tl}[j, \ldots, n]$;
2. $\forall i : 1 \le i \le m$, $\mathit{pnl}[i] \subseteq \mathit{tnl}[f(i)]$.
Note that condition 1 requires that the ordering imposed on pl[i] and tl[f(i)] be the same.

Solution
The method of solution is the same as that for Problem 3.4. We need only modify the definition of a match between $p_i$ and $t_j$ so that the order of the elements in the lead sets is taken into account. The recursion for computing matrix D is defined in the same way, so the overall running time is O(nm) as before. The space requirement can also be reduced to O(n + m) if required, as explained in Footnote 2.

Implementation and experimental results
Each of the four algorithms was implemented in C and compiled using gcc 3.3.2 with the -O2 optimisation flag.
The tests were then run on a 2.40 GHz Pentium 4 processor with 512 MB of RAM. Random texts and patterns of different lengths were created using the method described in Section 4.1. Each experiment was repeated 10 times and the average of the running times calculated. The timings given are for the search algorithms only and do not include the time required to create the data.

Creation of test data
In order to test the different algorithms, random input data is used. Although this form of data is not realistic in a musical sense we expect that the running times given are an accurate indication of what can be expected in practice.

Text for LN matching with and without a-bounded gaps
The lead sets are chosen uniformly and independently at random from the space of possible sets with alphabet size 12. Non-lead sets are also chosen at random, but any elements in common with the corresponding lead set are then removed. The result is that a lead/non-lead set pair has no elements in common.

Text for OLN matching with and without a-bounded gaps
The ordered lead sets are created by the following algorithm: 1. Uniformly sample a random set s from the space of possible sets with alphabet size 12; 2. Uniformly sample a random permutation of the set s.
The result is an ordered set of random size. The lead sets are chosen independently using this scheme. The non-lead sets are chosen in the same way as for LN matching.
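The sampling schemes for LN and OLN text positions might be sketched in Python as below. The function names are hypothetical, and the sketch assumes that "the space of possible sets" means all subsets of the pitch classes 1 to 12, including the empty set; the paper does not state this explicitly.

```python
import random


def random_set(rng, s=12):
    """Sample uniformly from all subsets of {1..s} (empty set included, by assumption).

    Including each element independently with probability 1/2 gives every
    subset the same probability.
    """
    return {pc for pc in range(1, s + 1) if rng.random() < 0.5}


def random_ln_position(rng, s=12):
    """Sample one lead/non-lead pair with no pitch class in common."""
    lead = random_set(rng, s)
    non_lead = random_set(rng, s) - lead    # remove elements shared with the lead set
    return lead, non_lead


def random_oln_position(rng, s=12):
    """As above, but the lead set is returned as a uniformly random permutation."""
    lead, non_lead = random_ln_position(rng, s)
    ordered = sorted(lead)
    rng.shuffle(ordered)
    return ordered, non_lead


rng = random.Random(0)                      # fixed seed only for repeatability
text = [random_oln_position(rng) for _ in range(5)]
for ordered, non_lead in text:
    print(ordered, sorted(non_lead))
```

Passing an explicit `random.Random` instance keeps the generated data reproducible across runs, which is convenient when timing experiments are repeated.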
Footnote 2. To find the full alignments, as opposed to only the locations in the text where alignments finish, it is necessary to perform a trace-back by reversing the direction of the rules for creating D. The space requirement can also be reduced to O(n + m) by the application of a divide-and-conquer method due to Hirschberg (1975).

Pattern for OLN/LN matching
A random position in the text is chosen and an OLN/LN string of the appropriate length is copied and used as the pattern.

Pattern for OLN/LN matching with a-bounded gaps
A random position in the text is chosen as before. Then the following algorithm is used: 1. Copy the current lead and non-lead set from the text and add it to the pattern. 2. Skip l positions in the text, where l is chosen uniformly at random from the range [0, ..., a]. 3. Repeat from step 1 until the pattern has length m.
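The pattern sampling procedure for the gapped experiments can be sketched as follows. The name `sample_gapped_pattern` is hypothetical, and restricting the start position so that the pattern always fits inside the text is an assumption added here; the paper does not say how the end of the text is handled.

```python
import random


def sample_gapped_pattern(text, m, a, rng):
    """Copy m positions from `text`, skipping between 0 and a positions each time.

    Returns the pattern together with the 0-based text indices that were copied.
    """
    span = (m - 1) * (a + 1) + 1            # longest stretch the pattern can cover
    if m < 1 or span > len(text):
        raise ValueError("text too short for the requested pattern")
    pos = rng.randrange(len(text) - span + 1)
    pattern, positions = [], []
    while len(pattern) < m:
        pattern.append(text[pos])           # step 1: copy the current position
        positions.append(pos)
        pos += 1 + rng.randint(0, a)        # step 2: skip l positions, l in [0, a]
    return pattern, positions


rng = random.Random(1)                      # fixed seed only for repeatability
text = list(range(100))                     # stand-in for a list of LN/OLN positions
pattern, positions = sample_gapped_pattern(text, m=5, a=2, rng=rng)
print(positions)
```

By construction, consecutive copied indices differ by between 1 and a + 1, so the sampled pattern is guaranteed to occur in the text with gaps of at most a skipped positions.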

Running times
Implementations of LN and OLN matching were tested and the results are shown in Figures 4 and 5. Pattern lengths of 10, 100 and 1000 sets were used. The running times are practically linear, as discussed in Sections 3.1 and 3.2. The Boyer-Moore algorithm, which is typically faster for longer patterns, was implemented for the linear search step. This is the reason for the speedup that can be seen as the pattern length is increased. The limit on the size of the input in each case was the size of available RAM. It is important to note that OLN matching requires considerably more RAM than LN matching, as the lead sets must be stored explicitly as arrays of integers rather than as bit-vectors. This is reflected in the maximum input sizes tested for each. The overall running time for both experiments was always less than 0.2 seconds. Figures 6 and 7 show the running times for LN and OLN a-bounded matching, respectively. Pattern lengths of 10, 100 and 1000 were tested. The running times of these dynamic programming algorithms do not vary with a (this was also confirmed empirically). The results shown are for a = 2.
[Fig. 4. Running times for LN matching using patterns of different lengths.]
The results show that in practice available RAM, and not computational complexity, is the limiting factor on the size of the input that can be processed. As the dynamic programming method that was implemented has O(nm) space complexity, increasing the pattern length correspondingly decreases the maximum text size that can be searched. An implementation utilising Hirschberg's O(n + m) divide-and-conquer approach would allow much larger databases to be searched at the cost of roughly halving the search speed. As the size of musical databases increases, the need for such space-saving techniques will undoubtedly become more prominent.

Conclusion
Four new algorithms have been given for music retrieval in data where the melody and harmony are presented separately. Each is algorithmically efficient and shown to be very fast in practice, taking at most a few seconds to search the largest data set. An exciting open problem that would greatly enhance this work is to consider more musically sophisticated concepts of approximation, especially for comparing harmonies. For searching very large databases that will become available in the future, faster algorithms with improved worst case time complexity may also be required.