
NOTE IDENTIFICATION IN CARNATIC MUSIC FROM FREQUENCY SPECTRUM

Prashanth T R, Radhika Venugopalan


Computer Science and Engineering Department, National Institute of Technology Trichy, Tamil Nadu, India-620015
trprashanth07@yahoo.com

raddy06@gmail.com
Abstract- Raga refers to the melodic modes used in Indian Classical Music. A raga is a series of five or more musical notes upon which a melody is built. Note identification forms the basis for raga identification in music. Characterisation of a note in Carnatic Music is challenging due to the extensive use of Gamakas. In this paper, we propose a system that takes a wav file as input, analyses its frequency characteristics and performs note mapping. The prominent notes in the raga are selected by a statistical T-test based on the duration of occurrence of each note. A test data set of around 15 raga alapanas was used; the renditions ranged from 3 to 8 minute clips of various artistes, both male and female. The system performs with an accuracy of 90%.

Keywords: Auto-Correlation, Octave Band-Pass Filter, Onset and Offset, Hill Peak Heuristic, T-Test

I. INTRODUCTION

A raga is a characteristic arrangement or progression of notes whose full potential and complexity can only be realised in exposition. It is characterised by several attributes: the choice and sequence of notes from the possible 12 notes in an octave, the Arohana and Avarohana [6], characteristic Gamakas and special phrases (prayogams [6]) that uniquely identify it. Carnatic Music is unique in its use of Gamakas [6]. Gamakas are not optional embellishments but are integral to the characterisation of a note in a raga. Two ragas with the same notes derive their difference from the different Gamakas employed in playing or singing them.

TABLE I
THE TWELVE SWARA STHANAS AND THEIR KEY EQUIVALENTS

Swara:  S    R1   R2   G2   G3   M1
Key:    C    C#   D    D#   E    F
Swara:  M2   P    D1   D2   N2   N3
Key:    F#   G    G#   A    A#   B

The black and white notes discussed in this paper are with respect to the C-major scale. In a raga, whenever a black note is preceded by a white note, the black note is oscillated from the fret of the previous white note, except in the case of prathi madhyamam (M2), which is oscillated from panchamam (P). The gamakas on the white note, in this case, are played by plucking the fret of the same note. When a white note is preceded by another note of the same kind, it can either be played from the preceding note or oscillated on the same fret. This is a general way of playing Gamakas, although the diversity of Carnatic Music leaves room for many more variations in characterising them. A detailed explanation of Gamakas with respect to music modelling can be found in [4]. In this paper, we examine the constraints that Gamakas impose on the characterisation of a note, and propose a heuristic algorithm to circumvent them for the purpose of note identification. The study is based entirely on frequency spectrum analysis and explores the extent to which the spectrum reflects the dominance of Gamakas in Carnatic Music.

The paper is structured as follows. A review of related work is given in Section II. Section III describes the various stages in the building of the system. Section IV discusses the results of the experiment, followed by Section V, which summarises the conclusions and notable inferences drawn from the experiment for future work.

II. RELATED WORK

South Indian Classical Music has remained relatively unexplored in the field of Computational Musicology, owing to its complicated grammar, which gives musicians so much leverage in performance that no two renditions of the same raga by the same artiste need be similar. The authors of [1] propose a pattern identification system based on Hidden Markov Models, demonstrated on a selected sextet of ragas. In [2], Arvind Krishnaswamy presents a detailed pitch contour of the different variations of a note, giving a graphical representation of Gamakas and Prayogams. In [3], the authors propose a system to identify the raga of a Krithi; here, the onset and offset of a note are determined from changes in spectral energy. In [5], the same authors characterise note onsets by a first-level segmentation of the Tala of the song; keeping tala as the basis restricts the method to Krithis alone. In [7], classification of a music database based on the Melakartha scheme (parent raga) is attempted; the author identifies notes from the power spectral density of the audio data.

III. FLOW DIAGRAM

Fig 1. Flow diagram

The flow diagram of the system consists of three stages, Input, Process and Output, explained below.

1. Input:

The music clip is stored as a 16-bit mono channel wav file, sampled at 11.025 kHz.

2. Process:

The processing part is divided into three stages, as shown in Fig 1.

A. Pitch List:

The input is fed into the audio processing software Praat [10]. A Hanning window of physical length 0.06 s is used for pitch estimation. The author improves upon the auto-correlation algorithm to find the fundamental frequency; the modification suggested is that the autocorrelation function of the windowed signal is divided by the autocorrelation function of the window:

rx(τ) ≈ rxw(τ) / rw(τ)

The time-step is set to 0.01 seconds, since the minimum duration for which a note occurs has been observed to be 10 ms. The upper and lower bounds of the frequency values are specified as 50-500 Hz. The octave cost and octave jump cost are set to 0.14 and 0.55 for optimum continuity in pitch. These parameters are used by Praat to extract frequency-time values for the entire audio sample using the auto-correlation algorithm [12].

B. Pre-process:

The tonic, or frequency of the Adhara Shadjam, is manually found at this stage by selecting the portions of music where Shadjam clearly occurs and querying Praat for the pitch value. In a typical Alapana or Krithi, the entire music is confined to a span of two octaves, ranging from the Mandra sthayi Panchamam to the Thara sthayi Panchamam. Frequencies which do not lie in this range are considered outliers. Mathematically, an outlier is a member of a data set which differs in some way from the general pattern and may indicate an error in the process which produced the data. The audio samples considered are live recordings subject to noise and hiss interference, which result in abnormal frequencies; this preprocessing step aims to remove all such noise frequencies.

C. Identify Note Boundaries:

The sequences of notes for different ragas are very well defined. A note is sung plainly or with Gamakas. The Sangeetha Sampradaya Pradarshini [11] broadly classifies the gamakas as Jaru/Ullasita (slides), Gamaka (deflections) and Janta (fingered stresses). When a note is held flat, the resulting pitch graph is a horizontal line; with Gamakas, it is a curve with distinct peaks. The occurrence of a note is observed to be around the peak regions in the pitch graph; this is the Hill Peak Heuristic [13]. Peaks in a graph are characterised by the slope at the point being zero. Given a pitch-time sample with time points t1, t2, ..., ti-1, ti, ti+1, ..., tn and corresponding pitch values p1, p2, ..., pi-1, pi, pi+1, ..., pn, the one-point slope of the pitch values is calculated as:

Slope(i) = Δp / Δt = (p(i+1) - p(i)) / (t(i+1) - t(i))

Fig 2. Ga of Ganamurthy held relatively flat, Phrase - MDPMG

Fig 3. Ga of Varali sung with Gamaka

The calculated slope values are traversed sequentially and the note boundaries are identified by the heuristic algorithm below:

Step 1: If slope < 1, select the neighbouring frequencies with slope < 2.
Step 2: If STDEV(slopes of the selected region) < 2, find the mean frequency of the selected region.
Step 3: Map the mean frequency onto a note.
Step 4: Store the time boundaries of the selected region as the onset and offset of the note.

Slopes of magnitude less than one typically occur in regions near a peak. Due to the innate imperfections of the human voice, which are also responsible for its profound uniqueness, and the tendency of the music not to favour flat intonations, hitting a note sharply is almost unobserved. Standard deviation is included as a criterion since it reflects the rate of change of slope: a smaller value implies a tendency to stabilise in the region, and hence the region is identified as possibly containing a note. Employing this algorithm over the entire duration of the sample gives the onsets and offsets of the notes.

Table II contains the relative ratios of the twelve notes. The mean frequency of each region selected by the heuristic algorithm is mapped onto the corresponding note if it lies within an experimentally determined range around the flat frequency of the note. Here, we would like to mention that, as suggested in [4], the mean of a Gamaka does not necessarily characterise the note; it could even be semitones above or below a note. This is why we allow a range around each note; it is the closest we could come to mathematically modelling Gamakas. The time of occurrence of every single note in the entire file, calculated from the identified onsets and offsets, is cumulatively stored in a time array. This is representative of the presence of a note in the raga and hence is used for resolution purposes. The heuristic suggested for the selection of a region attempts to counter the complications introduced by the unrestricted, rather inevitable, usage of Gamakas in Carnatic Music.

TABLE II
RELATIVE RATIOS OF THE TWELVE SWARA STHANAS

Swara:  S      R1     R2    G2    G3    M1
Ratio:  1      16/15  10/9  6/5   5/4   4/3
Swara:  M2     P      D1    D2    N2    N3
Ratio:  27/20  3/2    8/5   5/3   16/9  15/8
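The corrected-autocorrelation idea behind the pitch list can be illustrated with a minimal sketch in Python. This is not the Praat implementation: the peak-picking (first strong local maximum plus parabolic interpolation) and the synthetic test signal are illustrative assumptions, standing in for Praat's full candidate-and-cost search.

```python
import numpy as np

def estimate_f0(x, fs, fmin=50.0, fmax=500.0):
    """Sketch of Boersma-style pitch estimation: the autocorrelation of the
    Hanning-windowed signal is divided by the autocorrelation of the window
    itself, undoing the taper's bias (r_x = r_xw / r_w)."""
    n = len(x)
    w = np.hanning(n)
    xw = (x - x.mean()) * w
    nfft = 2 * n                     # zero-pad so the FFT gives linear lags
    r_xw = np.fft.irfft(np.abs(np.fft.rfft(xw, nfft)) ** 2)[:n]
    r_w = np.fft.irfft(np.abs(np.fft.rfft(w, nfft)) ** 2)[:n]
    r = (r_xw / r_w) * (r_w[0] / r_xw[0])    # corrected and normalised
    lo, hi = int(fs / fmax), int(fs / fmin)  # candidate lag range
    # first strong local maximum, to avoid picking a lag one octave too low
    peaks = np.where((r[1:-1] > r[:-2]) & (r[1:-1] >= r[2:]))[0] + 1
    peaks = peaks[(peaks >= lo) & (peaks < hi) & (r[peaks] > 0.5)]
    if peaks.size == 0:
        return 0.0                   # no voiced candidate found
    lag = int(peaks[0])
    # parabolic interpolation refines the integer lag
    y0, y1, y2 = r[lag - 1], r[lag], r[lag + 1]
    lag = lag + 0.5 * (y0 - y2) / (y0 - 2 * y1 + y2)
    return fs / lag

fs = 11025                           # sampling rate used in the paper
t = np.arange(int(0.06 * fs)) / fs   # one 0.06 s analysis window
print(estimate_f0(np.sin(2 * np.pi * 146.0 * t), fs))  # close to 146 Hz
```

Praat additionally weighs multiple lag candidates against the octave and octave-jump costs across frames; the sketch keeps only the single-frame core of the method.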
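The note-boundary heuristic (Steps 1-4) together with the Table II ratio mapping can be sketched as follows. The 3% tolerance band and the toy pitch track are illustrative assumptions, not the experimentally determined range used in the system.

```python
import numpy as np

# Twelve swara sthanas and their relative ratios (Table II)
RATIOS = {"S": 1.0, "R1": 16/15, "R2": 10/9, "G2": 6/5, "G3": 5/4,
          "M1": 4/3, "M2": 27/20, "P": 3/2, "D1": 8/5, "D2": 5/3,
          "N2": 16/9, "N3": 15/8}

def map_to_note(freq, tonic, tol=0.03):
    """Map a mean frequency onto the nearest swara if it lies within a
    tolerance band around the note's flat frequency.  The 3% band is an
    illustrative stand-in for the experimentally determined range."""
    ratio = freq / tonic
    while ratio >= 2.0:              # fold into the octave above the tonic
        ratio /= 2.0
    while ratio < 1.0:
        ratio *= 2.0
    best = min(RATIOS, key=lambda s: abs(ratio - RATIOS[s]))
    return best if abs(ratio - RATIOS[best]) / RATIOS[best] <= tol else None

def find_notes(times, pitches, tonic):
    """Hill Peak Heuristic sketch: grow a region of small slope around each
    near-flat point, accept it if the slopes are stable (low STDEV), then
    map the region's mean frequency onto a note."""
    slopes = np.diff(pitches) / np.diff(times)
    notes, i = [], 0
    while i < len(slopes):
        if abs(slopes[i]) < 1.0:                  # Step 1: near a peak
            j = i
            while j < len(slopes) and abs(slopes[j]) < 2.0:
                j += 1                            # grow region with slope < 2
            if np.std(slopes[i:j]) < 2.0:         # Step 2: stable slopes
                note = map_to_note(np.mean(pitches[i:j + 1]), tonic)  # Step 3
                if note:                          # Step 4: store onset/offset
                    notes.append((note, float(times[i]), float(times[j])))
            i = j
        else:
            i += 1
    return notes

# Toy pitch track: 0.01 s steps, tonic 146 Hz, a flat Pa (3/2 * 146 = 219 Hz)
t = np.linspace(0.0, 0.19, 20)
p = np.full(20, 219.0)
print(find_notes(t, p, 146.0))   # one region spanning 0.0-0.19 s mapped to "P"
```

On a real alapana, the pitch track would come from the Praat extraction stage, and each accepted region contributes its duration to the cumulative time array of its note.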

3. Output:

The T-test is used to identify the notes present in the raga. The result is matched against the 72 Melakartha table to identify the scale.

A. T-Test to resolve notes:

The time array values of the twelve notes are converted into fractions of the total time of occurrence of all the notes in the entire file. These fractions are the relative T-statistics. The five notes Rishabam (R), Gandharam (G), Madhyamam (M), Dhaivatham (D) and Nishadham (N) have two variations each. A paired T-test is performed on the two variations of each note to resolve between them. The T-test assesses whether the averages of the two groups are statistically different from one another. The T-value is computed and compared against a table of significance to test whether the ratio is large enough to claim a significant difference between the two groups. A paired t-test is performed of the null hypothesis that the data in the difference x-y are a random sample from a normal distribution with mean 0 and unknown variance, against the alternative that the mean is not 0. The T-test function returns a Boolean value 1 indicating a rejection of the null hypothesis and a Boolean value 0 indicating a failure to reject the null hypothesis, at the chosen alpha (risk) level.

The entire sample is divided into segments based on the length of the clip, and the T-statistics for the 12 notes are obtained for each segment. Segmentation is done to avoid normalisation over the entire length of the clip. The time array values of the two variations of the five notes R, G, M, D and N are sent as parameters to the T-test procedure of Matlab to calculate the one-tailed T-value:

T-value = (absolute difference between the group means) / (variability of the groups)

The T-value obtained is compared with the standard T-table value using N-1 degrees of freedom, where N is the number of values of a single variation of a note, at an alpha level varying between 0.05 and 0.1. A note is resolved if the returned test value is 1 and unresolved otherwise. The T-test results of all the segments are consolidated, and the presence of a note in the raga is concluded based on the majority. For example, if R1 is found to occur dominantly in 3 out of 5 segments, it is concluded to be present in the raga. This is along expected lines, because in any raga exposition the artiste slowly builds the melody, and since we divide the entire clip into segments, not all the notes of the raga will occur in a single segment.

IV. OBSERVATIONS AND RESULTS

The proposed system was tested on wav files of raga alapanas, one of which is discussed below.

A. Varali

S Ri1 Ga1 Ma2 P Da1 Ni3 S; S Ni3 Da1 P Ma2 Ga1 Ri1 S

Raga Varali is a melakartha taking Shadjam (C), Suddha Rishabam (C#), Suddha Gandharam (D#), Prati Madhyamam (F#), Panchamam (G), Suddha Dhaivatham (G#) and Kakali Nishadam (A#). Varali is characterised by inflexions around Ga and Ma. The Gandharam in the raga takes different forms conditioned by its relationship with the subsequent swaras; prominently, Rishabam is intoned in place of the Gandharam (Suddha Gandharam).

The T-test was performed on a 450 s Varali alapana rendered by Shri T. M. Krishna, using a tonic (frequency of the Adhara Shadjam) of 146 Hz and an alpha level of 0.01. The results presented in Table III strongly suggest the presence of D#, F#, G, G# and A#. Rishabam (C#), being a black note, is anchored onto Shadjam (C) for the most part. This results in minor occurrences of both variations of Rishabam; consequently, the note is not resolved.

TABLE III
NOTES RESOLVED BY T-TEST - VARALI

Note         Actual Note   Note Identified
Rishabam     Suddha        Unresolved
Gandharam    Sadharana     Sadharana
Madhyamam    Prathi        Prathi
Dhaivatham   Suddha        Suddha
Nishadham    Kakali        Kakali

Table IV presents the results of the test conducted on 35 different samples. Table V lists the statistics of tests performed on samples of the same raga sung by different artistes.

TABLE V
RAGAS WITH SINGLE NOTE DISCREPANCY

Raga           No. of Samples   Efficiency
Varali         5                66%
Charukesi      3                90%
Hemavathi      4                85%
Natabhairavi   4                80%
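The resolution step described above can be mimicked outside Matlab; the sketch below uses SciPy's paired t-test (`scipy.stats.ttest_rel`, two-sided here rather than one-tailed) on hypothetical per-segment occurrence fractions of a note's two variations.

```python
import numpy as np
from scipy import stats

def resolve_variant(times_a, times_b, alpha=0.05):
    """Paired t-test on the per-segment time-of-occurrence fractions of a
    note's two variations.  Returns the dominant variant when the null
    hypothesis (equal means) is rejected, else None (unresolved),
    mirroring the 1/0 Boolean output of Matlab's ttest."""
    t_stat, p_value = stats.ttest_rel(times_a, times_b)
    if p_value < alpha:
        return "A" if np.mean(times_a) > np.mean(times_b) else "B"
    return None                       # could not resolve the note

# Hypothetical per-segment occurrence fractions for the two Gandharams
g2 = [0.21, 0.25, 0.19, 0.23, 0.22]   # one variant clearly dominates
g3 = [0.02, 0.01, 0.03, 0.02, 0.01]
print(resolve_variant(g2, g3))        # "A": the first variant is resolved
```

In the system proper, this decision is made per segment and the per-segment outcomes are consolidated by majority vote across the clip.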

Fig 4. Frequency-time plot of Varali, 205-210 s

Fig 5. Note-mapped plot of Varali, 205-210 s. Notation: PMPDNDPM

TABLE IV
RESULTS

% Accuracy   No. of Samples
100          14
70-90        15
40-60        3
10-30        1
0            1
Total        35

V. CONCLUSIONS AND FUTURE WORK

We have tested the proposed model on raga alapanas sung by different singers. The heuristics for note duration have to be modified according to the tala when working on Krithis. Methods to filter out percussion instruments by timbre analysis have to be incorporated. A rigorous study of the gamakas possible on each swara must be done for a more efficient note transcription. The tonic (Adhara Shadjam) was found manually; an algorithm needs to be developed for this instead. At present, identification of the scale of the raga has been successful. However, trained neural networks, or other existing classifiers, need to be incorporated into the system to accurately identify the raga.

ACKNOWLEDGEMENTS

We sincerely thank Dr. V. Gopalakrishnan for his tremendous support and guidance throughout our work. We would also like to thank Mr. M. Subramaniam for sharing invaluable information about the scientific and musical aspects of Carnatic Music, supplemented with his own ideas and inferences that have often redefined our approach to the problem. We express our gratitude to all the people who contributed in any way at different stages of this research.

REFERENCES

[1] M. S. Sinith and K. Rajeev, "Hidden Markov Model based Musical Pattern Recognition for South Indian Classical Music", ICSIP.
[2] Arvind Krishnaswamy, "Melodic Atoms for Transcribing Carnatic Music", CCRMA, Department of Electrical Engineering, Stanford University.
[3] Rajeswari Sridhar and T. V. Geetha, "Raga Identification of Carnatic Music for Music Information Retrieval", IJRTE, May 2009.
[4] Arvind Krishnaswamy, "Pitch Measurement versus Perception of Indian Classical Music", Proceedings of SMAC, August 2003.
[5] Rajeswari Sridhar and T. V. Geetha, "Swara Identification for South Indian Classical Music", ICIT '06.
[6] Prof. Sambamurthy, South Indian Music, Vol. 4, The Indian Music Publishing House, Madras, 1982.
[7] Vijayaditya Peddinti et al., "A Scheme for Search Space Reduction in CBMIR using the Melakartha Raga Classification Scheme".
[8] M. Subramanian, "Analysis of Gamakams of Carnatic Music using the Computer", Sangeet Natak, Vol. XXXVII, No. 1, 2002, pp. 26-47.
[9] P. Boersma and D. Weenink (2010), Praat: doing phonetics by computer [Computer program], Version 5.2.03, retrieved 19 November 2010 from http://www.praat.org/
[10] William M. K. Trochim (2006), Web Centre for Social Research Methods [Online]. Available: http://www.socialresearchmethods.net/kb/stat_t.php
[11] Subbarama Dikshithar, Sangeetha Sampradaya Pradarshini (Tamil translation by B. Rajam Ayyar and S. Ramanathan), Music Academy, Chennai, 1977.
[12] Paul Boersma, "Accurate Short-Term Analysis of the Fundamental Frequency and the Harmonics-to-Noise Ratio of a Sampled Sound", Institute of Phonetic Sciences, University of Amsterdam, Proceedings 17 (1993), pp. 97-110.
[13] Gaurav P., Chaitanya M. and Paul I., "Tansen: A System for Automatic Raga Identification", Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur.