Please use this identifier to cite or link to this item: http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/5312
Title: Multi-window Comparison of SIR Performance in Extraction of Mono-aural Vocal and Non-vocal Components in REPET
Authors: Kher, Vansha
Lamba, T.S. [Guided by]
Keywords: MATLAB
Beat spectrum
Music separation
Music information retrieval
Issue Date: 2015
Publisher: Jaypee University of Information Technology, Solan, H.P.
Abstract: Speech is the vocalized form of human communication. In linguistics (articulatory phonetics), normal human speech is said to be produced by pulmonary pressure from the lungs, which creates phonation in the glottis and larynx; the vocal tract then modifies this sound to generate different vowels and consonants. Speech is composed of three parts: articulation, voice and fluency. The message or information communicated through speech is intrinsically discrete, i.e. it can be represented by a concatenation of elements from a finite set of symbols. Phonemes are the set of symbols into which every speech sound can be classified. Phonemes are thus the basic units of a language's phonology, and they are combined with other phonemes to form meaningful units called morphemes. Audio signals can be defined as the class of sounds whose frequencies lie within the range of human hearing. The separation of vocals and music has become an important problem in automatic karaoke, vocalist identification and audio pre-processing. Separating the varying lead vocals from the background music in an audio recording is a demanding task. Speech-separation research usually employs time-frequency masking, a technique that is also relevant to hearing-aid design. The core principle of music that is exploited to discriminate the underlying non-vocals from the vocals (speech) is repetition, which is also a fundamental principle in Music Information Retrieval (MIR), as a premise of music as an art. Repetition is especially characteristic of pop songs, where the singer often overlays frequently changing vocals on a periodically repeating background in a mixture.
The basic approach of the dissertation is to recognise periodically repeating segments in audio excerpts, compare them with a repeating model and finally separate the repeating musical patterns via time-frequency (TF) masking. A TF mask is built on the TF representation of a signal. In this project, the quality of the foreground vocals and the accompanying background is analyzed in terms of the SIR (Signal to Interference Ratio), using the ANOVA (Analysis of Variance) method on musical audio of different genres. The SIR values obtained with Hamming, Hanning (Hann) and Blackman windows are compared using the software tool MATLAB. The study concludes that separating mono-aural vocal and non-vocal components with the Blackman window gives better SIR values and separation quality than with the Hanning and Hamming windows.
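The two ingredients of the window comparison described in the abstract, the analysis windows and the SIR measure, can be sketched as follows. This is a minimal illustration in Python, not the dissertation's MATLAB code. The function names `window` and `sir_db` and the toy signals are hypothetical. The window coefficients are the standard symmetric textbook definitions, and SIR is computed in its simplest form, as the ratio of target energy to interference energy in decibels, which may differ from the evaluation procedure actually used in the dissertation.

```python
import math

def window(name, n):
    """Return a symmetric n-point window (n >= 2), textbook coefficients."""
    coeffs = {
        "hamming": (0.54, 0.46, 0.0),
        "hann": (0.5, 0.5, 0.0),
        "blackman": (0.42, 0.5, 0.08),
    }
    a0, a1, a2 = coeffs[name]
    return [
        a0
        - a1 * math.cos(2 * math.pi * k / (n - 1))
        + a2 * math.cos(4 * math.pi * k / (n - 1))
        for k in range(n)
    ]

def sir_db(target, interference):
    """SIR in dB: 10 * log10(target energy / interference energy)."""
    e_target = sum(x * x for x in target)
    e_interf = sum(x * x for x in interference)
    return 10 * math.log10(e_target / e_interf)

# Toy signals (assumed for illustration): a "vocal" sinusoid and a residual
# "background" sinusoid with one tenth of its amplitude.
vocal = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
residual = [0.1 * math.sin(2 * math.pi * 12 * t / 100) for t in range(100)]

for name in ("hamming", "hann", "blackman"):
    w = window(name, 9)
    print(name, [round(v, 3) for v in w])

print("SIR (dB):", round(sir_db(vocal, residual), 2))
```

In a full experiment each window would be used in the short-time Fourier transform stage of the separation, and the SIR of the separated vocals and background would then be compared per window and per genre.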
URI: http://ir.juit.ac.in:8080/jspui//xmlui/handle/123456789/5312
Appears in Collections:Dissertations (M.Tech.)


