(2009) Handbook of multimedia for digital entertainment and arts, Dordrecht, Springer.
In this chapter, we combine adaptive sampling with video analogies (VA) to correct the audio stream in the karaoke environment $\kappa = \{\kappa(t) : \kappa(t) = (U(t), K(t)),\; t \in (t_s, t_e)\}$, where $t_s$ and $t_e$ are the start and end times respectively and $U(t)$ is the user multimedia data. We employ multiple streams from the karaoke data $K(t) = (K_V(t), K_M(t), K_S(t))$, where $K_V(t)$, $K_M(t)$ and $K_S(t)$ are the video, the musical accompaniment and the original singer's rendition respectively, along with the user multimedia data $U(t) = (U_A(t), U_V(t))$, where $U_V(t)$ is the user video captured with a camera and $U_A(t)$ is the user's rendition of the song. We analyze the audio and video streaming features $\Psi(\kappa) = \{\Psi(U(t), K(t))\} = \{\Psi(U(t)), \Psi(K(t))\} = \{\Psi_U(t), \Psi_K(t)\}$ to produce the corrected singing, namely the output $U'(t)$, which is made as close as possible to the original singer's rendition. Note that $\Psi$ denotes any kind of feature processing.
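The notation above can be illustrated with a minimal sketch of the data flow. All names here, the per-sample energy stand-in for $\Psi$, and the averaging "correction" are illustrative assumptions for exposition only, not the chapter's actual algorithm.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class KaraokeData:           # K(t) = (K_V(t), K_M(t), K_S(t))
    video: List[float]       # K_V: karaoke video frames (stub values)
    music: List[float]       # K_M: musical accompaniment samples
    singer: List[float]      # K_S: original singer's rendition

@dataclass
class UserData:              # U(t) = (U_A(t), U_V(t))
    audio: List[float]       # U_A: user's rendition of the song
    video: List[float]       # U_V: user video captured with a camera

def features(samples: List[float]) -> List[float]:
    """Stand-in for Psi: here simply per-sample energy."""
    return [s * s for s in samples]

def analyze(user: UserData, karaoke: KaraokeData) -> Tuple[List[float], List[float]]:
    """Psi(kappa) = {Psi_U(t), Psi_K(t)}: features of both streams."""
    return features(user.audio), features(karaoke.singer)

def correct(user: UserData, karaoke: KaraokeData) -> List[float]:
    """Toy correction producing U'(t): pull each user sample toward the
    reference so the output is closer to the original rendition."""
    return [(u + k) / 2 for u, k in zip(user.audio, karaoke.singer)]
```

For example, `correct(UserData(audio=[0.0, 2.0], video=[]), KaraokeData(video=[], music=[], singer=[2.0, 2.0]))` moves the user's samples halfway toward the reference.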
Publication details
DOI: 10.1007/978-0-387-89024-1_9
Full citation:
Yan, W., Kankanhalli, M. S. (2009). Cross-modal approach for karaoke artifacts correction, in B. Furht (ed.), Handbook of multimedia for digital entertainment and arts, Dordrecht, Springer, pp. 197-218.
This document is unfortunately not available for download at the moment.