ACM Multimedia 2012 Grand Challenge: Music Video Generation

Sep 18, 2013

These slides present a content-based system that uses the perceived emotion of multimedia content as a bridge to connect music and video. Specifically, we propose a novel machine learning framework, called Acousticvisual Emotion Gaussians (AVEG), to jointly learn the tripartite relationship among music, video, and emotion from an emotion-annotated corpus of music videos. For a music piece (or a video sequence), the AVEG model predicts its emotion distribution in a stochastic emotion space from the corresponding low-level acoustic (resp. visual) features. Finally, music and video are matched by measuring the similarity between the two corresponding emotion distributions, based on a distance measure such as KL divergence.
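The matching step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each predicted emotion distribution is summarized as a single multivariate Gaussian (mean and covariance) in a 3-D emotion space, and the names `kl_gaussian` and `rank_videos` are hypothetical. It uses the closed-form KL divergence between two Gaussians and ranks candidate videos by increasing divergence from the music piece.

```python
import numpy as np

def kl_gaussian(mu_p, cov_p, mu_q, cov_q):
    """Closed-form KL(p || q) between two multivariate Gaussians."""
    mu_p, mu_q = np.asarray(mu_p, float), np.asarray(mu_q, float)
    cov_p, cov_q = np.asarray(cov_p, float), np.asarray(cov_q, float)
    k = mu_p.shape[0]                      # dimensionality of the emotion space
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (np.trace(cov_q_inv @ cov_p)
                  + diff @ cov_q_inv @ diff
                  - k + logdet_q - logdet_p)

def rank_videos(music, videos):
    """Return video indices ordered from smallest to largest KL divergence.

    `music` is a (mean, covariance) pair; `videos` is a list of such pairs.
    Both are assumed inputs, e.g. outputs of some emotion predictor.
    """
    mu_m, cov_m = music
    scores = [kl_gaussian(mu_m, cov_m, mu_v, cov_v) for mu_v, cov_v in videos]
    return sorted(range(len(videos)), key=lambda i: scores[i])

# Toy data (made up for illustration): one music piece, two candidate videos.
cov = 0.1 * np.eye(3)
music = ([0.5, 0.5, 0.0], cov)
videos = [([-0.5, -0.5, 0.0], cov), ([0.4, 0.6, 0.0], cov)]
ranking = rank_videos(music, videos)
```

KL divergence is asymmetric, so the music-to-video and video-to-music directions can yield different rankings; a symmetrized variant is a possible alternative if a single score is preferred.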

1
The Audiovisual Emotion
Gaussians Model for Automatic
Generation of Music Video
Ju-Chiang Wang, Yi-Hsuan Yang,
I-Hong Jhuo, Yen-Yu Lin, Hsin-Min Wang
Academia Sinica, Taiwan

2
Introduction
• Generate the music video based on the emotion
content recognized by machine
• The novel Audiovisual Emotion Gaussians
(AVEG) framework learns the tripartite
relationship among music, video, and emotion
• Project music pieces and video sequences into
the multi-dimensional emotion space (3DES),
and perform the cross-modal matching via the
predicted emotion distributions

3
System Diagram
• Utilize the DEAP dataset (valence, activation, and potency)
– 3D emotion-annotated music videos
• Extend the AEG model to handle video (VEG)
– Wang et al. (2012), “The Acoustic Emotion Gaussians model for
emotion-based music annotation and retrieval,” Proc. ACM MM
(full paper)

4
Preliminary Result
• Perform the cross-modal retrieval experiment on
the 120 music and video clips of DEAP
• Evaluate the NDCG@P for the ranking
• Measure the average Top 1 Relevance Score
NDCG@P by scenario:
Scenario                 P=5      P=10     P=15     P=20
Audio to Video Ranking   0.8748   0.8316   0.8221   0.8172
Video to Audio Ranking   0.8737   0.8204   0.8105   0.8073
Random Permutation       0.8035   0.7604   0.7441   0.7370

Average Top 1 Relevance Score:
Scenario            A to V   V to A   Random
Average Relevance   0.4881   0.4826   0.3837
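
For readers unfamiliar with the metric, NDCG@P can be computed roughly as below. This is a sketch under assumptions: the slides do not state which DCG variant or relevance definition was used, so the code adopts the common form with a log2(rank + 1) discount, and the function names are illustrative.

```python
import math

def dcg_at_p(relevances, p):
    """Discounted cumulative gain over the top-p items of a ranked list."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:p], start=1))

def ndcg_at_p(relevances, p):
    """DCG@p divided by the DCG@p of the ideal (descending) ordering."""
    ideal = dcg_at_p(sorted(relevances, reverse=True), p)
    return dcg_at_p(relevances, p) / ideal if ideal > 0 else 0.0

# Relevance scores of retrieved items, listed in ranked order (toy values).
perfect = ndcg_at_p([3, 2, 1], p=3)   # already in ideal order
reversed_order = ndcg_at_p([1, 2, 3], p=3)
```

Because the score is normalized by the ideal ordering, even a random permutation can score well above zero when relevance values are spread fairly evenly, which is why the random baseline in the table is a useful point of comparison.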
