Data Fusion

Ongoing research projects in the SAMSI Program on Model Uncertainty.
To be announced in May 2019.

Summary:
In fall 2018, I presented my previous work on music mining (in collaboration with Ernest Fokoue) at the group meeting, and submitted a paper [1] that introduces the idea of representing any given piece of music as a collection of “musical words” of various lengths, which we codenamed “muselets”. In that paper we construct a naive dictionary from a corpus of African American, Chinese, Japanese and Arabic music, and perform both topic modelling and pattern recognition on it. Although some of the results based on the naive dictionary are reasonably good, we expect considerably better predictive performance once we build a full-scale version of the intended dictionary of muselets. The data-fusion aspect of this work is that we create a uniform representation of music from different sources and forms of musical data.
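To make the "musical words of various lengths" idea concrete, here is a minimal, hypothetical sketch. It assumes each piece has already been transcribed into a sequence of symbolic note tokens (tokens such as "C4" are illustrative only) and treats a muselet as a contiguous run of 1 to `max_len` tokens. The function names `extract_muselets` and `bag_of_muselets` are invented for this example, and the actual dictionary construction in [1] may differ.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Muselet = Tuple[str, ...]  # hypothetical: a muselet as a tuple of note tokens


def extract_muselets(notes: Sequence[str], min_len: int = 1, max_len: int = 4) -> Counter:
    """Count every contiguous run of min_len..max_len tokens as one muselet.

    Assumption: a muselet is modelled here as a plain n-gram of note tokens.
    """
    counts: Counter = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(notes) - n + 1):
            counts[tuple(notes[i:i + n])] += 1
    return counts


def bag_of_muselets(
    pieces: List[Sequence[str]], min_len: int = 1, max_len: int = 4
) -> Tuple[List[Muselet], List[List[int]]]:
    """Build a shared vocabulary and a piece-by-muselet count matrix."""
    counters = [extract_muselets(p, min_len, max_len) for p in pieces]
    vocab = sorted(set().union(*counters))  # union of all muselets seen in the corpus
    matrix = [[c[w] for w in vocab] for c in counters]
    return vocab, matrix


if __name__ == "__main__":
    melody = ["C4", "E4", "G4", "E4", "C4"]
    print(extract_muselets(melody, 1, 2).most_common(3))
```

A count matrix of this shape is the usual document-term input for topic models such as LDA, with pieces playing the role of documents and muselets the role of words; pieces from different musical traditions end up in one common representation once they share a token vocabulary.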

In spring 2019, I will collaborate with Dr. Jong-Min Kim, who has studied mixtures of D-vine copulas extensively [2]. We plan to combine text mining and copula methods, with an application to movie markets. Specifically, we will study the Chinese and American movie markets to assess how different types of movies affect returns. For the most profitable movies, we will examine the factors that may drive their commercial success (potential factors: movie type, director, actors, actresses, etc.).
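For reference, a mixture of D-vine copulas models the dependence among variables on the uniform scale as a weighted sum of D-vine copula densities. The three-variable case below is a generic illustration of the pair-copula construction with the variable order 1-2-3; the number of components K, the pair-copula families, and the variables to be used in the movie-market study are assumptions not yet fixed.

```latex
\begin{aligned}
c(u_1,u_2,u_3) &= \sum_{k=1}^{K} \pi_k \, c^{(k)}(u_1,u_2,u_3),
  \qquad \pi_k \ge 0,\quad \sum_{k=1}^{K} \pi_k = 1, \\
c^{(k)}(u_1,u_2,u_3) &= c^{(k)}_{12}(u_1,u_2)\, c^{(k)}_{23}(u_2,u_3)\,
  c^{(k)}_{13|2}\!\left(F^{(k)}(u_1 \mid u_2),\, F^{(k)}(u_3 \mid u_2)\right), \\
F^{(k)}(u_1 \mid u_2) &= \frac{\partial C^{(k)}_{12}(u_1,u_2)}{\partial u_2},
  \qquad
F^{(k)}(u_3 \mid u_2) = \frac{\partial C^{(k)}_{23}(u_2,u_3)}{\partial u_2}.
\end{aligned}
```

Each component can capture a different dependence regime (for example, between movie characteristics and returns), and the weights \pi_k give the share of observations attributed to each regime.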

[1] Wu, Q., & Fokoue, E. (2018). Naive dictionary on musical corpora: From knowledge representation to pattern recognition. arXiv:1811.12802.
[2] Kim, D., Kim, J. M., Liao, S. M., & Jung, Y. S. (2013). Mixture of D-vine copulas for modeling dependence. Computational Statistics & Data Analysis, 64, 1-19.