Efficient Multimodal Streaming Recommendation via Expandable Side Mixture-of-Experts
File version
Version of Record (VoR)
Author(s)
Qu, Liang
Chen, Tong
Nguyen, Quoc Viet Hung
Yin, Hongzhi
Location
Seoul, Republic of Korea
Abstract
Streaming recommender systems (SRSs) are widely deployed in real-world applications, where user interests shift and new items arrive over time. As a result, effectively capturing users' latest preferences is challenging, as interactions reflecting recent interests are limited and new items often lack sufficient feedback. A common solution is to enrich item representations using multimodal encoders (e.g., BERT or ViT) to extract visual and textual features. However, these encoders are pretrained on general-purpose tasks: they are not tailored to user preference modeling, and they overlook the fact that user tastes toward modality-specific features such as visual styles and textual tones can also drift over time. This presents two key challenges in streaming scenarios: the high cost of fine-tuning large multimodal encoders, and the risk of forgetting long-term user preferences due to continuous model updates. To tackle these challenges, we propose Expandable Side Mixture-of-Experts (XSMoE), a memory-efficient framework for multimodal streaming recommendation. XSMoE attaches lightweight side-tuning modules consisting of expandable expert networks to frozen pretrained encoders and incrementally expands them in response to evolving user feedback. A gating router dynamically combines expert and backbone outputs, while a utilization-based pruning strategy maintains model compactness. By learning new patterns through expandable experts without overwriting previously acquired knowledge, XSMoE effectively captures both cold start and shifting preferences in multimodal features. Experiments on three real-world datasets demonstrate that XSMoE outperforms state-of-the-art baselines in both recommendation quality and computational efficiency.
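The mechanism described in the abstract (a frozen backbone, a growing set of side experts, a gating router, and utilization-based pruning) can be illustrated with a toy sketch. This is not the authors' implementation. The class name `SideMoELayer`, its methods, the linear experts, the input-independent router logits, and the `min_share` threshold are all hypothetical simplifications chosen for illustration, and no training loop is shown.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SideMoELayer:
    """Toy side layer: a frozen backbone feature plus a growing set of experts."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.experts = []            # each expert is a dim x dim weight matrix
        self.trainable = []          # parallel flags; older experts are frozen
        self.router_logits = [0.0]   # slot 0 gates the backbone output
        self.usage = []              # accumulated router weight per expert
        self.calls = 0

    def expand(self):
        """Freeze all existing experts and add one new trainable expert."""
        self.trainable = [False] * len(self.experts)
        weights = [[self.rng.gauss(0.0, 0.1) for _ in range(self.dim)]
                   for _ in range(self.dim)]
        self.experts.append(weights)
        self.trainable.append(True)
        self.router_logits.append(0.0)
        self.usage.append(0.0)

    def forward(self, backbone_feature):
        """Combine the backbone feature and expert outputs using router weights."""
        gates = softmax(self.router_logits)
        out = [gates[0] * x for x in backbone_feature]
        for k, weights in enumerate(self.experts):
            expert_out = [sum(w * x for w, x in zip(row, backbone_feature))
                          for row in weights]
            out = [o + gates[k + 1] * e for o, e in zip(out, expert_out)]
            self.usage[k] += gates[k + 1]
        self.calls += 1
        return out

    def prune(self, min_share):
        """Drop experts whose average router weight is below min_share."""
        if self.calls == 0:
            return
        keep = [k for k, u in enumerate(self.usage)
                if u / self.calls >= min_share]
        self.experts = [self.experts[k] for k in keep]
        self.trainable = [self.trainable[k] for k in keep]
        self.usage = [self.usage[k] for k in keep]
        self.router_logits = ([self.router_logits[0]]
                              + [self.router_logits[k + 1] for k in keep])


# Usage: two expansion steps, then pruning of the rarely used expert.
layer = SideMoELayer(dim=4)
layer.expand()
layer.expand()
layer.router_logits = [0.0, 2.0, -6.0]  # assumed values standing in for training
feature = [1.0, 0.5, -0.5, 2.0]         # stand-in for a frozen encoder output
output = layer.forward(feature)
layer.prune(min_share=0.05)
```

In this sketch, expansion never modifies earlier experts, which mirrors the abstract's point about learning new patterns without overwriting previously acquired knowledge. Pruning keeps the number of experts bounded as the stream continues.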
Conference Title
CIKM '25: Proceedings of the 34th ACM International Conference on Information and Knowledge Management
Funder(s)
ARC
Grant identifier(s)
DP240101108
Rights Statement
© 2025. This work is licensed under a Creative Commons Attribution International 4.0 License.
Citation
Qu, Y; Qu, L; Chen, T; Nguyen, QVH; Yin, H, Efficient Multimodal Streaming Recommendation via Expandable Side Mixture-of-Experts, CIKM '25: Proceedings of the 34th ACM International Conference on Information and Knowledge Management, 2025, pp. 2460-2470