"Mind-Video" is a two-module pipeline designed to bridge the gap between image and video brain decoding. It progressively learns from brain signals to gain a deeper understanding of the semantic space through multiple stages. The model leverages unsupervised learning with masked brain modeling to learn visual fMRI features and distills semantic-related features using contrastive learning in the CLIP space. The learned features are fine-tuned through co-training with a stable diffusion model tailored for video generation under fMRI guidance. The results show high-quality video reconstructions with accurate semantics, outperforming previous approaches in both semantic and pixel metrics.
Mind-Video was created by Zijiao Chen, Jiaxin Qing, Tiange Xiang, and Prof. Juan Helen Zhou. It was launched on June 18, 2024. Mind-Video is developed by Mind-X, a research interest group focusing on multimodal brain decoding with large models. The group aims to advance brain decoding using recent advancements in large models and AGI, with the goal of developing general-purpose brain decoding models for applications in brain-computer interfaces, neuroimaging, and neuroscience.
To use Mind-Video, follow these steps:
Understanding the Purpose: Mind-Video aims to bridge the gap between image and video brain decoding by introducing a two-module pipeline designed for this purpose.
Model Training: The Mind-Video model consists of two modules trained separately and then fine-tuned together. The first module leverages unsupervised learning to learn general visual fMRI features and distills semantic-related features using annotated datasets. The second module fine-tunes the learned features through co-training with an augmented Stable Diffusion model tailored for video generation under fMRI guidance.
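The semantic distillation in the first module follows the standard CLIP-style contrastive objective: fMRI embeddings and CLIP embeddings of the paired annotations are pulled together when they describe the same stimulus and pushed apart otherwise. The sketch below is an illustrative numpy implementation of that symmetric InfoNCE loss, not the authors' code; the function name, `temperature` value, and embedding shapes are assumptions for demonstration.

```python
import numpy as np

def clip_contrastive_loss(fmri_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss: the i-th fMRI embedding and the i-th CLIP
    text embedding are treated as a positive pair; all other pairings in
    the batch serve as negatives. (Illustrative sketch, not Mind-Video's code.)"""
    # L2-normalize so the dot product is cosine similarity
    f = fmri_emb / np.linalg.norm(fmri_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = f @ t.T / temperature          # (batch, batch) similarity logits
    labels = np.arange(len(f))              # positives lie on the diagonal

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)        # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average the fMRI->text and text->fMRI directions
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2
```

Minimizing this loss aligns the fMRI encoder's output with the shared CLIP space, which is what lets the second module condition video generation on semantically meaningful brain features.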
Progressive Learning: The model progressively learns from brain signals, gaining a deeper understanding of semantic space through multiple stages in the first module, including multimodal contrastive learning with spatiotemporal attention for windowed fMRI.
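To make the "windowed fMRI" idea concrete: instead of encoding each fMRI frame in isolation, consecutive frames are grouped into sliding windows so attention can mix information across neighbouring time points. The following is a minimal numpy sketch under assumed shapes (time x voxel-patch features); the window size, stride of 1, and single-head attention are simplifications of the paper's spatiotemporal attention, not its actual architecture.

```python
import numpy as np

def window_fmri(scan, window=3):
    """Split an fMRI time series of shape (T, features) into overlapping
    windows (stride 1), yielding (T - window + 1, window, features).
    Illustrative sketch; window size is an assumption."""
    T = scan.shape[0]
    return np.stack([scan[i:i + window] for i in range(T - window + 1)])

def temporal_attention(tokens):
    """Plain scaled dot-product self-attention over the frames of one
    window -- a single-head stand-in for spatiotemporal attention."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)        # frame-to-frame affinity
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over frames
    return weights @ tokens                        # attention-mixed frames
```

Each output frame is now a weighted mixture of its temporal neighbours, which is how the encoder can pick up dynamics that a single fMRI frame cannot express.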
Results Evaluation: Mind-Video has shown promising results, generating high-quality videos that capture accurate semantics, with significant improvements over previous approaches on semantic metrics and on structural similarity (SSIM).
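For readers unfamiliar with the pixel-level metric mentioned above, SSIM compares two images through their local means, variances, and covariance. The sketch below computes a single-window (global) SSIM; standard implementations (e.g. scikit-image's `structural_similarity`) average the same formula over sliding windows. This is a didactic simplification, not the evaluation code used by the authors.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """Global SSIM between two images of equal shape, values in
    [0, data_range]. Uses the standard constants C1 = (0.01*L)^2,
    C2 = (0.03*L)^2. Returns 1.0 for identical images."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Higher SSIM means the reconstructed video frames preserve more of the luminance, contrast, and structure of the ground-truth stimulus frames.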
Contribution to the Field: Mind-Video introduces a flexible and adaptable brain decoding pipeline, with an emphasis on learning accuracy, semantic relevance, and model interpretability.
By following these steps, users can effectively utilize Mind-Video for fMRI-to-video brain decoding applications.