# Video2Script
# Basic Concepts
# Literature Survey
VideoChat
[2305.06355] VideoChat: Chat-Centric Video Understanding
[Two-stream: per-frame + video]
MVBench(VideoChat2)
[2311.17005] MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Dolphin
[Open-source project]
# Related Work
# Other Work
GRiT
Dense Video Object Captioning from Disjoint Supervision
[2306.11729] Dense Video Object Captioning from Disjoint Supervision (arxiv.org)
A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video
Vid2Seq(CVPR-23)
VidChapters-7M(NeurIPS-23)
[2309.13952] VidChapters-7M: Video Chapters at Scale (arxiv.org)