Multimodal tasks are attracting growing attention from the research community, yet the lack of multimodal event extraction datasets restricts progress in multimodal event extraction. To fill this gap, we introduce a new Multimodal Event Extraction Dataset (MEED). We first define event types and argument roles applicable to multimodal data, and then use controllable text generation to produce the textual modality from an existing visual event extraction dataset. By constructing a large-scale, high-quality multimodal event extraction dataset, we aim to make full use of multimodal resources for event extraction and to promote research in this field.
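To make the idea of generating the textual modality from structured visual event annotations concrete, here is a minimal sketch of prompt-conditioned (controllable) text generation. It assumes a generic seq2seq model from HuggingFace Transformers; the model name, prompt template, and event schema below are illustrative assumptions, not the actual pipeline or schema used to build MEED.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Illustrative setup: any seq2seq generation model could stand in here.
model_name = "t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def generate_caption(event_type: str, arguments: dict) -> str:
    """Generate a textual description conditioned on a visual event annotation.

    The event type and argument roles act as the control signal that
    constrains what the generated sentence must mention.
    """
    # Serialize the event structure into a control prompt (assumed format).
    arg_str = "; ".join(f"{role}: {value}" for role, value in arguments.items())
    prompt = f"describe event: {event_type} | arguments: {arg_str}"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=40, num_beams=4)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Hypothetical example: an "Attack" event annotated on an image.
print(generate_caption(
    "Attack",
    {"Attacker": "soldier", "Target": "vehicle", "Place": "street"},
))
```

In such a setup, each image-level event annotation yields a paired sentence whose content is controlled by the event type and argument roles, giving aligned visual and textual instances of the same event.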