CV1
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)

Basic Info
- Conference: ICLR 2021
- Authors: A. Dosovitskiy et al.
- Institution: Google Research, Brain Team

Abstract
Original English text:
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional network...

2025. 2. 12.
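The title's "16x16 words" refers to ViT cutting an image into fixed-size 16x16 patches, flattening each one, and linearly projecting it so the Transformer can treat the patches as a token sequence. The following is a minimal NumPy sketch of only that patch-splitting and projection step. It assumes a 224x224 RGB input and uses a random matrix as a stand-in for the learned projection. The names `patchify` and `W_embed` and the model width `D = 768` are hypothetical choices for illustration, not the paper's code.

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened P x P patches.

    Returns an array of shape (N, P*P*C), where N = (H // P) * (W // P).
    Patches are ordered row by row across the patch grid.
    """
    h, w, c = image.shape
    p = patch_size
    if h % p or w % p:
        raise ValueError("image height and width must be divisible by patch_size")
    # (H, W, C) -> (H/P, P, W/P, P, C): split each spatial axis into grid x in-patch
    x = image.reshape(h // p, p, w // p, p, c)
    # Move the two grid axes to the front: (H/P, W/P, P, P, C)
    x = x.transpose(0, 2, 1, 3, 4)
    # Flatten each P x P x C patch into a single "word" vector
    return x.reshape(-1, p * p * c)

# Example, assuming a 224x224 RGB image and a hypothetical model width D = 768
rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))
tokens = patchify(img)                           # (196, 768): 14 x 14 patches, 16*16*3 values each

D = 768
W_embed = rng.standard_normal((tokens.shape[1], D)) * 0.02  # stand-in for learned weights
embeddings = tokens @ W_embed                    # (196, D): one embedding per patch token
```

A 224x224 image therefore becomes a sequence of (224/16)^2 = 196 tokens. Doing the split with `reshape` and `transpose` keeps the step vectorized. A strided convolution with kernel size and stride both equal to 16 is a common equivalent way to fuse the split and the projection.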