here's where FastVLM comes in

they slap an MLP on top to project the visual tokens coming out of FastViTHD into the LLM's embedding space
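
To make the projection step concrete, here's a rough sketch in plain Python. The post only says "an MLP", so everything specific here is an assumption for illustration: the two-layer shape, the GELU activation, the toy dimensions (`VISION_DIM`, `LLM_DIM`, `NUM_TOKENS`), and the random weights. It is not FastVLM's actual projector, just the general idea of mapping each visual token to the width the LLM expects.

```python
import math
import random


def gelu(x):
    # Exact GELU via the error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def init_linear(in_dim, out_dim, rng):
    # Hypothetical random init; a real projector would use trained weights.
    scale = 1.0 / math.sqrt(in_dim)
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def linear(vec, weight, bias):
    # One output value per weight row: dot(row, vec) + bias.
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


def project_tokens(visual_tokens, layer1, layer2):
    # Apply the same two-layer MLP to every visual token independently.
    projected = []
    for token in visual_tokens:
        hidden = [gelu(h) for h in linear(token, *layer1)]
        projected.append(linear(hidden, *layer2))
    return projected


# Toy sizes (assumed): real encoders and LLMs are far wider.
VISION_DIM, LLM_DIM, NUM_TOKENS = 8, 16, 4

rng = random.Random(0)
layer1 = init_linear(VISION_DIM, LLM_DIM, rng)
layer2 = init_linear(LLM_DIM, LLM_DIM, rng)

# Stand-in for the encoder's output: NUM_TOKENS vectors of width VISION_DIM.
visual_tokens = [[rng.gauss(0.0, 1.0) for _ in range(VISION_DIM)]
                 for _ in range(NUM_TOKENS)]

llm_inputs = project_tokens(visual_tokens, layer1, layer2)
print(len(llm_inputs), len(llm_inputs[0]))  # prints: 4 16
```

The token count stays the same through the projector; only the width of each token changes, so the savings have to come from the encoder emitting fewer tokens in the first place.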

the result: way fewer visual tokens (about 4× fewer than FastViT and 16× fewer than ViT‑L/14 at 336‑pixel resolution). I mean, that's a big drop in token count and complexity, while
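
For a feel of what those ratios mean, here's a back-of-the-envelope calculation. The ViT‑L/14 count follows from the patch grid (ignoring any class token), and the other two numbers are simply derived from the 16× and 4× ratios quoted above. They are arithmetic on the post's claims, not measured figures, and the variable names are made up for this sketch.

```python
def vit_patch_tokens(image_size, patch_size):
    # A plain ViT emits one token per non-overlapping square patch.
    per_side = image_size // patch_size
    return per_side * per_side


# ViT-L/14 at 336 px: a 24 x 24 patch grid.
vit_l14_tokens = vit_patch_tokens(336, 14)

# Derived from the quoted ratios (assumption: the ratios are exact).
fastvithd_tokens = vit_l14_tokens // 16
fastvit_tokens = fastvithd_tokens * 4

print(vit_l14_tokens, fastvit_tokens, fastvithd_tokens)  # prints: 576 144 36
```

So under those ratios the LLM would see roughly 36 visual tokens per image instead of 576.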
GlueGuyvip
· 09-03 20:47
This operation is quite impressive.
rekt_but_resilientvip
· 09-03 16:16
The improvement here is huge!
GasFeeLovervip
· 09-02 14:39
That's just how it is, what is there to brag about?
ser_we_are_earlyvip
· 09-02 14:39
It seems FastVLM is really amazing.
BlockchainBardvip
· 09-02 14:38
Impressive! I was shocked by the number of tokens.
WhaleWatchervip
· 09-02 14:27
New things have been made again!
DiamondHandsvip
· 09-02 14:23
Ah, I'm a bit dazed by the blowing...