Really slow inference

#4
by Lobliqua - opened

Hello, thanks for the amazing work!

I'm seeing slow model loading and very slow inference (less than 1 token/s) with transformers (4.52.4) and PyTorch (2.7.0) on an RTX 5090. Has anyone else experienced this?

yeah, same...
