
8 months ago

15 seconds per reply with just 1 token/s?! How short are they? What’s the context size to be processed? I get like 5 tokens per second on my GPU and need 1-2 minutes per reply on 4k context size.
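The 1-2 minutes figure makes sense once prompt processing is counted separately from generation: the whole context (chat history, cards, world info) has to be ingested before the first reply token appears. A minimal back-of-envelope sketch, where the 50 tok/s prompt-processing speed and 300-token reply length are illustrative assumptions, not numbers from this thread:

```python
# Rough latency model: total reply time = prompt processing + generation.
# prompt_tps and reply_tokens below are assumed values for illustration.

def reply_time(context_tokens: int, reply_tokens: int,
               prompt_tps: float, gen_tps: float) -> float:
    """Estimate seconds to produce one reply."""
    return context_tokens / prompt_tps + reply_tokens / gen_tps

# ~4k context at an assumed 50 tok/s prompt speed,
# ~300-token reply generated at 5 tok/s
t = reply_time(4096, 300, 50.0, 5.0)
print(f"{t:.0f} s")  # ~142 s, in the 1-2 minute range mentioned above
```

At 1 tok/s generation, a 15-second reply would only fit a handful of tokens unless the context is tiny, which is why the reply length and context size matter here.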
I mean the actual context size to be processed for the message, based on chat history, character cards, world info, etc. And which model?