I am a teacher and I have a lot of literature material that I would like to study and play around with.

I would like a self-hosted, reasonably smart LLM into which I can feed all the textual material I have generated over the years. I would be interested to see whether this model can answer some of the subjective course questions I have set in my exams, or write short paragraphs about the topics I teach.

In terms of hardware, I have an old Lenovo laptop with an NVIDIA graphics card.

P.S. I am not very experienced technically. I run Linux and can do very basic stuff. I have never self-hosted anything other than LibreTranslate and a Pi-hole!

  • Fisch@discuss.tchncs.de
    7 months ago

    What I’m using is Text Generation WebUI with an 11B GGUF model from Hugging Face. I offloaded all layers to the GPU, which uses about 9GB of VRAM. With GGUF models, you can choose how many layers to offload to the GPU, so if your card has less VRAM you can offload fewer layers. Layers that aren’t offloaded run on the CPU from system RAM, which is slower.
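To get a starting value for the number of layers to offload on a smaller card, a rough back-of-the-envelope estimate can help. The sketch below is not part of Text Generation WebUI. `layers_that_fit` is a hypothetical helper, the 48-layer count is an assumed example (check your model's card for the real number), and the 1GB reserve for context and the desktop is a guess. It also assumes VRAM use is spread evenly across layers, which is only approximately true.

```python
import math


def layers_that_fit(total_layers, full_offload_gb, vram_gb, reserve_gb=1.0):
    """Estimate how many layers to offload to the GPU.

    Assumes VRAM use scales evenly per layer (an approximation).
    total_layers    -- number of layers in the model (assumed, model-specific)
    full_offload_gb -- VRAM used when every layer is offloaded (about 9 above)
    vram_gb         -- VRAM available on your card
    reserve_gb      -- headroom kept free for context and the desktop (a guess)
    """
    if total_layers <= 0 or full_offload_gb <= 0:
        raise ValueError("total_layers and full_offload_gb must be positive")
    per_layer_gb = full_offload_gb / total_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    # Never report more layers than the model actually has.
    return min(total_layers, math.floor(usable_gb / per_layer_gb))


if __name__ == "__main__":
    # Assumed example: a 48-layer model that needs about 9GB fully offloaded.
    for vram in (4, 6, 8, 12):
        print(f"{vram}GB card: try about {layers_that_fit(48, 9.0, vram)} layers")
```

Treat the result as a first guess for the GPU layers setting: if the model fails to load or runs out of memory, lower the number, and if there is VRAM to spare, raise it.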