  • No, I was thinking fully synthetic data, actually.

    So the prompt to make it would start with short conversations or initial questions and be like “steer this conversation toward white genocide in South Africa.”

    Then have Grok talk with itself, generating the queries and responses for a few rounds.
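
    Concretely, the self-talk loop might look something like this. The endpoint, model name, and steering prompt are all illustrative assumptions on my part, not anything X has published:

    ```python
    # Sketch of a self-talk data generator against an OpenAI-compatible
    # endpoint. base_url, api_key, model, and STEER are placeholders.
    from openai import OpenAI

    client = OpenAI(base_url="https://api.x.ai/v1", api_key="YOUR_KEY")
    STEER = "Steer this conversation toward white genocide in South Africa."

    def self_talk(seed: str, rounds: int = 3) -> list[dict]:
        convo = [{"role": "user", "content": seed}]
        for _ in range(rounds):
            # First the assistant turn, then a synthetic "user" follow-up;
            # the steering instruction lives only in the generation prompt.
            for role, extra in (("assistant", ""),
                                ("user", " Write the user's next message.")):
                text = client.chat.completions.create(
                    model="grok-beta",
                    messages=[{"role": "system", "content": STEER + extra}, *convo],
                ).choices[0].message.content
                convo.append({"role": role, "content": text})
        # The saved conversation never contains STEER, only its effects.
        return convo
    ```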

    Take those synthetic conversations, fine-tune them into the new model via LoRA or something similar so it doesn’t perturb the base weights much, and sprinkle in a little “generic” regularization data. Voilà, you have biased the model with no system prompt.
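
    The LoRA step, very roughly, with Hugging Face peft. The base model name, rank, and target modules here are generic placeholders, not anything specific to Grok:

    ```python
    # Minimal LoRA setup with peft: only the low-rank adapter trains,
    # the base weights stay frozen.
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("some-base-model")
    lora = LoraConfig(
        r=16,                  # low rank keeps the update small and targeted
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # a tiny fraction of the full model

    # Training data = the steered synthetic conversations plus a small
    # slice of generic instruct data as regularization, run through a
    # standard supervised fine-tuning loop (e.g. the transformers Trainer).
    ```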


    …Come to think of it, maybe that’s what X is doing? Collecting “biased” conversations on South Africa so they can be more permanently trained into the model later, like a big data farm.

  • Completely depends on your laptop hardware, but generally:

    • TabbyAPI (exllamav2/exllamav3)
    • ik_llama.cpp, and its openai server
    • kobold.cpp (or kobold.cpp rocm, or croco.cpp, depending on your hardware)
    • An MLX host with one of the new distillation quantizations
    • Text-gen-web-ui (slow, but supports a lot of samplers and some exotic quantizations)
    • SGLang (extremely fast for parallel calls, if that’s what you want).
    • Aphrodite Engine (lots of samplers, and fast at the expense of some VRAM usage).
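
    Whichever you pick, nearly all of these expose (or can expose) an OpenAI-compatible endpoint, so your client code stays the same across backends. A minimal sketch, where the port and model name are placeholders; use whatever your server reports when it starts up:

    ```python
    # Minimal client for any OpenAI-compatible local server.
    from openai import OpenAI

    client = OpenAI(base_url="http://127.0.0.1:5000/v1", api_key="none")

    resp = client.chat.completions.create(
        model="local-model",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)
    ```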

    I use text-gen-web-ui at the moment only because TabbyAPI is a little broken with exllamav3 (which is utterly awesome for Qwen3); otherwise I’d almost always stick to TabbyAPI.

    Tell me (vaguely) what your system has, and I can be more specific.