INDEX
    Explanations

    conversational exchanges and dialogues

    Negative Logits
    Token            Logit
    " impactful"     -0.75
    " tbh"           -0.73
    " incentiv"      -0.68
    " tasked"        -0.66
    " idk"           -0.66
    " emojis"        -0.66
    " bestie"        -0.66
    " relatable"     -0.65
    " curated"       -0.65
    " Idk"           -0.65
    Positive Logits
    Token            Logit
    "faßt"           0.75
    " muß"           0.73
    " lousy"         0.59
    " müßte"         0.55
    " Schluß"        0.54
    " biß"           0.52
    " wuß"           0.50
    " quelquefois"   0.49
    " mußte"         0.49
    " daß"           0.47
    Activation Density: 0.996%

    No Known Activations