    Explanations

    Dialogue and gratitude

    Negative Logits
    "_REGEX"        -0.08
    "_SIG"          -0.08
    "Unlike"        -0.08
    "_Rect"         -0.08
    " Unlike"       -0.08
                    -0.08
    "_START"        -0.08
    "'arrêt"        -0.08
    " violently"    -0.07
    "_AS"           -0.07
    Positive Logits
    " thank"        0.13
    " thanking"     0.13
    "THANK"         0.13
    " THANK"        0.12
    " teşekkür"     0.12
    " gratitude"    0.11
    " നന്ദ"          0.11
    "Спасибо"       0.11
    " спасибо"      0.11
    " תודה"         0.11
    Activation Density: 0.029%

    No Known Activations