INDEX
    Explanations

    phrases indicating giving up or surrendering

    Negative Logits
    Ïĥο       -0.16
    EU        -0.15
    appa      -0.14
     premi    -0.14
    olf       -0.14
     Folk     -0.14
     ÄĮer     -0.14
    utsch     -0.13
    adle      -0.13
    олÑİ      -0.13
    Positive Logits
    -eff      0.14
    sert      0.14
    nst       0.14
     Eff      0.14
    ÑĤÑĸв      0.14
    razier    0.14
    annie     0.14
    page      0.14
     Joy      0.13
    eff       0.13
    Activation Density: 0.011%

    No Known Activations