    Explanations

    words related to distress or suffering

    Negative Logits
    テル                    -0.16
     Hlav                   -0.16
    たし                    -0.16
    ampire                  -0.15
    dux                     -0.15
    elter                   -0.14
    _locals                 -0.14
    allel                   -0.14
    631                     -0.14
    895                     -0.14
    Positive Logits
     Shen                   0.16
    iban                    0.15
    bian                    0.14
     Barcl                  0.14
     LM                     0.14
    Isl                     0.14
    heimer                  0.14
    bes                     0.13
    _lm                     0.13
    --------------------    0.13
    Act Density 0.039%

    No Known Activations