INDEX
    Explanations

    violent/threatening situations

    Negative Logits
    -0.08
    ್ವರ  -0.08
     сот  -0.08
    ನ್  -0.08
     ältere  -0.08
    ್ಯಾಂ  -0.08
    ರ್  -0.08
     αρ  -0.07
     oudere  -0.07
    .tsv  -0.07
    Positive Logits
    _HINT  0.09
     conducive  0.09
    0.08
     dedo  0.08
    ocal  0.08
     sah  0.08
    opensource  0.08
    otti  0.08
     ताकि  0.08
    _position  0.07
    Act Density 0.037%

    No Known Activations