INDEX
    Explanations

    references to anger and related emotions

    Negative Logits
    Token       Logit
    "ksen"      -0.16
    "itive"     -0.16
    "atives"    -0.15
    "etto"      -0.15
    "podob"     -0.15
    "ardon"     -0.14
    "tracer"    -0.14
    " Garn"     -0.14
    "etten"     -0.14
    "raya"      -0.14

    Positive Logits
    Token       Logit
    " Ang"      0.27
    "els"       0.22
    "Ang"       0.22
    "gota"      0.21
    "лий"       0.20
    "ELS"       0.20
    "los"       0.20
    "eline"     0.19
    "(ang"      0.19
    " ang"      0.19

    Activation Density: 0.014%

    No Known Activations