INDEX
    Explanations

    expressions of emotional avoidance or denial of responsibility

    Negative Logits
    Token           Logit
    "opia"          -0.19
    " fooled"       -0.16
    "ivor"          -0.15
    "onsense"       -0.15
    "934"           -0.15
    "hin"           -0.15
    "UNCTION"       -0.14
    " ún"           -0.14
    "acam"          -0.14
    " tolerate"     -0.14
    Positive Logits
    Token           Logit
    " alien"        0.45
    " Alien"        0.33
    "alien"         0.31
    "Ali"           0.28
    " antagon"      0.26
    " aliens"       0.26
    " Ali"          0.26
    " risk"         0.25
    " anger"        0.24
    "risk"          0.22
    Activation Density: 0.217%

    No Known Activations