    Explanations

    terms related to innocence and victims

    Negative Logits
    Token           Logit
    "lfw"           -0.17
    "phan"          -0.17
    "ilet"          -0.16
    "ongan"         -0.15
    "illisecond"    -0.15
    "688"           -0.15
    "ionic"         -0.15
    "yte"           -0.14
    "alog"          -0.14
    "imli"          -0.14
    Positive Logits
    Token           Logit
    " innocent"     0.27
    " bystand"      0.25
    " innocence"    0.24
    " innoc"        0.24
    " Innoc"        0.23
    " harmless"     0.19
    " civilians"    0.18
    "/simple"       0.17
    " victims"      0.16
    "-looking"      0.16
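    The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. Below is a minimal sketch of one common way such lists are produced. It assumes that the feature's decoder direction is projected through the model's unembedding matrix and the results are sorted. The function names, `decoder_dir`, `unembed`, and the toy numbers are hypothetical illustrations, not this feature's actual weights or the page's own pipeline.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], unembed: List[List[float]]) -> List[float]:
    """Project a feature's decoder direction through the unembedding.

    Assumption: unembed is laid out d_model x vocab_size (row-major),
    so each vocabulary token corresponds to one column.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    effects: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]        # most negative effects first
    top = ranked[::-1][:k]     # most positive effects first
    return bottom, top


# Toy example with a 2-dimensional model and a 4-token vocabulary (made-up values).
vocab = [" innocent", "lfw", " victims", "alog"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.2, -0.1, 0.1, 0.0],
    [0.1, -0.1, 0.05, -0.2],
]
effects = logit_effects(decoder_dir, unembed)
bottom, top = top_and_bottom(effects, vocab, k=2)
```

    With these toy numbers, the bottom pair is "lfw" then "alog" and the top pair is " innocent" then " victims".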
    Activation Density: 0.011%
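    Activation density is the fraction of tokens on which the feature fires. A value of 0.011% works out to roughly 11 firing tokens per 100,000. The sketch below assumes that "fires" means an activation strictly above a threshold of 0. The helper name `activation_density` and the threshold are hypothetical, and the input is synthetic, not this feature's recorded activations.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of positions whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Synthetic example: 11 firing tokens out of 100,000.
acts = [0.0] * 99_989 + [1.0] * 11
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.011%
```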

    No Known Activations