INDEX
    Explanations

    phrases related to unjust accusations or convictions

    terms related to wrongful actions or accusations

    Negative Logits
    " Observer"           -0.76
    "itarian"             -0.74
    "=-=-=-=-=-=-=-=-"    -0.70
    "illary"              -0.70
    " Authorization"      -0.69
    "illin"               -0.69
    " Hands"              -0.69
    "iry"                 -0.68
    "olitan"              -0.68
    "arya"                -0.68
    Positive Logits
    " falsely"            1.15
    " wrongly"            0.97
    " misled"             0.92
    " mistakenly"         0.90
    " errone"             0.87
    " accuse"             0.84
    " dissemin"           0.82
    " fooled"             0.82
    " blinded"            0.80
    " scratched"          0.79
    Activation Density: 0.011%

    No Known Activations