    Explanations

    verbs related to deception or misleading actions

    Negative Logits
    Token        Logit
    "rises"      -0.79
    "foreseen"   -0.67
    "capacity"   -0.66
    "ynski"      -0.65
    "ateur"      -0.65
    "airo"       -0.64
    "hens"       -0.64
    "area"       -0.64
    "joining"    -0.63
    "riot"       -0.62
    Positive Logits
    Token             Logit
    " perpetrated"    0.93
    " deceive"        0.93
    "ulent"           0.91
    "ulence"          0.86
    " ABOUT"          0.85
    "esty"            0.83
    "uten"            0.82
    " omission"       0.82
    " misrepresent"   0.82
    " falsely"        0.81
    Activation Density: 0.111%

    No Known Activations