    Explanations

    terms related to deception or falsehoods

    Negative Logits
    Token           Logit
    "weeney"        -0.81
    "undai"         -0.80
    "accompan"      -0.80
    "odes"          -0.79
    "icts"          -0.77
    "laws"          -0.77
    "ses"           -0.76
    "izont"         -0.74
    "ippers"        -0.73
    "ographics"     -0.73

    Positive Logits
    Token           Logit
    " concoct"       1.13
    " invented"      1.02
    " perpetrated"   0.91
    " unworthy"      0.90
    " mir"           0.86
    " excuse"        0.83
    " gimmick"       0.82
    " because"       0.81
    " devoid"        0.81
    " conceived"     0.80

    Activation density: 0.072% (roughly 72 of every 100,000 tokens)
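
    The page does not state how these numbers are produced. Under a common dashboard convention (an assumption here, not confirmed by the source), each logit value is the feature's direct contribution to that output token's logit, the two lists are the k most negative and k most positive entries, and activation density is the fraction of token positions where the feature's activation is above zero. The minimal sketch below assumes exactly that; the function names `top_logits` and `activation_density` and the toy data are hypothetical.

```python
import numpy as np


def top_logits(logit_effect: np.ndarray, vocab: list[str], k: int = 10):
    """Return the k most negative and k most positive (token, value) pairs.

    Assumes (hypothetically) that logit_effect[i] is the feature's direct
    contribution to the logit of vocab[i].
    """
    order = np.argsort(logit_effect)  # indices sorted by ascending value
    neg = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    pos = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return neg, pos


def activation_density(acts: np.ndarray) -> float:
    """Fraction of token positions where the activation is strictly positive."""
    return float(np.mean(acts > 0))


# Toy data taken from a few rows of the lists above (hypothetical vocabulary).
vocab = [" concoct", " invented", "weeney", "undai", " because"]
effect = np.array([1.13, 1.02, -0.81, -0.80, 0.81])
neg, pos = top_logits(effect, vocab, k=2)
print(neg)  # [('weeney', -0.81), ('undai', -0.8)]
print(pos)  # [(' concoct', 1.13), (' invented', 1.02)]

# 72 active positions out of 100,000 gives a density of 0.00072, i.e. 0.072%.
acts = np.zeros(100_000)
acts[:72] = 0.5
print(activation_density(acts))
```

    Keeping the leading space in tokens such as " concoct" matters because, for many tokenizers, " concoct" and "concoct" are distinct vocabulary entries.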

    No Known Activations