INDEX
    Explanations

    situations where a justification or excuse is given for certain actions

    terms related to deceptive justifications or rationalizations

    Negative Logits
    omer
    -0.76
    omers
    -0.68
    irc
    -0.67
    omb
    -0.63
    devices
    -0.62
    hani
    -0.60
    itter
    -0.59
    apsed
    -0.59
    eder
    -0.59
    ECT
    -0.59
    Positive Logits
     pretext
    1.24
    ーティ
    0.87
    milo
    0.86
    ual
    0.85
     excuse
    0.83
     guise
    0.81
     accuser
    0.81
    atis
    0.79
    ress
    0.78
     Tanz
    0.78
    Act Density 0.016%

    No Known Activations