    Explanations

    phrases related to deception or false statements

    instances of the word "lying," which indicates dishonesty or deceit

    Negative Logits
    Token          Logit
    "Ultra"        -0.86
    "obs"          -0.79
    "ISO"          -0.78
    "ilation"      -0.74
    "joining"      -0.70
    "FN"           -0.70
    "iles"         -0.69
    "aldi"         -0.69
    "Specific"     -0.67
    "ORE"          -0.65

    Positive Logits
    Token          Logit
    " horizont"    0.78
    " lying"       0.78
    " lie"         0.72
    " liar"        0.71
    "utenant"      0.71
    " pills"       0.70
    " siege"       0.70
    " skelet"      0.70
    "acies"        0.70
    " lied"        0.70
    Act Density 0.008%

    No Known Activations