INDEX
    Explanations

    words related to telling lies or being deceitful

    instances of the word "lying" to identify discussions of dishonesty or deception

    Negative Logits
    Token         Logit
    "Ultra"       -0.77
    "obs"         -0.76
    "ains"        -0.76
    "aldi"        -0.76
    "ugal"        -0.75
    "iles"        -0.73
    "ISO"         -0.72
    "ilation"     -0.71
    "arthy"       -0.69
    "FN"          -0.69
    Positive Logits
    Token         Logit
    "utenant"     0.79
    " vulner"     0.79
    " dormant"    0.79
    "sembly"      0.74
    "lie"         0.74
    " detector"   0.73
    "uten"        0.73
    " awake"      0.71
    " lying"      0.70
    " skelet"     0.69
    Activation Density: 0.012%

    No Known Activations