INDEX
    Explanations

    words related to dishonesty, specifically focusing on the concept of lying

    words and phrases related to lying and dishonesty

    Negative Logits
    "ugal"        -0.80
    "joining"     -0.71
    "orsi"        -0.66
    "allows"      -0.64
    "iles"        -0.63
    "hens"        -0.63
    " Alto"       -0.62
    "aldi"        -0.61
    "runs"        -0.61
    "okemon"      -0.61
    Positive Logits
    " detector"    1.12
    "uten"         1.11
    "bling"        0.89
    "utenant"      0.88
    "ulent"        0.84
    " detectors"   0.82
    "pard"         0.78
    "ge"           0.74
    "telling"      0.73
    " deceive"     0.73
    Activation Density: 0.031%

    No Known Activations