INDEX
    Explanations

    verbs that suggest deception

    verbs and phrases that indicate personal interactions or relationships

    Negative Logits
     Arrow      -0.81
    irmation    -0.68
    encing      -0.65
    Quote       -0.65
    irming      -0.62
    ravel       -0.62
    hatt        -0.61
    ruction     -0.61
    hern        -0.60
    onement     -0.60
    Positive Logits
    ©¶æ¥µ           0.74
    aciously        0.68
     passionately   0.66
    ij士             0.65
     himself        0.65
     herself        0.64
     menstru        0.64
     tirelessly     0.62
     worshipped     0.62
    vae             0.61
    Activation Density: 0.333%

    No Known Activations