INDEX
    Explanations

    phrases indicating an acknowledgment of errors or mistakes

    phrases related to guilt and truthfulness

    Negative Logits
    Token             Logit
    " accompany"      -0.75
    "lde"             -0.72
    " accompanying"   -0.72
    " Flavoring"      -0.72
    "elve"            -0.70
    " greets"         -0.70
    "Cooldown"        -0.69
    "ingle"           -0.69
    "»Ĵ"              -0.67
    "phabet"          -0.66
    Positive Logits
    Token             Logit
    " wrong"          1.90
    " Wrong"          1.68
    "wrong"           1.64
    " incorrect"      1.55
    " mistake"        1.42
    " faulty"         1.39
    " wrongly"        1.37
    " mistaken"       1.33
    "Wr"              1.32
    " misleading"     1.32
    Activation Density: 0.667%

    No Known Activations