INDEX
    Explanations

    statements expressing honesty or frankness

    Negative Logits
    etting
    -0.69
     Blades
    -0.68
    arthy
    -0.66
    ammy
    -0.65
    tailed
    -0.65
    ied
    -0.64
     Landing
    -0.64
    tein
    -0.60
     Klu
    -0.59
    lav
    -0.59
    Positive Logits
     speaking
    1.04
    龍喚士
    0.85
    zers
    0.84
    speaking
    0.75
     honestly
    0.73
    ometry
    0.67
     admit
    0.67
     tho
    0.66
     doubted
    0.65
     ashamed
    0.64
    Activation Density 0.032%

    No Known Activations