INDEX
    Explanations

    expressions of honesty and straightforwardness

    Negative Logits
    Token               Logit
    "=’"                -0.64
    " étend"            -0.63
    "InvalidProtocol"   -0.60
    "AndWait"           -0.59
    "ană"               -0.57
    " μέ"               -0.54
    " spéciaux"         -0.54
    "GEBURTS"           -0.54
    "льше"              -0.53
    "artin"             -0.52

    Positive Logits
    Token               Logit
    "Frankly"           1.04
    "Honestly"          1.02
    " honestly"         1.00
    " frankly"          0.98
    " Honestly"         0.92
    "Tbh"               0.89
    " tbh"              0.89
    "honestly"          0.83
    "说实话"            0.83
    " admit"            0.80
    Activation Density: 0.129%
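
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number is typically computed. It assumes density means the fraction of token positions on which the feature's activation is above zero; the dashboard does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" counts a position as active when its activation
    is strictly greater than the threshold (zero by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample (not real data): 129 active positions out of 100,000.
sample = [1.0] * 129 + [0.0] * (100_000 - 129)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.129% means the feature fires on roughly 1 token in 775.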

    No Known Activations