INDEX
    Explanations

    expressions of honesty and straightforwardness

    Negative Logits
    "=’"                  -0.63
    "StringProperty"      -0.57
    "kac"                 -0.55
    "ledes"               -0.54
    " новым"              -0.54
    " disponibilités"     -0.53
    " raggiungere"        -0.52
    "新的"                -0.51
    "brać"                -0.51
    "AndWait"             -0.51

    Positive Logits
    " frankly"            1.11
    "Honestly"            1.10
    " honestly"           1.08
    "Frankly"             1.06
    " Honestly"           1.04
    "honestly"            1.02
    " tbh"                0.95
    " оригіналу"          0.80
    "说实话"              0.77
    "Tbh"                 0.73
    Act Density 0.086%

    No Known Activations