Explanations
expressions of honesty and straightforwardness
Negative Logits
=’                -0.63
StringProperty    -0.57
kac               -0.55
ledes             -0.54
новым             -0.54
disponibilités    -0.53
raggiungere       -0.52
新的              -0.51
brać              -0.51
AndWait           -0.51
Positive Logits
frankly       1.11
Honestly      1.10
honestly      1.08
Frankly       1.06
Honestly      1.04
honestly      1.02
tbh           0.95
оригіналу     0.80
说实话        0.77
Tbh           0.73
Activations Density 0.086%