Explanations
phrases indicating evaluation or assessment ratings
Negative Logits
Token     Logit
hait      -0.07
haar      -0.07
ục        -0.07
buz       -0.07
ยม        -0.07
YNC       -0.07
cobra     -0.07
pell      -0.06
Falsy     -0.06
inizin    -0.06
Positive Logits
Token     Logit
ones      0.21
ones      0.14
Ones      0.13
ours      0.12
もの      0.11
theirs    0.10
mine      0.10
yours     0.09
hers      0.09
Mine      0.07
Activations Density: 0.069%