Explanations
words expressing disappointment or sadness
Negative Logits
Token         Logit
рь            -0.15
hoff          -0.14
انه           -0.14
addtogroup    -0.13
umper         -0.13
hp            -0.13
warz          -0.13
Finder        -0.13
              -0.13
assert        -0.13
Positive Logits
Token         Logit
heart         0.30
pir           0.29
concert       0.28
orient        0.27
appoint       0.27
astr          0.26
ench          0.26
illusion      0.25
appointed     0.24
oriented      0.24
Activations Density 0.015%