Explanations
words related to frustration or confusion
Negative Logits
ueue       -0.20
ffe        -0.19
UED        -0.19
EMPLARY    -0.19
ODEV       -0.17
uela       -0.17
Äįka       -0.17
_Tis       -0.16
TRGL       -0.16
meni       -0.16
Positive Logits
um         0.39
ub         0.33
up         0.32
ul         0.32
ur         0.31
uk         0.31
ut         0.31
ud         0.28
un         0.27
av         0.27
Activations Density 0.026%