INDEX
Explanations
phrases that express confusion or disapproval about situations
Negative Logits
erset        -0.17
alie         -0.17
ucker        -0.16
CGRectGet    -0.16
positor      -0.15
chwitz       -0.15
linger       -0.15
hete         -0.14
imeters      -0.14
rosse        -0.14
Positive Logits
tir          0.16
Zig          0.16
Vectorizer   0.15
Bender       0.15
iq           0.15
sting        0.15
é̏            0.15
ामà¤Ĺ        0.14
stick        0.14
kah          0.14
Activations Density 0.170%