Explanations
words related to being surprised or not expecting a particular outcome
Negative Logits
Token         Logit
kay           -0.20
ngth          -0.20
apt           -0.19
apeshifter    -0.19
obal          -0.19
hesion        -0.19
amins         -0.19
urai          -0.18
itiz          -0.18
©¶æ           -0.18
Positive Logits
Token         Logit
LER           0.22
stakes        0.20
swick         0.19
Sax           0.19
Rate          0.19
ATIONS        0.18
Dra           0.18
inged         0.18
LB            0.18
052           0.18
Activations Density: 12.268%