Explanations
References to movies, particularly those with comedic or fantastical elements.
Negative Logits
olit     -0.18
_simps   -0.15
STAT     -0.14
enson    -0.14
erre     -0.14
辅       -0.14
cela     -0.14
ież      -0.13
굴       -0.13
unday    -0.13
Positive Logits
nak                     0.14
Feature                 0.14
Priv                    0.14
Guerr                   0.14
(token not displayed)   0.14
lex                     0.13
ocities                 0.13
feature                 0.13
Amazon                  0.13
Rank                    0.13
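The page does not say how the two logit lists are produced. A common approach in sparse-autoencoder dashboards, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding vector and then keep the most negative and most positive scores. The sketch below illustrates that assumed procedure in pure Python. The function name `logit_effects`, the token names, and all vectors are hypothetical toy values, not data from this feature.

```python
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]],
                  k: int) -> Tuple[Scored, Scored]:
    """Hypothetical sketch: score every token by the dot product of the
    feature's decoder direction with that token's unembedding vector,
    then return the k lowest (negative) and k highest (positive) scores."""
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy 2-dimensional example; every value here is made up for illustration.
decoder_dir = [0.5, -0.5]
unembed = {
    "tok_a": [0.2, -0.1],   # score 0.15
    "tok_b": [0.1, 0.0],    # score 0.05
    "tok_c": [-0.2, 0.2],   # score -0.2
    "tok_d": [0.0, 0.1],    # score -0.05
}
neg, pos = logit_effects(decoder_dir, unembed, k=2)
print("negative:", neg)
print("positive:", pos)
```

Real dashboards do this with matrix multiplication over the full vocabulary; the dictionary loop is only meant to make the per-token computation explicit.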
Activation Density: 0.028%
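Activation density is usually read as the percentage of tokens on which the feature fires. Under that assumption, 0.028% corresponds to about 28 active tokens per 100,000. The sketch below computes the percentage from a stream of per-token activations. The function name `activation_density` and the choice of "activation > 0" as the firing threshold are assumptions for illustration, not details given on this page.

```python
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Hypothetical helper: return the percentage of tokens whose
    activation is strictly greater than `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Synthetic example: 28 firing tokens out of 100,000.
acts = [0.0] * 99_972 + [1.0] * 28
density = activation_density(acts)
print(f"{density:.3f}%")
```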