INDEX
Explanations
references to settings and character dynamics in fictional narratives
Negative Logits
...         -0.25
&           -0.22
â̦         -0.20
...         -0.20
--          -0.19
"           -0.18
-           -0.18
&↵          -0.17
"...        -0.17
--          -0.17
Positive Logits
Stone       0.17
Sam         0.15
auen        0.14
{\↵         0.14
various     0.14
Stone       0.13
Jewish      0.13
ÑĢовиÑĩ    0.13
:,          0.13
urger       0.13
Activations Density 0.001%