INDEX
Explanations
references to storytelling and narrative structures
Negative Logits
Token     Logit
/ros      -0.16
inspace   -0.15
iek       -0.15
ister     -0.15
æīķ       -0.15
amax      -0.14
ppard     -0.14
isz       -0.14
iors      -0.14
rollo     -0.14
Positive Logits
Token     Logit
oog       0.16
vens      0.14
257       0.14
wang      0.14
umat      0.14
Her       0.14
unga      0.13
assic     0.13
urally    0.13
up        0.13
Activations Density: 0.006%