Explanations
references to imaginative or playful character concepts and storytelling
Negative Logits
Token    Logit
ifa      -0.18
ihan     -0.16
á»IJ     -0.15
ække     -0.15
ISTA     -0.14
ÙĨس     -0.14
zew      -0.14
zers     -0.13
orgia    -0.13
benh     -0.13
Positive Logits
Token        Logit
Rt           0.15
Fang         0.14
ridden       0.14
ÙĪÛĮ         0.13
underst      0.13
اÙĦعÙħ       0.13
ingleton     0.13
ym           0.13
yes          0.13
613          0.12
Activations Density: 0.833%