Explanations
mentions of characters and their complexities in narratives
Negative Logits
ÄĻż         -0.15
uncture     -0.15
ochen       -0.14
habi        -0.14
GLOSS       -0.14
ACHINE      -0.14
urt         -0.14
Zombies     -0.14
oment       -0.14
cosa        -0.14
Positive Logits
.Animation   0.16
Jaune        0.16
Animation    0.16
literally    0.15
rosso        0.15
dere         0.15
ãĤ¯          0.14
Barrier      0.14
trope        0.14
.literal     0.14
Activation Density: 0.004%
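Activation density is usually read as the fraction of token positions on which the feature fires. The minimal sketch below assumes that definition with a threshold of zero; `activation_density` is a hypothetical helper, and the activation array is synthetic, chosen so the feature fires on 4 of 100,000 tokens.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds threshold."""
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))

acts = np.zeros(100_000)
acts[:4] = 1.3  # synthetic: the feature fires on 4 of 100,000 tokens
print(f"{activation_density(acts):.3%}")  # 0.004%
```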