Explanations
mentions of powerlessness and existential themes in storytelling
Negative Logits
arda    -0.15
egra    -0.15
zew     -0.15
Dirt    -0.14
ibs     -0.14
ysa     -0.14
ille    -0.14
aware   -0.14
obar    -0.13
oup     -0.13
Positive Logits
both    0.52
Both    0.50
both    0.48
Both    0.46
BOTH    0.45
beide   0.43
ambos   0.40
_both   0.39
_BOTH   0.35
обо     0.34
Activation Density: 0.375%