INDEX
Explanations
references to "legends" and "legendary" elements across various contexts
Negative Logits
er          -0.22
eron        -0.18
era         -0.16
age         -0.16
Majority    -0.15
erap        -0.15
tps         -0.14
dued        -0.14
inear       -0.14
erman       -0.14
Positive Logits
ry           0.20
996          0.16
proportions  0.15
loth         0.15
ácil         0.15
Ïģθ          0.15
imized       0.15
rier         0.15
lore         0.14
aryl         0.14
Activations Density: 0.014%