Explanations
terms related to warriors or fighting
references to "warriors," indicating a focus on themes of strength, combat, or heroic figures
Negative Logits
uate        -0.77
ĸļ          -0.76
orage       -0.75
ories       -0.72
alez        -0.70
elong       -0.70
election    -0.67
upon        -0.65
perm        -0.65
sembly      -0.65
Positive Logits
riors       1.29
rior        1.20
warriors    0.98
warrior     0.94
llan        0.90
¯¯¯¯        0.84
fare        0.80
fish        0.76
jit         0.75
¯¯¯¯¯¯¯¯    0.74
Activations Density: 0.013%