INDEX
Explanations
mentions and descriptions of different types of warriors throughout various contexts, including historical, mythological, and fictional
Negative Logits
uate        -0.85
orage       -0.79
ories       -0.79
changes     -0.75
ions        -0.74
ibly        -0.71
upon        -0.70
ascript     -0.69
urations    -0.68
ĸļ          -0.67
Positive Logits
riors               1.35
rior                1.33
¯¯¯¯                0.94
fare                0.89
¯¯¯¯¯¯¯¯            0.85
warriors            0.84
¯¯                  0.83
hip                 0.82
warrior             0.81
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯    0.77
Activations Density 0.062%