Explanations
references to hand-to-hand combat
terms related to combat or fighting scenarios
Negative Logits
oint      -0.87
icket     -0.85
andan     -0.84
anke      -0.80
oos       -0.80
oof       -0.80
ais       -0.80
lessly    -0.80
anda      -0.79
andise    -0.77
Positive Logits
Mae        0.64
prevail    0.61
steady     0.60
Combat     0.60
torches    0.59
DEN        0.59
jah        0.58
………………     0.58
bane       0.57
legion     0.57
Activations Density 0.036%