INDEX
Explanations
references to enemy entities or actions
references to enemies in various contexts
Negative Logits
eret        -0.83
hots        -0.81
20439       -0.80
auntlets    -0.76
Printed     -0.74
ara         -0.74
orrow       -0.73
ifting      -0.73
aunder      -0.73
ranging     -0.72
Positive Logits
enemy       1.04
Enemy       0.94
foe         0.91
combatants  0.89
adversary   0.88
takeover    0.85
enemy       0.83
spouse      0.77
darling     0.75
soldier     0.74
Activations Density: 0.012%