INDEX
Explanations
terms related to hostile actions or entities
references to an "enemy."
Negative Logits
hots          -0.83
mberg         -0.80
auntlets      -0.79
razil         -0.78
20439         -0.73
eret          -0.72
ucket         -0.71
oled          -0.71
akers         -0.70
regon         -0.70
Positive Logits
enemy          1.26
foe            1.14
Enemy          1.13
adversary      1.05
combatants     0.97
enemy          0.95
enemies        0.93
adversaries    0.91
opponent       0.88
foes           0.87
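Scores like the positive and negative logits above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix, so each vocabulary token gets the dot product of the two vectors. The sketch below assumes that approach; it is not confirmed to be how this dashboard computes its numbers. The function names and the two-dimensional toy vectors are hypothetical, chosen only to show the ranking.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding row (assumed method)."""
    scores = {}
    for token, row in zip(vocab, unembed):
        scores[token] = sum(d * u for d, u in zip(decoder_dir, row))
    return scores


def top_and_bottom(scores, k):
    """Return the k highest-scoring tokens (largest first) and the
    k lowest-scoring tokens (most negative first)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[-k:][::-1]


# Hypothetical toy data: a 2-d decoder direction and 2-d unembedding rows.
decoder_dir = [1.0, 0.5]
vocab = ["enemy", "foe", "hots", "ucket"]
unembed = [[1.0, 0.4], [0.8, 0.6], [-0.6, -0.4], [-0.5, -0.4]]

scores = logit_effects(decoder_dir, unembed, vocab)
positive, negative = top_and_bottom(scores, k=2)
print("positive:", positive)
print("negative:", negative)
```

Sorting once and slicing both ends keeps the two lists consistent with each other. For a real vocabulary of tens of thousands of tokens, a vectorized matrix product and a partial sort would be the usual choice.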
Activation Density: 0.011%
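Activation density is usually reported as the percentage of tokens in a sample on which the feature fires. The sketch below assumes "fires" means an activation strictly greater than zero; the function name and the threshold parameter are hypothetical. Under that assumption, a density of 0.011% corresponds to roughly 11 firing tokens out of every 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`
    (threshold of 0.0 is an assumed definition of 'firing')."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 11 firing tokens out of 100,000.
sample = [0.0] * 99_989 + [1.0] * 11
print(f"{activation_density(sample):.3f}%")
```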