INDEX
Explanations
words related to adversaries or opponents
mentions of "enemies" in various contexts
Negative Logits
token       logit
auntlets    -0.76
eret        -0.75
ced         -0.75
Ħ¢          -0.73
otide       -0.71
oled        -0.65
lic         -0.65
cing        -0.65
med         -0.64
docs        -0.64
Positive Logits
token         logit
foe           1.04
enemies       1.02
enemy         0.98
Enemy         0.94
Enemies       0.92
adversaries   0.91
foes          0.89
hip           0.88
vanquished    0.86
enemy         0.83
Activations Density: 0.013%