INDEX
Explanations
words related to hostility and conflict, especially in a negative context
instances of the word "hostile."
Negative Logits
orah      -0.85
ucket     -0.83
regon     -0.80
frey      -0.79
illon     -0.79
otide     -0.78
ingham    -0.77
annis     -0.77
ulet      -0.75
ainers    -0.74
Positive Logits
takeover        1.00
hostile         0.90
retaliation     0.87
retribution     0.76
hostility       0.76
toward          0.76
invasion        0.75
towards         0.74
interference    0.74
repr            0.73
Activations Density: 0.024%
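
The density figure can be read as the share of sampled token positions on which the feature fires. The sketch below illustrates that reading only; the source does not state how the value was computed. The function name `activation_density` and the sample data are hypothetical, chosen so that the result matches the figure shown.

```python
def activation_density(activations):
    """Return the fraction of token positions with a positive activation.

    Assumption: "density" means (positions where the feature fires) / (all
    sampled positions). The source page does not define the term.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > 0)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 3 of 12,500 token positions.
sample = [0.0] * 12_497 + [1.3, 0.4, 2.1]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.024%
```

At this density the feature is very sparse: it fires on roughly one token in every four thousand, which fits a feature tied to a narrow set of words such as "hostile" and "takeover".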