INDEX
Explanations
mentions of hostility or hostile behavior in the text
instances of the word "hostile" or related phrases
Negative Logits
oled     -0.87
orah     -0.86
ucket    -0.83
20439    -0.81
akings   -0.80
arist    -0.79
alt      -0.75
attr     -0.74
issue    -0.74
acea     -0.74
Positive Logits
hostile        1.32
takeover       1.01
undermin       0.94
hostility      0.94
citiz          0.83
invaders       0.83
invasion       0.81
retaliation    0.81
interference   0.78
friendly       0.77
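The two logit lists rank vocabulary tokens by how strongly this feature pushes their next-token logits down (negative) or up (positive). Below is a minimal sketch of that ranking. It assumes the per-token logit effects are already available as a plain list; in SAE dashboards these are typically the feature's decoder direction projected through the model's unembedding, but that derivation is an assumption here, not something this page states. The function name `top_and_bottom_logits` and the toy inputs are hypothetical.

```python
from typing import List, Tuple

def top_and_bottom_logits(
    vocab: List[str], logit_effect: List[float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most-boosted and k most-suppressed tokens.

    Hypothetical helper: logit_effect[i] is assumed to be the feature's
    direct effect on the logit of vocab[i].
    """
    if len(vocab) != len(logit_effect):
        raise ValueError("vocab and logit_effect must have the same length")
    pairs = list(zip(vocab, logit_effect))
    # Largest effects first for the positive list.
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    # Smallest (most negative) effects first for the negative list.
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative

# Toy usage with a few values copied from the lists above.
vocab = ["hostile", "takeover", "oled", "orah", "friendly"]
effects = [1.32, 1.01, -0.87, -0.86, 0.77]
pos, neg = top_and_bottom_logits(vocab, effects, k=2)
print(pos)  # [('hostile', 1.32), ('takeover', 1.01)]
print(neg)  # [('oled', -0.87), ('orah', -0.86)]
```

Sorting the full list is fine for a sketch; over a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid a full sort.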
Activation Density: 0.011%