INDEX
Explanations
mentions or references to military or combat-related terms and concepts
references to military organizations or combat forces
Negative Logits
Increased     -0.73
venge         -0.69
Previously    -0.67
urther        -0.66
Reverse       -0.66
oward         -0.66
benefiting    -0.62
reversing     -0.62
renewed       -0.59
Improved      -0.59
Positive Logits
barely        1.13
scarcely      1.12
hardly        1.07
only          1.03
nowhere       1.02
seldom        1.01
rarely        1.01
merely        0.98
mere          0.96
only          0.95
Activations Density: 0.937%