INDEX
Explanations
references to wars or conflicts
references to various "wars" across different contexts and themes
Negative Logits
AUT       -0.76
gow       -0.75
Asset     -0.75
YL        -0.68
alties    -0.65
Safety    -0.64
OGR       -0.64
Known     -0.64
ritch     -0.64
SOURCE    -0.63
Positive Logits
hips      1.19
hip       1.13
lords     0.92
wars      0.90
pite      0.87
poons     0.86
mith      0.85
kies      0.84
waged     0.83
raged     0.82
Activations Density: 0.018%