Explanations
mentions of historical events related to war, specifically World War II
Negative Logits
Token        Logit
sher         -0.71
OGR          -0.68
APS          -0.67
say          -0.65
doi          -0.65
QUI          -0.63
DEF          -0.62
prosecut     -0.62
predictive   -0.61
ITY          -0.60
Positive Logits
Token        Logit
II           1.36
Two          1.12
III          1.08
XII          0.95
Three        0.92
ii           0.90
1942         0.88
XVI          0.88
VII          0.86
planes       0.85
Activations Density: 0.021%