INDEX
Explanations
mention of conflicts or battles
the term "wars" and its variations in context
Negative Logits
gow        -0.76
YL         -0.66
Dialogue   -0.66
opathy     -0.65
obook      -0.62
OGR        -0.61
uration    -0.61
STER       -0.61
urated     -0.61
Accuracy   -0.60
POSITIVE LOGITS
hip        1.26
hips       1.14
waged      0.95
raged      0.83
wars       0.83
pread      0.83
raging     0.83
pite       0.82
pace       0.82
fought     0.79
Activations Density 0.044%