INDEX
Explanations
references to geopolitics, military conflicts, and political figures
Negative Logits
thood     -0.79
besides   -0.76
without   -0.75
because   -0.73
eno       -0.71
verage    -0.70
leeve     -0.69
differs   -0.68
rade      -0.68
anyways   -0.65
Positive Logits
latter    1.34
oldest    1.30
largest   1.29
longest   1.25
smallest  1.20
biggest   1.19
fastest   1.15
earliest  1.15
latest    1.14
youngest  1.12
Activations Density: 1.111%