Explanations
mentions of political figures and events in the context of government policies and international relations
Negative Logits
replace     -0.81
gat         -0.79
namely      -0.76
rand        -0.72
craft       -0.71
ftime       -0.70
thood       -0.68
watching    -0.68
/           -0.67
.--         -0.66
Positive Logits
entire      1.51
entirety    1.45
remainder   1.35
slightest   1.29
same        1.24
whole       1.19
latter      1.17
ses         1.16
smallest    1.15
brunt       1.13
Activations Density 1.546%