INDEX
Explanations
phrases related to conflicts or confrontations
concepts related to intervention, conflict, and societal issues
Negative Logits
ĪĴ       -0.67
¡        -0.64
ĨĴ       -0.60
Ħ¢       -0.60
princ    -0.56
Tes      -0.55
teens    -0.53
ª        -0.53
Kemp     -0.53
°        -0.52
Positive Logits
.        0.98
.:       0.93
.''.     0.85
.;       0.83
.(       0.82
.#       0.76
.,       0.74
.-       0.72
.–       0.72
.?       0.71
Activations Density 0.255%