INDEX
Explanations
References to political and social issues, particularly human rights violations and agreements.
Negative Logits
.<      -0.78
."[     -0.76
.[      -0.73
!.      -0.71
".[     -0.71
.""     -0.70
.</     -0.70
.","    -0.64
+.      -0.62
.).     -0.61
Positive Logits
ゼウス       0.56
schild      0.55
wealth      0.50
)]          0.50
doms        0.50
iru         0.49
estern      0.48
disparate   0.47
ベ          0.47
ottest      0.47
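The two logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common convention in sparse-autoencoder dashboards, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and take the k lowest and k highest entries. The minimal sketch below uses plain Python lists; the function names `logit_effects` and `top_bottom` and the toy matrices are hypothetical, and details such as folding in the final LayerNorm are ignored.

```python
def logit_effects(direction, unembed):
    """Project a feature direction (length d_model) through an unembedding
    matrix given as d_model rows x vocab columns (hypothetical layout)."""
    vocab_size = len(unembed[0])
    return [
        sum(direction[i] * unembed[i][t] for i in range(len(direction)))
        for t in range(vocab_size)
    ]


def top_bottom(effects, tokens, k):
    """Return the k most negative and k most positive (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(tokens[t], effects[t]) for t in order[:k]]
    positive = [(tokens[t], effects[t]) for t in reversed(order[-k:])]
    return negative, positive


# Toy example: 2-dimensional residual stream, 3-token vocabulary.
direction = [1.0, -0.5]
unembed = [[0.2, -0.4, 0.1],
           [0.6, 0.2, -0.8]]
tokens = ["a", "b", "c"]
negative, positive = top_bottom(logit_effects(direction, unembed), tokens, k=1)
print(negative, positive)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a vectorized top-k would be the practical choice.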
Activation Density: 1.736%
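Activation density is usually read as the percentage of sampled tokens on which the feature fires, i.e. has an activation above zero. A figure of 1.736% would then mean the feature is active on roughly 1.7 of every 100 tokens. The sketch below assumes that definition and a threshold of 0.0; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Two of four tokens have a positive activation -> 50.0 (percent).
print(activation_density([0.0, 0.3, 0.0, 1.2]))
```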