Explanations
references to political repression and human rights violations
Negative Logits
irm         -0.17
endas       -0.16
criptor     -0.15
Holocaust   -0.15
Iranian     -0.14
Iran        -0.14
Liber       -0.14
|i          -0.14
253         -0.14
æ´ĭ         -0.14
Positive Logits
Che         0.32
Ing         0.31
Ing         0.25
Gro         0.25
Che         0.23
che         0.22
_che        0.20
ing         0.19
-che        0.19
Caucas      0.18
Activations Density 0.028%