INDEX
Explanations
references to societal issues and human rights violations
Negative Logits
qi          -0.16
اÙĦتس       -0.15
dez         -0.14
orb         -0.14
agraph      -0.14
quat        -0.14
Cres        -0.14
stra        -0.14
ово         -0.14
ntl         -0.14
Positive Logits
poor        0.19
towards     0.17
omanip      0.17
others      0.16
äch         0.16
Others      0.15
Gle         0.15
/animate    0.15
collateral  0.15
women       0.14
Activations Density 0.217%