Explanations
keywords related to discrimination and rights advocacy
Negative Logits
otos      -0.18
askell    -0.16
AXB       -0.14
rique     -0.14
wald      -0.14
mmc       -0.14
ког       -0.13
تك        -0.13
abilit    -0.13
arkin     -0.13
Positive Logits
against   1.36
against   1.19
Against   1.13
Against   1.05
gegen     0.82
tegen     0.79
对        0.77
contre    0.75
_again    0.74
проти     0.70
Activations Density 0.576%
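
The density figure is commonly read as the fraction of token positions on which the feature is active. The source does not state how it was computed, so the sketch below is only an illustration under that assumption: the function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means (active positions) / (all positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for one feature over 8 token positions.
sample = [0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.3, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.576% would mean the feature fires on roughly 6 of every 1,000 token positions.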