Explanations
themes related to social justice and inequality
Negative Logits
Token       Logit
ISupport    -0.55
nenhuma     -0.55
asse        -0.54
مشين        -0.54
riwal       -0.53
esist       -0.52
Every       -0.52
ilhas       -0.52
nessun      -0.52
噺          -0.51
Positive Logits
Token       Logit
quienes     0.71
którzy      0.70
kteří       0.67
who         0.63
individuals 0.61
ones        0.61
themselves  0.61
*/),        0.58
ReactDOM    0.57
whom        0.56
Activations Density: 0.342%