INDEX
Explanations
references to social dynamics and inequalities
NEGATIVE LOGITS
emo       -0.18
avou      -0.18
urum      -0.16
arella    -0.15
امه       -0.15
ieu       -0.15
imo       -0.15
942       -0.14
imler     -0.14
подс      -0.14
POSITIVE LOGITS
mgr         0.16
Baxter      0.15
/antlr      0.14
leaflet     0.14
odesk       0.14
Butterfly   0.14
iggins      0.14
being       0.14
/english    0.13
spath       0.13
Activations Density: 0.306%