Explanations
comparisons that highlight inequality or significant issues in society
NEGATIVE LOGITS
Token     Logit
.mdl      -0.15
inci      -0.15
ubic      -0.14
atron     -0.14
onio      -0.14
flip      -0.14
erah      -0.14
меÑĪ      -0.14
inel      -0.14
å£        -0.13
POSITIVE LOGITS
Token         Logit
er            0.15
ãģ£ãģ¡        0.14
acman         0.14
æľŁ           0.14
rec           0.14
ipro          0.14
professions   0.14
sons          0.14
اÙĨج          0.14
ther          0.13
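
Several tokens in the logit tables (for example ãģ£ãģ¡ and æľŁ) look like byte-level BPE vocabulary strings rather than readable text. The sketch below assumes the GPT-2 style byte-to-unicode mapping, which the page does not confirm; the helper names are hypothetical. It rebuilds that mapping and converts a vocabulary string back to UTF-8 text. Incomplete byte sequences, which are common for partial-character tokens, are decoded with a replacement character.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table (assumed mapping)."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are shifted to code points starting at 256.
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_vocab_token(token):
    """Map each displayed character back to its byte, then decode as UTF-8.

    Raises KeyError if the string contains characters outside the byte table,
    which suggests the token was already partly decoded upstream.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãģ£ãģ¡", "æľŁ", "er"]:
        print(repr(tok), "->", repr(decode_vocab_token(tok)))
```

Tokens that mix readable script with byte-level characters, such as меÑĪ, raise a KeyError here. That is consistent with the scrape having partly decoded them already, so they are left untouched in the tables.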
Activations Density 0.161%
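
The page does not define the density figure. A common reading, assumed here, is the fraction of sampled tokens on which the feature has a nonzero activation. The function name, the threshold parameter, and the sample data below are illustrative only.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    `activations` is a sequence with one feature activation per token.
    The definition is an assumption; the source page does not state it.
    """
    if len(activations) == 0:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: the feature fires on 16 of 10,000 tokens.
    sample = [0.0] * 9984 + [1.0] * 16
    print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.161% means the feature fires on roughly 16 of every 10,000 tokens.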