INDEX
Explanations
expressions of privilege and opportunities presented to individuals
Negative Logits
itus        -0.16
ãģ£ãģ      -0.15
loo         -0.15
Mog         -0.15
405         -0.15
-scalable   -0.14
perator     -0.14
rå          -0.14
≡           -0.14
ходить      -0.13
Positive Logits
privilege   0.73
pleasure    0.65
priv        0.61
priv        0.60
prive       0.58
Priv        0.56
Priv        0.51
PRIV        0.51
Ple         0.50
ple         0.49
Activations Density 0.086%