INDEX
Explanations
terms related to socioeconomic disadvantage or lack of resources
Negative Logits
zug         -0.17
akedown     -0.16
oser        -0.15
raki        -0.14
ify         -0.14
Ing         -0.14
oyo         -0.13
ing         -0.13
tast        -0.13
sanitize    -0.13
Positive Logits
served       0.23
dogs         0.22
dog          0.20
privileged   0.20
perform      0.20
erved        0.19
priv         0.19
erval        0.19
util         0.18
Perform      0.18
Activations Density: 0.016%