Explanations
words related to social status and privilege
references to privilege and socioeconomic status
Negative Logits
ood             -0.78
leaf            -0.77
hur             -0.74
lust            -0.74
ope             -0.73
soDeliveryDate  -0.72
repl            -0.72
hun             -0.71
hang            -0.70
hound           -0.70
Positive Logits
ileged          1.20
privileged      0.98
ilege           0.92
adolesc         0.81
citiz           0.81
uitous          0.77
dinand          0.71
privilege       0.71
upbringing      0.71
Guests          0.71
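Lists like the two above can be read as the extremes of a single per-token score. The sketch below is only illustrative and assumes something the page does not say: that each score is the dot product of the feature's decoder direction with one token's unembedding vector. The names `logit_effects` and `top_and_bottom_logits` are hypothetical.

```python
# Hypothetical sketch: derive top/bottom logit lists for one feature.
# ASSUMPTION (not stated on the page): a token's score is the dot product of
# the feature's decoder direction with that token's unembedding vector.

def logit_effects(decoder_direction, unembedding):
    """Map each token to the dot product of its unembedding vector with the direction."""
    effects = {}
    for token, vec in unembedding.items():
        if len(vec) != len(decoder_direction):
            raise ValueError(f"dimension mismatch for token {token!r}")
        effects[token] = sum(d * u for d, u in zip(decoder_direction, vec))
    return effects


def top_and_bottom_logits(effects, k):
    """Return (top_k, bottom_k) as lists of (token, score), most extreme first."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by score
    bottom = ranked[:k]          # most negative scores first
    top = ranked[::-1][:k]       # most positive scores first
    return top, bottom


# Usage with a few scores copied from the lists above.
scores = {"ileged": 1.20, "privileged": 0.98, "ilege": 0.92,
          "ood": -0.78, "leaf": -0.77, "hur": -0.74}
top, bottom = top_and_bottom_logits(scores, 2)
print(top)     # [('ileged', 1.2), ('privileged', 0.98)]
print(bottom)  # [('ood', -0.78), ('leaf', -0.77)]
```

Sorting once and then slicing from both ends gives both lists in a single pass over the vocabulary.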
Activation Density: 0.018%
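A density of 0.018% means that roughly 18 out of every 100,000 tokens activate the feature (0.018% = 0.00018, and 0.00018 × 100,000 = 18). The sketch below assumes density is the fraction of token positions whose activation is strictly above zero. That is a common convention, but the page does not define it. `activation_density` is a hypothetical helper.

```python
# Hypothetical sketch: activation density of one feature over a token sample.
# ASSUMPTION: a token counts as "active" when its activation exceeds `threshold`
# (0.0 by default); the page does not state the exact rule.

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Usage: 18 active positions out of 100,000 tokens.
sample = [0.0] * 99_982 + [0.5] * 18
density = activation_density(sample)
print(f"{density:.3%}")  # 0.018%
```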