Explanations
mentions of privilege, especially in social contexts
Negative Logits
tra         -0.83
ood         -0.77
agh         -0.74
ICA         -0.74
thumbnails  -0.71
urgy        -0.70
rum         -0.69
enegger     -0.68
NCT         -0.67
hani        -0.66
Positive Logits
ilege       1.39
ileged      1.21
privilege   1.11
Priv        1.01
privileges  0.85
holders     0.84
privileged  0.79
Priv        0.77
SPONSORED   0.76
afforded    0.75
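The two logit lists rank vocabulary tokens by how strongly this feature pushes each token's output logit down (negative) or up (positive). The page does not say how the scores are computed. The sketch below assumes the approach that is common for sparse-autoencoder feature dashboards: project the feature's decoder direction through the model's unembedding matrix and take the k lowest and k highest scores. `decoder_dir`, `W_U`, `top_logit_tokens`, and the toy vocabulary are all hypothetical.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Return (most_negative, most_positive) lists of (token, score) pairs.

    Assumption: the logit effect of the feature is decoder_dir @ W_U,
    giving one score per vocabulary token.
    """
    scores = decoder_dir @ W_U          # shape: (vocab_size,)
    order = np.argsort(scores)          # token indices, lowest score first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dim residual stream, 4-token vocabulary (made-up numbers).
vocab = ["privilege", "holders", "tra", "ood"]
W_U = np.array([[1.0, 0.5, -0.5, -0.2],
                [0.2, 0.1, -0.3, -0.6]])
decoder_dir = np.array([1.0, 0.5])

neg, pos = top_logit_tokens(decoder_dir, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

With real model weights, the same ranking over the full vocabulary would yield lists shaped like the ones above. Subword pieces such as "ilege" and "ileged" appear because the unembedding scores every token, including word fragments.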
Activation Density: 0.062%
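Activation density is read here as the share of tokens on which the feature fires, which makes this a very sparse feature: roughly 62 tokens in every 100,000. The sketch below assumes "fires" means an activation strictly above zero. The threshold, the function name `activation_density`, and the sample data are illustrative assumptions, not taken from the page.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    activations = np.asarray(activations)
    return 100.0 * np.count_nonzero(activations > threshold) / activations.size

# Hypothetical sample: 100,000 token activations, 62 of them positive.
acts = np.zeros(100_000)
acts[:62] = 1.0
print(f"{activation_density(acts):.3f}%")  # 0.062%
```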