INDEX
Explanations
concepts related to social class dynamics and inequality
NEGATIVE LOGITS
ervo         -0.14
Modular      -0.14
allo         -0.14
utf          -0.14
ATTRIBUTE    -0.14
legend       -0.14
zo           -0.14
přip         -0.14
xad          -0.14
enery        -0.14
POSITIVE LOGITS
nob           0.41
upper         0.37
arist         0.36
landed        0.35
noble         0.34
nob           0.33
upper         0.30
elite         0.30
-upper        0.30
Nob           0.28
Activations Density 0.237%