INDEX
Explanations
references to social status and inequality
Negative Logits
高端         -0.44
thansa       -0.43
mewah        -0.43
zydent       -0.43
adpleegd     -0.43
ACHUSETTS    -0.41
Adults       -0.40
stately      -0.40
奏           -0.40
advanced     -0.40
Positive Logits
humble       0.86
lowly        0.86
cheap        0.85
humb         0.85
poorer       0.79
cheap        0.76
cheaper      0.73
Cheap        0.72
inexpensive  0.71
plebe        0.70
Activations Density 0.642%