Explanations
phrases related to uniformity and conformity
concepts related to conformity and societal expectations
Negative Logits
recons      -0.64
ridor       -0.63
indeed      -0.62
berus       -0.61
alde        -0.61
cedented    -0.60
ESE         -0.60
conclud     -0.60
imar        -0.59
rieve       -0.59
Positive Logits
shitty      1.22
crappy      1.17
crap        1.08
mediocre    1.00
shit        0.97
boring      0.88
guy         0.87
fucking     0.86
paycheck    0.85
bullshit    0.84
Activations Density 0.791%