Explanations
discussions of social conformity and the pressures of societal expectations
Negative Logits
rana           -0.18
'gc            -0.16
ysl            -0.15
ocities        -0.14
å¼±            -0.14
enberg         -0.13
OffsetTable    -0.13
zych           -0.13
Ïħ             -0.13
registry       -0.13
Positive Logits
conformity     0.32
confines       0.30
box            0.28
conform        0.26
constraints    0.26
-box           0.26
BOX            0.26
rigid          0.25
confinement    0.25
pigeon         0.25
Activations Density: 0.296%