INDEX
Explanations
phrases highlighting education, social class expectations, and societal norms regarding relationships
Negative Logits
otherwise    -0.17
overall      -0.16
crucial      -0.15
critical     -0.14
ibli         -0.14
sourcing     -0.14
hlen         -0.14
key          -0.14
penn         -0.14
だけで       -0.14
Positive Logits
modern       0.26
commercial   0.25
-commercial  0.24
modern       0.23
Modern       0.22
Modern       0.21
moderne      0.21
comercial    0.21
commercial   0.20
сучас        0.20
Activations Density 0.025%