INDEX
Explanations
themes related to challenging social norms and barriers
Negative Logits
638      -0.16
à¤Ĩत     -0.15
ysl      -0.15
Nobel    -0.15
جÙĨ      -0.13
Hands    -0.13
ìļķ      -0.13
igin     -0.13
iÅŁim    -0.13
Brennan  -0.12
Positive Logits
convention    0.40
conventions   0.36
conventional  0.34
norms         0.34
established   0.32
accepted      0.31
conformity    0.30
orth          0.29
Convention    0.28
expectations  0.28
Activations Density 0.241%