INDEX
Explanations
personal identity and classification statements
repeated patterns or affirmations of identity, particularly related to being a woman and racial identity
Negative Logits
snacks        -0.77
fortun        -0.71
publicity     -0.70
seiz          -0.69
çīĪ           -0.67
telesc        -0.63
ACS           -0.63
Circus        -0.63
srfAttach     -0.63
ãĥ¼ãĥĨ        -0.63
Positive Logits
Ļ             1.24
¤             1.21
ª             1.15
¬             1.13
£             1.08
Ĵ             1.07
ħ             1.06
ĸ             1.05
Ķ             1.05
¼             1.02
Activations Density 0.200%
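
The page does not define how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero, might look like the following. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 2 of which activate the feature.
sample = [0.0] * 998 + [3.1, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.200%
```

Under that assumption, a density of 0.200% would mean the feature fires on roughly 2 of every 1000 sampled tokens.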