Explanations
phrases related to personal identity and characteristics
the presence of the word "a" and variations related to identity and roles
Negative Logits
scenes          -0.82
ernels          -0.75
Ö¼              -0.75
breaks          -0.72
ourses          -0.71
appointments    -0.69
uden            -0.68
views           -0.67
books           -0.67
execute         -0.67
Positive Logits
hypocr          1.09
spectator       1.05
member          1.04
prostitute      0.98
follower        0.98
virgin          0.98
participant     0.98
citizen         0.98
believer        0.97
bystand         0.97
Activations Density 0.139%