Explanations
references to the audience or reader in a persuasive context
Negative Logits
itself        -0.27
themselves    -0.26
ly            -0.17
mond          -0.16
usted         -0.15
Clr           -0.15
pei           -0.15
icle          -0.14
l             -0.14
onaut         -0.14
Positive Logits
/us           0.39
SELF          0.33
’re           0.32
guys          0.32
-même         0.31
/her          0.28
're           0.27
nger          0.27
/me           0.26
yourself      0.26
Activations Density 0.115%