INDEX
Explanations
pronouns referring to the second person, particularly "you" and its variations
Negative Logits
cannot     -0.24
cannot     -0.21
DON        -0.20
ANNOT      -0.19
don        -0.19
are        -0.18
aren       -0.18
don        -0.18
shouldn    -0.16
ことは     -0.15
Positive Logits
guys       0.31
ever       0.31
ever       0.22
Guys       0.21
Ever       0.21
/she       0.19
suppose    0.19
někdy      0.19
still      0.18
EVER       0.18
Activations Density 0.078%
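
The density figure above is most naturally read as the fraction of token positions on which the feature fires. The following is a minimal sketch under that assumption; the function name, the zero threshold, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 8 of which activate the feature.
sample = [0.0] * 9992 + [0.5, 1.2, 0.3, 2.1, 0.9, 0.4, 1.7, 0.6]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density near 0.078% would then correspond to roughly 8 active positions per 10,000 tokens, i.e. a sparse feature.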