Explanations
references to the pronoun "you."
Negative Logits
eer        -0.19
agas       -0.16
↵↵         -0.16
aler       -0.15
Awareness  -0.15
gnore      -0.15
بان        -0.14
оит        -0.14
ango       -0.14
Knowledge  -0.14
Positive Logits
know       0.37
know       0.29
Know       0.25
Know       0.24
knows      0.20
KN         0.18
知道       0.17
KNOW       0.16
知         0.16
mentioned  0.15
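A common way to produce lists like these is to project the feature's decoder direction through the model's unembedding matrix, then keep the highest-scoring tokens (positive logits) and the lowest-scoring ones (negative logits). The sketch below assumes that approach. The helpers `logit_effects` and `top_and_bottom` and the toy weights are hypothetical illustrations. They are not this dashboard's code or this feature's real data.

```python
from typing import Sequence


def logit_effects(decoder_dir: Sequence[float],
                  w_unembed: Sequence[Sequence[float]]) -> list[float]:
    """Hypothetical helper: score each vocabulary token as decoder_dir @ w_unembed.

    decoder_dir has length d_model; w_unembed has d_model rows of length vocab.
    """
    vocab = len(w_unembed[0])
    return [
        sum(decoder_dir[i] * w_unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab)
    ]


def top_and_bottom(scores: Sequence[float], tokens: Sequence[str], k: int):
    """Hypothetical helper: return the k highest and k lowest (token, score) pairs."""
    order = sorted(range(len(scores)), key=lambda t: scores[t])  # ascending by score
    negative = [(tokens[t], scores[t]) for t in order[:k]]            # most negative first
    positive = [(tokens[t], scores[t]) for t in reversed(order[-k:])]  # most positive first
    return positive, negative


# Toy example with made-up numbers (d_model = 2, vocabulary of 4 tokens).
tokens = ["know", "knows", "ignore", "eer"]
decoder_dir = [1.0, 0.5]
w_unembed = [[0.3, 0.2, -0.1, -0.2],
             [0.1, 0.0, -0.1, 0.0]]
scores = logit_effects(decoder_dir, w_unembed)
positive, negative = top_and_bottom(scores, tokens, k=2)
print(positive)
print(negative)
```

Because these scores come only from the weights, they show which tokens the feature pushes toward or away from when it is active. They do not show which inputs make it fire.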
Activation Density: 0.028%
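Activation density is usually reported as the fraction of sampled token positions on which the feature fires. Assuming that definition and a firing threshold of zero, 0.028% works out to about 28 active positions per 100,000 tokens. Both the definition and the threshold are assumptions here, and `activation_density` is a hypothetical helper:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of positions whose activation exceeds threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # count the position as "firing" only above the threshold
            active += 1
    return active / total if total else 0.0


# Made-up activations: 28 firing positions out of 100,000 tokens.
acts = [0.0] * 99_972 + [1.0] * 28
density_pct = activation_density(acts) * 100
print(f"{density_pct:.3f}%")
```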