INDEX
Explanations
personal pronouns and the word "you"
Negative Logits
Gamb      -0.64
Filip     -0.64
Kang      -0.64
Pratt     -0.64
images    -0.63
Patt      -0.63
Kaine     -0.62
entimes   -0.62
Canaver   -0.60
Lau       -0.59
Positive Logits
're       1.48
've       1.28
'll       1.14
RS        1.03
tub       1.01
'd        0.96
hei       0.87
re        0.85
guys      0.82
tu        0.82
Activations Density: 0.193%