Explanations
personal pronouns combined with verbs
the presence of the word "you" in various contexts
Negative Logits
Token       Logit
forth       -0.82
Cornwall    -0.69
Gamb        -0.68
MacArthur   -0.67
Canaver     -0.66
ipal        -0.64
Gaw         -0.63
Filip       -0.62
Ferdinand   -0.62
Aberdeen    -0.61
Positive Logits
Token       Logit
're         1.31
've         1.18
tub         1.12
'll         1.08
tu          1.05
guys        0.97
'd          0.95
RS          0.93
wanna       0.89
guessed     0.84
Activations Density: 0.087%