INDEX
Explanations
questions directed at the reader
instances of the pronoun "you" in various contexts
Negative Logits
Token       Logit
ruciating   -0.79
accompan    -0.75
ospital     -0.73
umber       -0.70
inelli      -0.69
apo         -0.69
ges         -0.66
Lago        -0.66
arc         -0.65
"},         -0.64
Positive Logits
Token       Logit
guys        1.11
ever        1.03
?'          0.92
subscribed  0.89
intend      0.89
tub         0.87
?'"         0.87
prefer      0.87
?"          0.86
wanna       0.85
Activations Density: 0.041%