Explanations
text indicating a prompt to verify that the reader is not a robot
references to the word "you" in various contexts
Negative Logits
Canaver      -0.91
Kang         -0.67
Aberdeen     -0.66
Wonderland   -0.66
Pratt        -0.65
Cater        -0.64
achable      -0.63
Lau          -0.62
Defenders    -0.62
acular       -0.62
Positive Logits
're          1.33
've          1.13
'll          1.06
RS           1.02
hei          0.92
guys         0.87
know         0.85
'd           0.83
can          0.83
tub          0.82
Activations Density: 0.136%