INDEX
Explanations
dialogue or direct speech with quotation marks
direct quotations or speech within the text
Negative Logits
Token          Logit
avorite        -0.77
triv           -0.65
penal          -0.65
guest          -0.65
notoriously    -0.64
weakened       -0.63
foreground     -0.63
affected       -0.63
prone          -0.62
âĹ¼            -0.62
Positive Logits
Token          Logit
Oh             0.93
Hey            0.92
cow            0.91
Jesus          0.91
hey            0.90
I              0.89
nothing        0.83
Nothing        0.81
Fuck           0.81
Bring          0.80
Activations Density: 0.086%