INDEX
Explanations
phrases starting with quotations
instances of direct speech
Negative Logits
princip      -0.85
flared       -0.81
deterrent    -0.77
bod          -0.72
cheek        -0.72
clut         -0.72
arri         -0.71
conve        -0.71
adv          -0.70
sund         -0.70
Positive Logits
Hey      1.54
hey      1.41
Oh       1.39
hello    1.34
Damn     1.25
why      1.25
Fuck     1.25
Look     1.24
Why      1.23
Okay     1.23
Activations Density 0.049%