INDEX
Explanations
instances of dialogue or direct speech
Negative Logits
Oops         -0.18
icher        -0.14
omen         -0.14
_HINT        -0.13
Hoover       -0.13
ëį°ìĿ´íĬ¸    -0.13
Damn         -0.13
lickr        -0.13
Funny        -0.13
lÃŃ          -0.13
Positive Logits
agree        0.23
amen         0.23
Amen         0.22
agreed       0.21
agree        0.21
amen         0.19
/ag          0.19
Agree        0.19
agrees       0.17
exactly      0.16
Activations Density 0.134%