Explanations
conversational markers indicating uncertainty or interactivity in dialogue
Negative Logits
yn          -0.15
olet        -0.15
pher        -0.15
ÌĨ          -0.14
gnore       -0.14
velt        -0.14
sez         -0.14
zel         -0.14
opolitan    -0.13
rrha        -0.13
Positive Logits
Briggs      0.15
owo         0.14
под         0.13
tod         0.13
ibri        0.13
åºı         0.13
Downs       0.13
asl         0.13
Roth        0.13
[s          0.13
Activations Density 0.161%