INDEX
Explanations
the phrase "you know," specifically highlighting information or personal insights
instances of repetition or affirmation in conversation
Negative Logits
erity     -0.85
oslav     -0.82
¬¼        -0.78
omal      -0.76
Init      -0.71
mage      -0.70
oreal     -0.70
onding    -0.69
nai       -0.69
ĸļ        -0.69
Positive Logits
uh           0.90
sir          0.68
WHERE        0.67
CLASS        0.64
Harriet      0.64
hed          0.63
LET          0.63
tick         0.62
gentlemen    0.61
WHAT         0.60
Activations Density: 0.041%