INDEX
Explanations
conversational fillers or discourse markers used to express surprise, uncertainty, or hesitation
rhetorical questions and expressions of surprise or curiosity
Negative Logits
Token                 Logit
iors                  -0.69
irie                  -0.68
iple                  -0.68
upiter                -0.67
uci                   -0.65
guiActiveUnfocused    -0.65
ument                 -0.65
edom                  -0.64
escription            -0.63
enaries               -0.63
Positive Logits
Token                 Logit
huh                   1.22
considering           1.17
eh                    1.04
especially            0.99
especially            0.99
but                   0.94
but                   0.90
Especially            0.87
frankly               0.84
given                 0.82
Activations Density 0.198%