INDEX
Explanations
expressions indicating surprise or unexpected outcomes
phrases indicating the expectation of surprise or the state of being taken aback
Negative Logits
ebted         -0.71
Logged        -0.69
neys          -0.61
oller         -0.58
ney           -0.58
them          -0.57
ter           -0.56
Attributes    -0.55
tery          -0.55
filibuster    -0.54

Positive Logits
unsur          0.81
no             0.72
pires          0.72
follows        0.70
surprise       0.68
pired          0.68
shock          0.68
ynchron        0.67
unwelcome      0.67
far            0.67
Activations Density: 0.039%