Explanations
phrases related to surprise or revelation
statements related to surprising or unexpected information and their phrasing
Negative Logits
ascript         -0.68
ourses          -0.66
sequently       -0.65
ilial           -0.64
rans            -0.62
[+              -0.61
[_              -0.61
azel            -0.61
Applications    -0.58
emort           -0.57
Positive Logits
kidding         0.89
nerds           0.77
damned          0.75
darn            0.75
understatement  0.75
Cheap           0.73
steroids        0.72
hats            0.71
Dirty           0.68
classy          0.68
Activations Density: 1.991%