Explanations
phrases indicating expectation or lack thereof
phrases that indicate a lack of surprise or unexpectedness
Negative Logits
nels          -0.72
Logged        -0.67
abulary       -0.61
audi          -0.60
nis           -0.59
Suns          -0.59
tery          -0.59
gou           -0.58
heastern      -0.56
ouf           -0.56
Positive Logits
surprise       0.82
actionDate     0.73
shock          0.70
ENSE           0.69
ream           0.65
rection        0.64
azes           0.64
ratulations    0.63
though         0.63
card           0.62
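The two logit tables appear to show the tokens at the top and bottom of a per-token logit ranking for this feature. A minimal sketch of that selection step follows. It assumes the per-token scores are already computed, because how the dashboard derives them is not stated here. `top_and_bottom` is a hypothetical helper, and only a few scores from the tables are reused as sample data.

```python
# Minimal sketch (assumption): given per-token logit scores, the "positive"
# list is the k highest scores and the "negative" list is the k lowest.
# Sample scores are copied from the tables above; a real dashboard would
# score the whole vocabulary (assumption, not stated in the source).
scores = {
    "surprise": 0.82,
    "actionDate": 0.73,
    "shock": 0.70,
    "ENSE": 0.69,
    "nels": -0.72,
    "Logged": -0.67,
    "abulary": -0.61,
    "audi": -0.60,
}


def top_and_bottom(scores, k):
    """Hypothetical helper: return (positive, negative) lists of (token, score)."""
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


positive, negative = top_and_bottom(scores, 3)
print(positive)  # [('surprise', 0.82), ('actionDate', 0.73), ('shock', 0.7)]
print(negative)  # [('nels', -0.72), ('Logged', -0.67), ('abulary', -0.61)]
```

Sorting once and slicing both ends keeps the two lists consistent. For a full vocabulary, `heapq.nlargest` and `heapq.nsmallest` would avoid sorting every token.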
Activation Density: 0.068%
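The density figure is given as a percentage. The short sketch below assumes it is the share of tokens in some evaluation set on which the feature activates (activation > 0). The function name and the 100,000-token example are hypothetical and were chosen only to reproduce 0.068%.

```python
# Minimal sketch, assuming "activation density" = percentage of tokens on
# which the feature's activation is strictly positive.
def activation_density(activations):
    """Hypothetical helper: percentage of activations greater than zero."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


# Hypothetical example: the feature fires on 68 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(68):
    acts[i * 1000] = 1.0  # 68 distinct firing positions
density = activation_density(acts)
print(f"{density:.3f}%")  # 0.068%
```

At this density the feature is very sparse, firing on fewer than 1 in 1,000 tokens.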