Explanations
statements indicating surprise or unexpected information
phrases that convey surprising information or revelations
Negative Logits
advoc          -0.67
weaving        -0.66
predicate      -0.65
vend           -0.64
unrestricted   -0.64
conduit        -0.62
alach          -0.62
guarant        -0.62
odynam         -0.62
bluff          -0.61
Positive Logits
surprise       0.71
etheless       0.69
merce          0.68
shock          0.64
Shiny          0.63
3000           0.62
surprises      0.62
143            0.61
whel           0.61
OUS            0.60
Activations Density: 0.193%