INDEX
Explanations
exclamatory words expressing strong emotions, such as surprise or disbelief
expressions of disbelief or sarcasm
Negative Logits
arthy      -0.75
icipated   -0.71
enary      -0.71
adr        -0.69
athered    -0.64
bett       -0.63
fold       -0.63
idas       -0.62
sylv       -0.61
eded       -0.60
Positive Logits
Seriously  0.84
zers       0.84
dunno      0.79
kidding    0.78
FTWARE     0.77
Seriously  0.75
Helpful    0.75
tho        0.73
?!         0.70
essage     0.69
Activations Density: 0.055%