Explanations
phrases used to provide clarification or emphasize a point
the expression of disbelief or emphasis in statements
Negative Logits
"},"        -0.72
onding      -0.65
ittered     -0.64
-+-+        -0.63
corrid      -0.63
affe        -0.61
iets        -0.59
appers      -0.58
icipated    -0.57
iated       -0.56
Positive Logits
seriously   1.08
honestly    1.00
REALLY      0.99
yeah        0.98
yea         0.90
LOOK        0.86
literally   0.84
Seriously   0.83
wow         0.82
sure        0.81
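Dashboards like this one often build the logit lists by projecting the feature's decoder direction onto the model's unembedding matrix and sorting the result. The page does not say how its lists were computed, so the following is only a minimal sketch under that assumption. `top_logit_effects`, `w_dec_feature` and `w_unembed` are hypothetical names. The toy weights in the usage block are made up to echo a few entries above and are not the real model weights.

```python
import numpy as np


def top_logit_effects(w_dec_feature, w_unembed, vocab, k=10):
    """Rank vocabulary tokens by the direct logit effect of one feature.

    Assumption: w_dec_feature has shape (d_model,) and w_unembed has shape
    (d_model, vocab_size); both are hypothetical placeholders, not a real API.
    """
    effects = w_dec_feature @ w_unembed  # one logit contribution per token
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


if __name__ == "__main__":
    # Toy data only: a 2-dim "model" with a 4-token vocabulary.
    vocab = ["seriously", "honestly", "onding", "iets"]
    w_dec = np.array([1.0, 0.0])
    w_u = np.array([[1.08, 1.00, -0.65, -0.59],
                    [0.0, 0.0, 0.0, 0.0]])
    neg, pos = top_logit_effects(w_dec, w_u, vocab, k=2)
    print("negative:", neg)
    print("positive:", pos)
```

Sorting the full effect vector once and slicing from both ends gives both lists from a single pass. This matches how the page shows the most negative and most positive tokens side by side.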
Activation Density: 0.022%
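Activation density is usually read as the fraction of token positions on which the feature's activation is above zero. The page does not state its threshold, so that definition is an assumption here. Under it, 0.022% means the feature fires on roughly 22 of every 100,000 tokens. The sketch below shows the calculation, with `activation_density` as a hypothetical helper.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires.

    Assumption: "fires" means activation > threshold (default 0.0).
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))


if __name__ == "__main__":
    # Toy data: 22 active positions out of 100,000 tokens.
    acts = np.zeros(100_000)
    acts[:22] = 1.0
    print(f"{activation_density(acts):.3%}")
```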