Explanations
phrases indicating surprise or amazement
expressions of surprise or amazement
Negative Logits
":[{"       -0.83
href        -0.76
delinqu     -0.71
andro       -0.70
obligated   -0.69
apers       -0.67
uting       -0.65
pent        -0.65
pigeon      -0.65
commod      -0.64

Positive Logits
zers        1.05
orld        0.99
ards        0.95
Wow         0.92
yssey       0.90
Wow         0.85
herty       0.84
wow         0.82
wow         0.81
pedia       0.81
Activations Density 0.013%