INDEX
Explanations
expressions of admiration or surprise, typically starting with "Wow"
expressions of surprise or amazement
Negative Logits
href                 -0.76
delinqu              -0.72
redress              -0.68
uting                -0.68
obligated            -0.67
externalToEVAOnly    -0.66
apers                -0.65
ãĥ´                  -0.64
pige                 -0.63
rive                 -0.63
Positive Logits
zers     1.15
Wow      0.94
wow      0.91
orld     0.88
pedia    0.88
wow      0.86
ards     0.80
yssey    0.79
!,       0.76
Wow      0.75
Activations Density: 0.021%