Explanations
phrases indicating surprise or disbelief
phrases expressing incredulity or emphasizing a lack of something
Negative Logits
rend     -0.82
Aren     -0.69
_-       -0.68
ãĤ¿      -0.67
ahime    -0.64
ller     -0.64
plex     -0.63
ollen    -0.63
cel      -0.63
ilt      -0.62
Positive Logits
remotely     1.22
bother       0.82
anymore      0.75
slightest    0.72
bothered     0.72
though       0.71
mention      0.68
tho          0.68
bothering    0.68
outright     0.65
Activations Density: 0.044%