Explanations
phrases indicating desires or intentions
expressions of desire or intention
Negative Logits
VERTISEMENT    -0.75
semble         -0.75
eding          -0.68
ulty           -0.66
anches         -0.65
ccording       -0.65
errors         -0.64
è¦ļéĨĴ         -0.63
workers        -0.63
fell           -0.62
Positive Logits
reprene        0.88
revenge        0.78
desperately    0.71
to             0.70
htar           0.70
permission     0.68
attention      0.68
only           0.65
lessly         0.65
nothing        0.63
Activations Density 0.078%