INDEX
Explanations
specific intentions or desires expressed by the speaker
expressions of desire or wishes
Negative Logits
erver      -0.72
manship    -0.72
iop        -0.70
cript      -0.69
cheat      -0.68
mir        -0.67
lish       -0.66
wikipedia  -0.65
eding      -0.65
mans       -0.65
Positive Logits
reprene      0.70
rison        0.67
warr         0.66
ウス         0.65
inspiration  0.65
urities      0.63
permission   0.62
ipal         0.62
nesday       0.61
revenge      0.60
Activations Density 0.061%