Explanations
the word "wanna" at various activation levels
expressions of desire or intention to take action
Negative Logits
士              -0.86
VERTISEMENT     -0.84
advertisement   -0.83
arian           -0.76
loo             -0.74
lain            -0.72
idem            -0.71
sequ            -0.70
edience         -0.69
ochond          -0.68
Positive Logits
wanna            1.29
ignt             0.80
nab              0.75
pping            0.74
pload            0.71
aspire           0.70
listen           0.70
hear             0.69
reprene          0.69
ya               0.68
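Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix, then keeping the highest and lowest entries. The sketch below assumes that procedure. The function names `feature_logit_effects` and `top_and_bottom`, and the toy 2-dimensional direction and unembedding values, are hypothetical illustrations, not this feature's actual weights.

```python
from typing import Dict, List, Tuple

def feature_logit_effects(direction: List[float],
                          unembed: List[List[float]],
                          vocab: List[str]) -> Dict[str, float]:
    """Project a feature direction (length d_model) through an unembedding
    matrix of shape (d_model, vocab_size), giving one logit effect per token."""
    effects = {}
    for t, token in enumerate(vocab):
        effects[token] = sum(direction[i] * unembed[i][t]
                             for i in range(len(direction)))
    return effects

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]

# Toy values (hypothetical): d_model = 2, vocabulary of 4 tokens.
vocab = ["wanna", "hear", "advertisement", "loo"]
direction = [1.0, 0.5]
unembed = [[1.0, 0.4, -0.6, -0.2],
           [0.5, 0.6, -0.4, -0.8]]

effects = feature_logit_effects(direction, unembed, vocab)
positive, negative = top_and_bottom(effects, k=2)
print(positive)  # "wanna" ranks first, then "hear"
print(negative)  # "advertisement" ranks first, then "loo"
```

Sorting once and reading the list from both ends keeps the two rankings consistent with each other.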
Activation Density: 0.005%
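Activation density is usually reported as the share of tokens in a sample dataset on which the feature's activation is nonzero. Under that assumption, 0.005% means the feature fires on roughly 1 token in 20,000 (0.005 / 100 = 1/20,000). The helper `activation_density` below is a hypothetical sketch of that calculation, not the dashboard's own code.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`
    (assumed definition: a feature counts as active when it is above zero)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy example: one firing token out of 20,000.
acts = [0.0] * 20_000
acts[123] = 3.7
print(activation_density(acts))  # 0.005
```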