Explanations
Phrases expressing desires or wishes to engage in actions
Negative Logits
ady      -0.18
aday     -0.15
ọn       -0.15
ses      -0.14
ader     -0.14
wa       -0.14
lus      -0.14
.Fire    -0.14
alli     -0.13
PLE      -0.13
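Token strings on pages like this are often shown in raw vocabulary form. In that form, a token such as "ọn" is stored as "á»įn": each UTF-8 byte is mapped to a printable stand-in character. The sketch below assumes a GPT-2-style byte-level BPE vocabulary, which this page does not confirm. It inverts that byte-to-character table to recover readable text. The helper names are hypothetical. A token holding only part of a multi-byte character (for example "ë¶") cannot be fully decoded, so it comes back with a U+FFFD replacement character.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style table mapping every byte 0-255 to a printable character."""
    printable = (list(range(ord("!"), ord("~") + 1))
                 + list(range(ord("¡"), ord("¬") + 1))
                 + list(range(ord("®"), ord("ÿ") + 1)))
    mapping = {b: chr(b) for b in printable}
    # Remaining bytes (controls, space, etc.) are shifted to U+0100 and up.
    n = 0
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + n)
            n += 1
    return mapping


# Inverse table: stand-in character -> original byte value.
_CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a raw byte-level token back into text."""
    raw = bytes(_CHAR_TO_BYTE[c] for c in token)
    # Incomplete UTF-8 sequences become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


print(decode_token("á»įn"))  # the "ọn" token from the list above
```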
Positive Logits
oba      0.15
اوه      0.15
Sabb     0.14
.crm     0.14
nodoc    0.14
cox      0.13
ë¶       0.13
azar     0.13
atol     0.13
antom    0.13
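Sparse-autoencoder dashboards commonly build negative/positive logit lists by projecting the feature's decoder direction through the model's unembedding matrix W_U. Each vocabulary token then gets one score, and the lowest and highest scores are listed. This page does not state its exact method, so treat the following as a minimal sketch under that assumption. The function names and the toy matrices are hypothetical placeholders, not real model weights.

```python
from typing import Sequence


def project_to_vocab(direction: Sequence[float],
                     w_u: Sequence[Sequence[float]]) -> list[float]:
    """Dot the feature direction (length d_model) with every column of W_U.

    w_u is laid out as d_model rows by vocab_size columns.
    """
    vocab_size = len(w_u[0])
    return [sum(direction[i] * w_u[i][j] for i in range(len(direction)))
            for j in range(vocab_size)]


def top_and_bottom(logits: Sequence[float], tokens: Sequence[str], k: int):
    """Return the k most negative and k most positive (token, logit) pairs."""
    order = sorted(range(len(logits)), key=lambda j: logits[j])
    negative = [(tokens[j], logits[j]) for j in order[:k]]
    positive = [(tokens[j], logits[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy example: d_model = 2, vocabulary of three tokens.
direction = [1.0, -0.5]
w_u = [[0.2, -0.1, 0.0],
       [0.1, 0.3, -0.2]]
neg, pos = top_and_bottom(project_to_vocab(direction, w_u), ["a", "b", "c"], k=1)
print(neg, pos)
```

On a real model the same projection would run over the full vocabulary with k = 10, matching the ten entries per list shown above.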
Activation density: 0.020%
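Activation density is usually read as the fraction of evaluated tokens on which the feature fires. A density of 0.020% therefore means roughly 2 firing tokens per 10,000. The sketch below assumes "fires" means an activation strictly above zero. That threshold and the helper name are assumptions, not details given on this page.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds threshold (assumed 0.0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# 2 firing tokens out of 10,000 evaluated tokens.
acts = [0.0] * 9998 + [1.3, 0.7]
print(f"{activation_density(acts):.3%}")
```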