Explanations
phrases expressing desire or preference
Negative Logits
Token     Logit
Extras    -0.17
oud       -0.15
912       -0.14
gles      -0.14
nga       -0.14
ady       -0.14
کت        -0.14
rapper    -0.14
sein      -0.13
adu       -0.13
Positive Logits
Token        Logit
igr          0.18
permission   0.15
ardon        0.15
assistance   0.15
будь         0.14
gnore        0.14
alta         0.14
沢           0.14
quam         0.14
ffective     0.14
Activations Density 0.027%
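
As a rough illustration of what a density figure like the one above might mean, here is a minimal sketch. It assumes (this is not stated on the page) that density is the fraction of sampled tokens on which the feature has a nonzero activation. The function name `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the feature fires at all; the dashboard may define it differently.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 27 of 100,000 tokens.
sample = [1.0] * 27 + [0.0] * 99_973
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.027%
```

Under that reading, a density of 0.027% corresponds to the feature firing on roughly 27 of every 100,000 tokens.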