Explanations
Phrases indicating an intention or desire to engage in actions
NEGATIVE LOGITS
usal     -0.18
acias    -0.14
hiro     -0.14
elah     -0.14
UMMY     -0.14
resi     -0.13
ãĢħ      -0.13
éī       -0.13
sono     -0.13
ä¼į      -0.13
POSITIVE LOGITS
@class   0.16
anz      0.15
497      0.15
onomy    0.15
afari    0.14
ÃĹ↵↵     0.14
iske     0.14
ekl      0.14
xr       0.13
nton     0.13
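The token scores listed above are logit effects: how strongly the feature pushes each output token up or down. Dashboards of this kind commonly obtain them by projecting the feature's decoder direction through the model's unembedding matrix (W_U) and taking the most negative and most positive entries. The sketch below is written under that assumption. The page does not say exactly how its numbers were computed (for example, whether they were normalized), and the function name `top_logit_effects` and the toy inputs are hypothetical.

```python
import heapq


def top_logit_effects(decoder_direction, unembedding, vocab, k=10):
    """Return the k most negative and k most positive tokens by logit effect.

    Assumptions (hypothetical, for illustration):
      decoder_direction: list[float] of length d_model, the feature's decoder row.
      unembedding: d_model rows, each a list[float] of length vocab_size (W_U).
      vocab: list[str] mapping token id -> token string.
    """
    vocab_size = len(vocab)
    # Logit effect of token t = sum_i decoder_direction[i] * W_U[i][t]
    effects = [0.0] * vocab_size
    for d_i, row in zip(decoder_direction, unembedding):
        for t in range(vocab_size):
            effects[t] += d_i * row[t]
    scored = list(zip(vocab, effects))
    # nsmallest returns ascending order; nlargest returns descending order
    negative = heapq.nsmallest(k, scored, key=lambda pair: pair[1])
    positive = heapq.nlargest(k, scored, key=lambda pair: pair[1])
    return negative, positive


# Toy example: d_model = 2, vocabulary of 4 tokens
decoder = [1.0, 0.5]
w_u = [[0.1, -0.2, 0.3, 0.0],
       [0.2, 0.1, -0.4, 0.0]]
neg, pos = top_logit_effects(decoder, w_u, ["a", "b", "c", "d"], k=2)
print(neg)  # "b" has the most negative effect
print(pos)  # "a" has the most positive effect
```

Using `heapq` avoids sorting the whole vocabulary when only the top k entries are needed, which matters for real vocabularies of tens of thousands of tokens.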
Activation density: 0.087%
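Activation density is usually reported as the percentage of token positions in a sample on which the feature's activation is above zero. A density of 0.087% would then mean the feature fires on roughly 87 of every 100,000 tokens. The sketch below assumes that definition and a threshold of 0. The function name `activation_density` is hypothetical, and the dashboard's exact sampling procedure is not stated on the page.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds threshold.

    Assumption: 'active' means strictly greater than `threshold` (default 0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 87 active positions out of 100,000 tokens
acts = [0.0] * 99_913 + [1.0] * 87
print(activation_density(acts))
```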