Explanations
emotional adjectives expressing sensations and feelings
Negative Logits
zan          -0.21
inho         -0.16
tings        -0.16
/remove      -0.15
/write       -0.15
functioning  -0.15
ceed         -0.15
sar          -0.15
/delete      -0.15
fittings     -0.14
Positive Logits
ly           0.59
LY           0.36
ся           0.29
ingly        0.28
于           0.20
redients     0.20
lys          0.19
redient      0.18
ような       0.18
lya          0.17
Activations Density 0.105%