INDEX
Explanations
words and phrases related to habits
Negative Logits
zeÅĦ      -0.17
cept      -0.17
ngthen    -0.16
аÑĢÑħ    -0.14
479       -0.14
elson     -0.14
onz       -0.14
gif       -0.14
eso       -0.14
flamm     -0.14
Positive Logits
ually     0.16
hin       0.15
ally      0.15
-alist    0.15
rov       0.15
ense      0.15
TEGER     0.14
ogui      0.14
rière     0.14
Ā         0.13
Activations Density 0.009%