Explanations
phrases that reference upward movement or improvement
Negative Logits
inkel     -0.20
place     -0.17
ع         -0.16
quip      -0.16
レット    -0.15
coni      -0.15
estr      -0.15
odore     -0.15
esters    -0.15
undler    -0.15
Positive Logits
/down     0.21
vsp       0.16
sk        0.16
datable   0.16
.gf       0.16
dater     0.15
ois       0.15
eview     0.14
gang      0.14
744       0.14
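The two logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. The sketch below assumes the common sparse-autoencoder convention, which this page does not state: each token is scored by the dot product of the feature's decoder direction with that token's unembedding column, with no layer-norm correction, and the lists keep the k lowest and k highest scores. The function name `logit_effects` and the toy tokens and vectors are hypothetical and only illustrate the ranking step.

```python
from typing import Dict, List, Tuple


def logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical sketch: score tokens by decoder_dir . unembedding column.

    Assumes no final layer-norm scaling is applied. Returns the k most
    negative scores (ascending) and the k most positive scores (descending).
    """
    # Dot product of the feature direction with each token's unembedding vector.
    scores = {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }
    # Sort ascending so the most negative scores come first.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]
    positive = list(reversed(ranked[-k:]))
    return negative, positive


# Toy 2-dimensional example with made-up tokens and values.
decoder = [1.0, 0.0]
toy_unembed = {
    "tok_a": [0.5, 0.3],
    "tok_b": [0.25, -1.0],
    "tok_c": [-0.5, 0.2],
    "tok_d": [-0.25, 0.0],
}
neg, pos = logit_effects(decoder, toy_unembed, k=2)
print(neg)  # [('tok_c', -0.5), ('tok_d', -0.25)]
print(pos)  # [('tok_a', 0.5), ('tok_b', 0.25)]
```

Because the score is linear in the decoder direction, flipping the feature's sign would swap the two lists. That is why a dashboard shows both ends of the ranking rather than only the top.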
Activation Density 0.080%
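An activation density of 0.080% means the feature fires on roughly 8 of every 10,000 token positions. The sketch below assumes density is defined as the percentage of positions whose activation is strictly greater than zero. That matches the usual sparse-autoencoder convention, but the page does not confirm it. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical sketch: percentage of token positions where the feature fires.

    A position counts as firing when its activation is strictly above threshold.
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in values if a > threshold)
    return 100.0 * firing / len(values)


# 8 firing positions out of 10,000 gives a density of about 0.08%.
acts = [0.0] * 9992 + [1.3] * 8
print(f"{activation_density(acts):.3f}%")  # 0.080%
```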