INDEX
Explanations
phrases indicating upward movement or improvement
Negative Logits
ories       -0.17
adecimal    -0.17
anges       -0.17
üstü        -0.17
avier       -0.16
depend      -0.16
enticated   -0.16
ftware      -0.16
voj         -0.16
alyzed      -0.15
Positive Logits
sur         0.25
root        0.24
shot        0.23
rightness   0.20
draft       0.20
start       0.20
otre        0.19
standing    0.19
sert        0.19
dat         0.19
Activations Density 0.033%