Explanations
phrases indicating perception or observation of situations or conditions
Negative Logits
ultipart    -0.16
oni         -0.16
LOSE        -0.15
ori         -0.15
onis        -0.14
Fade        -0.14
yat         -0.14
fact        -0.14
itor        -0.14
affen       -0.14
Positive Logits
了一         0.14
pyx          0.14
Boulder      0.14
leck         0.14
cwd          0.13
らせ         0.13
edeki        0.13
anst         0.13
pte          0.13
differently  0.13
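Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the k smallest and largest entries. The page does not say exactly how its values were computed (some pipelines also fold in layer-norm scaling), so the following is only a minimal sketch under that assumption; `w_dec_row`, `w_unembed`, and `top_logit_effects` are hypothetical names.

```python
import numpy as np


def top_logit_effects(w_dec_row: np.ndarray, w_unembed: np.ndarray, k: int = 10):
    """Project one feature's decoder vector through the unembedding (assumed setup).

    w_dec_row: shape (d_model,), the feature's decoder direction (hypothetical).
    w_unembed: shape (d_model, vocab_size), the model's unembedding (hypothetical).
    Returns (most_negative, most_positive), each a list of (token_id, logit) pairs.
    """
    logits = w_dec_row @ w_unembed  # direct effect on every vocab logit, shape (vocab_size,)
    order = np.argsort(logits)      # token ids sorted by ascending logit
    most_negative = [(int(i), float(logits[i])) for i in order[:k]]
    most_positive = [(int(i), float(logits[i])) for i in order[::-1][:k]]
    return most_negative, most_positive


# Toy example: 2-dimensional residual stream, 3-token vocabulary.
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[0.5, -0.2, 0.1],
                      [9.0, 9.0, 9.0]])
neg, pos = top_logit_effects(w_dec_row, w_unembed, k=2)
print(neg, pos)
```

Mapping the returned token ids back to strings would require the model's tokenizer, which this sketch leaves out.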
Activation Density: 0.050%
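Activation density is usually reported as the percentage of sampled token positions on which the feature fires, that is, its activation is above zero. At 0.050%, that is roughly 1 token in 2,000. A minimal sketch, assuming the activations are available as a NumPy array and a firing threshold of 0 (both assumptions; `activation_density` is a hypothetical helper):

```python
import numpy as np


def activation_density(activations: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(activations).ravel()
    if acts.size == 0:
        raise ValueError("need at least one activation")
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# One firing position out of 2,000 tokens gives a density of 0.05%.
acts = np.zeros(2000)
acts[7] = 3.2
print(f"{activation_density(acts):.3f}%")
```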