INDEX
Explanations
words related to obstacles or restrictions
Negative Logits
IRECTION    -0.15
.toolbox    -0.15
ony         -0.15
venes       -0.14
íĺ¸         -0.14
SCO         -0.13
bab         -0.13
ãģ¹ãģ¦      -0.13
appa        -0.13
allet       -0.13
Positive Logits
/block      0.19
peq         0.16
-free       0.15
-Free       0.15
/bar        0.14
chief       0.14
akhir       0.14
ocracy      0.14
:block      0.14
upakan      0.14
Activations Density 0.115%