INDEX
Explanation: phrases encouraging communication and outreach
Negative logits
FSIZE  -0.17
emmel  -0.15
allon  -0.14
λή  -0.14
enheim  -0.14
庫  -0.14
rema  -0.14
ocht  -0.14
_DF  -0.14
íĿ  -0.14
Positive logits
free  0.55
free  0.40
-free  0.35
_free  0.35
free  0.33
свобод  0.33
Free  0.32
Free  0.32
FREE  0.31
welcome  0.30
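The positive and negative logit lists rank vocabulary tokens by how strongly the feature pushes their output logits up or down. A minimal sketch of one common way such lists are computed, assuming the effect is simply the dot product of the feature's decoder vector with the model's unembedding matrix (ignoring any final layer norm). The names `top_logit_effects`, `decoder_dir`, `W_U`, and `vocab` are hypothetical and not taken from the page's actual pipeline:

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    Hypothetical inputs (assumptions for illustration):
      decoder_dir: shape (d_model,), the feature's decoder vector
      W_U:         shape (d_model, vocab_size), the unembedding matrix
      vocab:       list of vocab_size token strings
    """
    # Direct effect of the feature direction on every vocabulary logit.
    effects = decoder_dir @ W_U  # shape (vocab_size,)
    order = np.argsort(effects)  # token indices, smallest effect first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage: 2-dimensional residual stream, 3-token vocabulary.
W_U = np.array([[1.0, 0.0, -1.0],
                [0.0, 1.0, 0.0]])
neg, pos = top_logit_effects(np.array([0.5, 0.2]), W_U, ["a", "b", "c"], k=1)
```

Under these assumptions, repeated surface forms such as several "free" entries would correspond to distinct vocabulary tokens (for example, with and without a leading space) that share a similar unembedding direction.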
Activation density: 0.015%
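An activation density of 0.015% means the feature is active on roughly 15 of every 100,000 tokens (0.015% = 0.00015). A minimal sketch of the calculation, assuming "active" means an activation strictly greater than zero; the function name `activation_density` and its `threshold` parameter are hypothetical:

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold.

    Assumption: a token counts as "active" when its activation is > threshold.
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# Toy usage: 3 active positions out of 20,000 tokens gives 0.00015, i.e. 0.015%.
acts = np.concatenate([np.zeros(19997), np.ones(3)])
density = activation_density(acts)
```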