Explanations
expressions of simplicity versus complexity
Negative Logits
Token       Logit
rend        -0.20
anik        -0.15
ilm         -0.14
uja         -0.14
lef         -0.14
Alphabet    -0.14
oden        -0.14
opot        -0.13
Ĥæķ°        -0.13
airo        -0.13
Positive Logits
Token       Logit
simple      0.33
Simple      0.31
simple      0.30
-simple     0.28
simplicity  0.28
ç®Ģåįķ      0.27
simples     0.27
SIMPLE      0.27
Simple      0.27
_simple     0.23
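
Two tokens in the lists above (Ĥæķ° and ç®Ģåįķ) look like raw byte-level vocabulary entries rather than readable text. The sketch below shows one way to turn such entries back into text. It assumes the model's tokenizer uses the GPT-2 style byte-to-character table, which this page does not confirm; the helper names bytes_to_unicode and decode_token are illustrative only.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Build the GPT-2 style table mapping each byte value to a printable character.

    Assumption: the tokenizer behind this page uses this table.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    shift = 0
    # Bytes without a printable form are mapped to code points 256, 257, ...
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a byte-level vocabulary entry back to text (hypothetical helper).

    Bytes that do not form valid UTF-8 on their own, such as a token that
    starts in the middle of a multi-byte character, become U+FFFD.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ç®Ģåįķ", "Ĥæķ°"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, ç®Ģåįķ decodes to 简单, the Chinese word for "simple", which fits the other positive-logit tokens. Ĥæķ° begins with a stray continuation byte, so only its trailing character (数) decodes cleanly.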
Activations Density 0.155%