INDEX
Explanations
expressions like sparkle or light
Negative Logits
beyond    -0.09
prim      -0.09
familiar  -0.09
ilo       -0.09
pride     -0.09
vara      -0.08
igo       -0.08
Pig       -0.08
Wenger    -0.08
f         -0.08
Positive Logits
èĥ½       0.12
èĥ½       0.12
able      0.12
èĥ½å¤Ł    0.10
поба      0.10
command   0.10
pier      0.10
seeming   0.10
riv       0.09
095       0.09
Activations Density 0.055%