INDEX
Explanations
роф. I chose this because the token appears repeatedly in MAX_ACTIVATING_TOKENS.
Negative Logits
Token        Logit
charms       -0.09
cardstock    -0.08
charm        -0.08
codecs       -0.08
volt         -0.08
Libert       -0.08
χει          -0.08
zak          -0.07
airborne     -0.07
ballistic    -0.07
Positive Logits
Token        Logit
very         0.07
compartment  0.07
-Val         0.07
healthy      0.07
ef           0.07
407          0.07
esp          0.07
don          0.07
quinas       0.07
.Full        0.07
Activations Density: 0.001%