INDEX
Explanations
high-frequency repetitive sequences or patterns in the text
Negative Logits
ses     -0.15
ish     -0.15
sl      -0.14
shot    -0.13
ceae    -0.13
shit    -0.13
iker    -0.12
ook     -0.12
ore     -0.12
vel     -0.12
Positive Logits
页面存档备份    0.20
ythe    0.17
forth   0.16
ichten  0.15
arity   0.14
yth     0.14
ancel   0.14
يلاد    0.14
gel     0.14
icont   0.14
Activations Density 0.053%
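For readers unfamiliar with the density figure, the following is a minimal sketch of one common way such a number is computed: the fraction of token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard; the tool that produced the value above may define or estimate density differently.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    `activations` is a flat list of per-token activation values for one feature.
    This is an assumed definition, not necessarily the one used by the dashboard.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, with the feature firing on 5 of them.
sample = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.053% would mean the feature is active on roughly 5 out of every 10,000 tokens in the sampled text.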