INDEX
Explanations
repeated special characters or icons in the text
Negative Logits
Token      Logit
aliz       -0.15
afil       -0.14
olie       -0.14
experience -0.14
|h         -0.14
alian      -0.14
Shuffle    -0.13
ết         -0.13
izza       -0.13
toy        -0.13
Positive Logits
Token       Logit
оген        0.19
San         0.15
Heb         0.15
Texas       0.15
.Constraint 0.15
Hel         0.14
PE          0.14
TEX         0.14
ogen        0.14
Hero        0.14
Activations Density 0.010%