Explanations
themes related to overcrowding and confinement
Negative Logits
ège          -0.15
leftright    -0.14
łe           -0.14
游           -0.13
berger       -0.13
streak       -0.13
ount         -0.13
Dün          -0.13
isol         -0.13
ło           -0.13

Positive Logits
packed        0.50
compressed    0.48
packed        0.47
-packed       0.44
compression   0.43
crow          0.42
squeezed      0.41
squeeze       0.41
jam           0.41
compressed    0.40
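
The two lists rank vocabulary tokens by how strongly this feature directly raises or lowers their output logits. A common way to produce such lists, assumed here rather than confirmed by the page, is to project the feature's decoder direction through the model's unembedding matrix and keep the k highest and lowest scores. The sketch below does this in plain Python; `logit_effects`, `decoder_dir`, `unembed`, and the toy inputs are hypothetical names, not part of any dashboard's API.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top_positive, top_negative) lists of (token, score).

    Hypothetical helper. Assumes:
      decoder_dir: the feature's decoder vector, length d_model
      unembed:     W_U laid out as d_model rows by len(vocab) columns
    """
    d_model = len(decoder_dir)
    # Direct logit effect for each token j: sum_i decoder_dir[i] * W_U[i][j]
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending: most negative effects first, most positive last
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]
    top_positive = ranked[::-1][:k]
    return top_positive, top_negative


# Toy example: a 2-dimensional model with a 3-token vocabulary
pos, neg = logit_effects(
    [1.0, 0.0],
    [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]],
    ["packed", "isol", "jam"],
    k=1,
)
print(pos, neg)
```

With real weights, `decoder_dir` would be one row of the sparse autoencoder's decoder and `unembed` the model's unembedding matrix; whether this dashboard also applies the final layer norm before projecting is not stated here.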
Activation Density: 0.311%
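
Activation density is the share of tokens in a sample on which the feature is active. A minimal sketch, assuming "active" means an activation strictly above zero (the page states neither its threshold nor its sample size); `activation_density` is a hypothetical helper:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# 311 active tokens out of 100,000 gives a density of 0.311%
acts = [1.0] * 311 + [0.0] * 99_689
print(activation_density(acts))
```

At 0.311%, the feature would fire on roughly one token in 320, which fits a narrow concept such as crowding or compression.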