    Explanations

    themes related to overcrowding and confinement

    Negative Logits
    "ège"        -0.15
    "leftright"  -0.14
    "ÅĤe"        -0.14
    "游"         -0.13
    "berger"     -0.13
    " streak"    -0.13
    "ount"       -0.13
    " Dün"       -0.13
    " isol"      -0.13
    "ÅĤo"        -0.13
    Positive Logits
    " packed"       0.50
    " compressed"   0.48
    "packed"        0.47
    "-packed"       0.44
    " compression"  0.43
    " crow"         0.42
    " squeezed"     0.41
    " squeeze"      0.41
    " jam"          0.41
    "compressed"    0.40
    Activation Density: 0.311%

    No Known Activations