    Negative Logits
    Token    Logit
    ύ        -0.24
    ucz      -0.18
    ierz     -0.18
    cz       -0.17
    ocz      -0.17
    pis      -0.17
    ariat    -0.17
    atz      -0.17
    acz      -0.16
    zs       -0.16
    Positive Logits
    Token    Logit
    ese      0.30
    ise      0.28
    ase      0.25
    ose      0.24
    iese     0.24
    alse     0.23
    sey      0.23
    rise     0.23
    sei      0.23
    sea      0.23
    Activation density: 0.062%

    No Known Activations