INDEX
    Explanations

    code, GitHub Actions

    Negative Logits
    "-cert"          -0.07
    "ад"             -0.07
    "setTitle"       -0.06
    "Buffers"        -0.06
    "equ"            -0.06
    "bolt"           -0.06
    " мех"           -0.06
    (unrendered)     -0.06
    "»:"             -0.06
    "zeros"          -0.06
    Positive Logits
    " sach"           0.07
    " frequency"      0.06
    " Byron"          0.06
    "ÜM"              0.06
    " personalized"   0.06
    " Wellington"     0.06
    " Mustafa"        0.06
    " nakonec"        0.06
    " seule"          0.06
    "비스"            0.06
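    The page does not say how these logit values are produced. A common approach for sparse-autoencoder features, assumed here rather than confirmed by the dashboard, is to project the feature's decoder direction onto the model's unembedding matrix and read off the most negative and most positive entries. The minimal pure-Python sketch below illustrates that idea; `top_logit_effects`, `w_dec_row` and `w_u` are hypothetical names, and the toy identity matrix stands in for real model weights.

```python
def top_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Hypothetical helper: project one feature's decoder direction onto an
    unembedding matrix and return the k most negative and k most positive
    (token, value) pairs."""
    # w_dec_row: list of d_model floats (the feature's decoder direction)
    # w_u: d_model rows, each a list of vocab_size floats (assumed unembedding)
    vocab_size = len(vocab)
    effects = [
        sum(w_dec_row[d] * w_u[d][t] for d in range(len(w_dec_row)))
        for t in range(vocab_size)
    ]
    # Token indices sorted from most negative to most positive effect
    order = sorted(range(vocab_size), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in order[::-1][:k]]
    return negative, positive


# Toy example: identity unembedding, so each effect equals the decoder entry
vocab = ["a", "b", "c", "d"]
w_u = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
neg, pos = top_logit_effects([0.07, -0.07, 0.0, 0.06], w_u, vocab, k=2)
print(neg, pos)
```

    Sorting once and slicing both ends keeps the negative and positive lists consistent with each other, mirroring the two ten-entry lists shown above.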
    Activation Density: 0.002%
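    Read as a percentage, 0.002% means the feature fires on roughly 2 of every 100,000 tokens (about 1 in 50,000). The sketch below assumes density is the share of sampled tokens whose activation exceeds zero; the threshold, the sample, and the function name `activation_density` are assumptions for illustration, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of tokens on which the feature's
    activation exceeds `threshold` (assumed to be 0.0)."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Two firing tokens in a sample of 100,000 gives the density shown above
sample = [0.0] * 99998 + [1.3, 0.4]
print(activation_density(sample))
```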

    No Known Activations