    Negative Logits
    " scl"              -0.06
    "burger"            -0.06
    (token not shown)   -0.06
    " trois"            -0.06
    " ugly"             -0.06
    "ynchronize"        -0.06
    "impl"              -0.06
    " ج"                -0.06
    "forall"            -0.06
    (token not shown)   -0.06
    Positive Logits
    " turnaround"       0.07
    "enti"              0.07
    " перева"           0.06
    "ENTE"              0.06
    " ferry"            0.06
    "previous"          0.06
    " paras"            0.06
    "yan"               0.06
    "ęż"                0.06
    " CAT"              0.06
    Act Density 0.063%

    No Known Activations