INDEX
    Explanations
    Negative Logits
     галуз          -0.08
     eighth         -0.07
     เห             -0.07
    bbie            -0.07
    ,title          -0.06
    _tt             -0.06
    .loc            -0.06
     tenth          -0.06
    First           -0.06
    Ve              -0.06
    Positive Logits
     SHOP           0.06
     unrestricted   0.06
    pons            0.06
    _cent           0.06
     eget           0.06
    '),↵            0.05
     влас           0.05
     pitching       0.05
     Lug            0.05
     önüne          0.05
    Activation Density: 0.024%

    No Known Activations