    Negative Logits
    Token          Logit
    " wage"        -0.07
    "sequ"         -0.07
    "�断"          -0.07
    " sağlay"      -0.06
    " Mushroom"    -0.06
    "forming"      -0.06
    " shoes"       -0.06
    "_clone"       -0.06
    ".ADMIN"       -0.06
    " brushed"     -0.06
    Positive Logits
    Token          Logit
    " همکاری"      0.07
    " tones"       0.06
    "executable"   0.06
    "artisanlib"   0.06
    "-chair"       0.06
    ".WebDriver"   0.06
    "=C"           0.06
    " aspiring"    0.06
    "ня"           0.06
    "алась"        0.06
    Activation Density: 0.023%

    No Known Activations