INDEX
    Negative Logits
     bpy         -0.07
    :I           -0.07
    (employee    -0.07
     framing     -0.07
     سفید        -0.06
     tide        -0.06
     Pier        -0.06
    -five        -0.06
    '),'         -0.06
     filled      -0.06
    Positive Logits
     nuts        0.08
                 0.07
    phet         0.07
    чие          0.07
    CLK          0.07
    َك           0.07
    χ            0.07
    uck          0.07
     Turbo       0.06
    kh           0.06
    Activation Density: 0.005%

    No Known Activations