    NEGATIVE LOGITS
     Ґ    -0.07
    Paper    -0.06
    arov    -0.06
    irebase    -0.06
    Cooldown    -0.06
    ملة    -0.06
    верд    -0.06
    BOUND    -0.06
     bekommen    -0.06
    RH    -0.06
    POSITIVE LOGITS
     steer    0.07
    ('\\    0.07
     이후    0.07
     technician    0.07
    ./(    0.06
    .Close    0.06
     part    0.06
     rn    0.06
     ölçü    0.06
     invoices    0.06
    Activation Density 0.009%

    No Known Activations