    Negative Logits
        Token             Logit
        (not rendered)    -0.08
        " TOUR"           -0.08
        (not rendered)    -0.07
        " поперед"        -0.07
        " Labour"         -0.07
        " filho"          -0.07
        "_Offset"         -0.07
        "ählt"            -0.07
        " SALE"           -0.07
        " potassium"      -0.07
    Positive Logits
        Token             Logit
        " зд"              0.07
        (not rendered)     0.07
        "416"              0.06
        " Remain"          0.06
        " glad"            0.06
        "asdf"             0.06
        " YY"              0.05
        " дот"             0.05
        "„ظ"               0.05
        " ↵↵"              0.05
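Token lists like the negative and positive logits on this page are commonly produced by multiplying a feature's decoder direction by the model's unembedding matrix, then keeping the tokens with the most negative and most positive results. Below is a minimal sketch of that computation. It assumes the feature has a single decoder vector and that no normalization or bias is applied. The helper `top_logit_effects`, its argument names, and the toy numbers are all hypothetical and do not come from this page.

```python
from typing import List, Tuple


def top_logit_effects(
    decoder_vec: List[float],
    W_U: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most negative, most positive) (token, logit effect) pairs.

    Hypothetical helper: assumes the effect on token t is the dot product of
    the feature's decoder vector with column t of W_U (shape d_model x vocab).
    """
    d_model = len(decoder_vec)
    # Logit effect for each vocabulary token: decoder_vec @ W_U[:, t]
    effects = [
        sum(decoder_vec[i] * W_U[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by effect so the most suppressed tokens come first
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens
    positive = ranked[::-1][:k]  # most promoted tokens
    return negative, positive


# Toy example with made-up numbers: d_model = 3, vocabulary of 4 tokens
toy_vocab = ["a", "b", "c", "d"]
toy_W_U = [
    [0.1, -0.2, 0.0, 0.5],
    [0.5, 0.5, 0.5, 0.5],
    [0.2, 0.0, -0.4, 0.0],
]
neg, pos = top_logit_effects([1.0, 0.0, -0.5], toy_W_U, toy_vocab, k=2)
print(neg)  # [('b', -0.2), ('a', 0.0)]
print(pos)  # [('d', 0.5), ('c', 0.2)]
```

Sorting once and slicing from both ends keeps the sketch short; for a real vocabulary of tens of thousands of tokens, a partial selection such as a heap would avoid sorting every entry.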
    Activation Density: 0.002%

    No Known Activations