    Negative Logits
    Token           Logit
    " balls"        -0.07
    "tant"          -0.07
    " відповідно"   -0.07
    " insan"        -0.06
    "reno"          -0.06
    "Modes"         -0.06
    " onlar"        -0.06
    "sei"           -0.06
    "adioButton"    -0.06
    "ificado"       -0.06
    Positive Logits
    Token           Logit
    " Ін"           0.06
    " ihn"          0.06
    " Among"        0.06
    " tidak"        0.06
    "!↵"            0.06
    " unf"          0.06
    "iming"         0.06
    " radius"       0.06
    " careful"      0.06
    " CAR"          0.05
    Activation Density: 0.001%

    No Known Activations