    Negative Logits
     omissions    -0.09
                  -0.08
     cheeses      -0.08
     seroton      -0.07
     conscient    -0.07
     Braz         -0.07
     sham         -0.07
     họ           -0.07
     нап          -0.07
    Amounts       -0.07
    Positive Logits
     slope        0.10
    Slope         0.09
     slopes       0.09
     καν          0.08
    lope          0.08
    =-            0.07
     coef         0.07
     NSNumber     0.07
    čný           0.07
    =");↵         0.07
    Act Density 0.025%

    No Known Activations