    NEGATIVE LOGITS
    Token               Logit
    " fi"               -0.09
    " disagreements"    -0.09
    (blank in source)   -0.08
    "[ii"               -0.08
    " Odd"              -0.08
    " Mrs"              -0.08
    " Theft"            -0.08
    ".Exceptions"       -0.07
    " fiel"             -0.07
    " Ste"              -0.07
    POSITIVE LOGITS
    Token               Logit
    " rotates"          0.09
    "awd"               0.08
    "userid"            0.08
    "seo"               0.08
    "Rot"               0.08
    "isht"              0.08
    "fb"                0.07
    "andbox"            0.07
    " rotating"         0.07
    " gereken"          0.07
    Activation Density: 0.004%

    No Known Activations