INDEX
    Explanations
    Negative Logits
     radioButton
    -0.07
    rea
    -0.07
     category
    -0.07
     ratios
    -0.07
     vulgar
    -0.07
     jogging
    -0.07
     Paula
    -0.07
     anomaly
    -0.06
    auss
    -0.06
     Ade
    -0.06
    Positive Logits
     Ship
    0.08
     ships
    0.07
    ك
    0.07
    0.07
    zip
    0.07
    Ship
    0.07
     ship
    0.07
    ап
    0.07
     Ships
    0.07
    ipped
    0.07
    Act Density 0.013%

    No Known Activations