INDEX
    Explanations
    Negative Logits
     Jewish     -0.08
    ewish       -0.07
     athlete    -0.07
    /qu         -0.07
    uem         -0.06
    *sin        -0.06
                -0.06
     hero       -0.06
    uw          -0.06
     Seen       -0.06
    Positive Logits
    .groupControl   0.07
    .soft           0.07
     itir           0.06
    .textLabel      0.06
    _Resource       0.06
    barcode         0.06
    ssf             0.06
     Aberdeen       0.06
     '".$_          0.06
     داشتند         0.06
    Activation Density: 0.042%

    No Known Activations