INDEX
    Explanations
    Negative Logits
    " watching"   -0.09
    " bowling"    -0.09
    " xổ"         -0.09
    " Lein"       -0.09
    "博彩"        -0.08
    " brownie"    -0.08
    "োল"          -0.08
    " smarty"     -0.08
    " rdr"        -0.08
    " érz"        -0.08
    Positive Logits
    "paths"       0.09
    " cables"     0.08
    "heil"        0.08
    " highways"   0.08
    " हिम"         0.08
    " dài"        0.07
    " Hans"       0.07
    "HE"          0.07
    " paths"      0.07
    " Carrier"    0.07
    Activation Density: 0.003%

    No Known Activations