INDEX
    Explanations
    NEGATIVE LOGITS
    /pass           -0.07
     Masc           -0.07
    .nc             -0.07
    Pass            -0.07
    بية             -0.06
     Grupo          -0.06
     Exec           -0.06
     مردم           -0.06
    burn            -0.06
     abril          -0.06
    POSITIVE LOGITS
     сви            0.06
    (blank)         0.06
     Kramer         0.06
     gruesome       0.06
     snug           0.06
     spolup         0.06
    .debugLine      0.06
    .contentView    0.06
    hm              0.06
    cki             0.05
    Activation Density: 0.013%

    No Known Activations