    Negative Logits
    " Simpsons"     -0.08
    " lyric"        -0.07
    " برگ"          -0.07
    "_cond"         -0.06
                    -0.06
    " Priority"     -0.06
    " ;;^"          -0.06
    "Ş"             -0.06
    "owie"          -0.05
    "ّر"            -0.05
    Positive Logits
    "_sync"         0.07
    " nv"           0.07
    " Россий"       0.06
    "023"           0.06
    " NSMutable"    0.06
    " أيضا"         0.06
    ")은"           0.06
    " twitter"      0.06
    " hợp"          0.06
    " постоянно"    0.06
    Act Density 0.031%

    No Known Activations