INDEX
    Negative Logits
    Token               Logit
     Италијани          -1.03
    Tikang              -0.80
     autorytatywna      -0.80
    Diweddarwch         -0.76
    exitRule            -0.75
    RenderAtEndOf       -0.71
     ویکی‌پدی           -0.71
    verifyException     -0.70
    PreInfinity         -0.69
    featureID           -0.69
    Positive Logits
    Token               Logit
    net                 0.38
    http                0.35
    raborty             0.35
    {}{}                0.35
     نی                 0.35
     website            0.34
     Bastard            0.33
     inilah             0.33
     דבר                0.33
    vergleich           0.32
    Activation Density: 0.116%

    No Known Activations