    Explanations

    negative phrases or indications

    Negative logits
        "NUMX"              -0.95
        " continúas"        -0.93
        "featureID"         -0.91
        "хьтан"             -0.88
        "RectangleBorder"   -0.85
        " متعلقه"           -0.79
        "ChildScrollView"   -0.79
        "^(@)"              -0.79
        "НИК"               -0.78
        "Portale"           -0.77
    Positive logits
        (token not rendered)  0.54
        "=-"                  0.53
        " were"               0.51
        "-"                   0.51
        "(-"                  0.51
        ".-"                  0.49
        ",="                  0.49
        "="                   0.48
        " شدند"               0.47
        "'"                   0.47
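
    Logit lists like the ones above are commonly produced by projecting a feature's decoder direction directly onto the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes exactly that setup; the function name `top_logit_effects` and the inputs `w_dec` (one feature's decoder row) and `w_u` (the unembedding matrix) are hypothetical, and whether this dashboard applies any extra normalization to its values is not stated here.

```python
import numpy as np


def top_logit_effects(w_dec, w_u, k=10):
    """Return the k most negative and k most positive direct logit effects.

    Hypothetical helper. Assumed inputs:
    w_dec: shape (d_model,), one feature's decoder direction.
    w_u:   shape (d_model, d_vocab), the model's unembedding matrix.
    Returns two lists of (token_id, value) pairs.
    """
    # Project the feature direction straight onto every vocabulary logit.
    effects = np.asarray(w_dec, dtype=float) @ np.asarray(w_u, dtype=float)
    order = np.argsort(effects)  # token ids sorted by ascending effect
    negative = [(int(i), float(effects[i])) for i in order[:k]]
    positive = [(int(i), float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


if __name__ == "__main__":
    # Toy sizes only; a real model would supply its own decoder row and W_U.
    rng = np.random.default_rng(0)
    neg, pos = top_logit_effects(rng.normal(size=16), rng.normal(size=(16, 50)), k=3)
    print("most negative:", neg)
    print("most positive:", pos)
```

    Sorting once and slicing both ends keeps the sketch short; for a large vocabulary, `np.argpartition` would find the extremes without a full sort.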
    Activation density: 0.641%
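
    Activation density is usually read as the fraction of token positions in a text sample on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the sample dataset and the exact threshold behind the figure above are not given, and the name `activation_density` is hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where a feature's activation exceeds threshold.

    Hypothetical helper. Assumed input:
    acts: 1-D array of one feature's activation at each token in a sample.
    """
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())


if __name__ == "__main__":
    # 6 firing positions out of 1,000 tokens gives a density of 0.6%.
    acts = np.zeros(1000)
    acts[:6] = 1.5
    print(f"{activation_density(acts):.3%}")
```

    Under this reading, a density of 0.641% means the feature fires on roughly 6 to 7 tokens in every 1,000.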

    No known activations.