    Negative Logits
    Token             Logit
    " manipulated"    -0.07
    "-schema"         -0.06
    " مى"             -0.06
    " evolving"       -0.06
    " fertilizer"     -0.06
    " plateau"        -0.06
    " جا"             -0.06
    "_checkout"       -0.06
    " distractions"   -0.06
    " deck"           -0.06
    Positive Logits
    Token             Logit
    " or"             0.08
    "_wrong"          0.07
    ".Syntax"         0.06
    " Sex"            0.06
    " Од"             0.06
    "افت"             0.06
    "отор"            0.06
    "اخ"              0.06
    "Од"              0.06
    "Initialized"     0.06
    Activation Density: 0.064%

    No Known Activations