    Negative Logits
    Token                 Logit
    (blank in source)     -0.07
    ",x"                  -0.06
    " scholars"           -0.06
    "=color"              -0.06
    " leaks"              -0.06
    " animator"           -0.06
    "Congratulations"     -0.06
    " ژانویه"             -0.06
    " finder"             -0.06
    " pensar"             -0.06
    Positive Logits
    Token                 Logit
    "[attr"               0.08
    "‌تر"                  0.07
    "TintColor"           0.07
    "s"                   0.07
    " وضعیت"              0.06
    "unction"             0.06
    "ah"                  0.06
    (blank in source)     0.06
    "мот"                 0.06
    "gal"                 0.06
    Activation Density: 0.005%

    No Known Activations