INDEX
    Explanations
    Negative Logits
        Token           Logit
        " educ"         -0.07
        " mountains"    -0.07
        " occas"        -0.07
        "vil"           -0.06
        "is"            -0.06
        "enk"           -0.06
        " فقط"          -0.06
        " potent"       -0.06
        " Mandal"       -0.06
        "ataka"         -0.06
    Positive Logits
        Token           Logit
        "PixelFormat"   0.06
        (blank token)   0.06
        "cribing"       0.06
        ".TextField"    0.06
        " StringField"  0.06
        "**↵"           0.06
        " Φ"            0.06
        "news"          0.06
        "buckets"       0.06
        ".blog"         0.06
    Activation Density: 0.000%

    No Known Activations