    NEGATIVE LOGITS
    "(left"        -0.07
    " fasting"     -0.07
    " emlrt"       -0.07
    " Freund"      -0.07
    "YD"           -0.06
    "writing"      -0.06
    " محمد"        -0.06
    " Patriots"    -0.06
    " dân"         -0.06
    "tea"          -0.06
    POSITIVE LOGITS
    " amid"            0.07
    " agricultural"    0.07
    " circuit"         0.07
    " nút"             0.06
    " glossy"          0.06
    " fiz"             0.06
    "izin"             0.06
    " Gig"             0.06
    " Circuit"         0.06
    ";i"               0.06
    Activation Density: 0.001%

    No Known Activations