INDEX
    Explanations
    Negative Logits
     Bul
    -0.06
    -0.06
    isters
    -0.06
     حيث
    -0.06
     OVERRIDE
    -0.06
    ابد
    -0.06
    -dominated
    -0.06
     Shr
    -0.06
     idiots
    -0.06
     gösteren
    -0.06
    Positive Logits
     NJ
    0.07
    ↵		↵
    0.07
    	freopen
    0.06
    543
    0.06
     anal
    0.06
     я
    0.06
    小学
    0.06
    borg
    0.06
    0.06
    0.06
    Act Density 0.031%

    No Known Activations