    Negative Logits
    education     -0.07
    تعامل         -0.07
                  -0.07
     distur       -0.06
     headlights   -0.06
                  -0.06
     EDUC         -0.06
     Studies      -0.06
    .Tasks        -0.06
    спект         -0.06

    Positive Logits
    ."'           0.07
    ומים          0.07
    INET          0.07
     '"'          0.07
    (fh           0.07
     WN           0.07
    olygon        0.07
     Yo           0.07
     Jays         0.07
    itories       0.06
    Activation Density: 0.006%

    No Known Activations