INDEX
    Explanations
    Negative Logits
     BJ          -0.07
     helper      -0.07
     density     -0.06
     secretary   -0.06
    give         -0.06
     slope       -0.06
     shocks      -0.06
     Igor        -0.06
    profession   -0.06
     Luz         -0.06
    Positive Logits
    oupper       0.06
     الذه        0.06
    ála          0.06
     infos       0.06
    nero         0.06
     â           0.06
    (火          0.06
    _WORK        0.06
    โรงเร        0.06
    (EXIT        0.06
    Act Density 0.057%

    No Known Activations