    Explanations

    ignoring/removing lines

    Negative Logits
    Token            Logit
    "Senator"        -0.08
    "Ann"            -0.07
    "Battery"        -0.07
    " executives"    -0.07
    " Chancellor"    -0.07
                     -0.07
    "Pid"            -0.07
    "話を"           -0.06
    "President"      -0.06
    "Sn"             -0.06
    Positive Logits
    Token            Logit
    " כג"            0.08
    " compartir"     0.07
    " JOB"           0.07
    " incorrect"     0.07
                     0.07
    "丛林"           0.07
    ".DATA"          0.06
    "星座"           0.06
    " ואז"           0.06
    "/shared"        0.06
    Activation Density: 0.032%

    No Known Activations