    Negative Logits
    " imagination"    -0.07
    " lobbying"       -0.07
    " й"              -0.06
    " Jarvis"         -0.06
    " BEL"            -0.06
                      -0.06
    ".Im"             -0.06
    " prejudice"      -0.06
    "��"              -0.06
    " pal"            -0.06
    Positive Logits
    "Regards"         0.07
    ",None"           0.06
    "lj"              0.06
    "emat"            0.06
    " sect"           0.06
    "aqu"             0.06
    "Mirror"          0.06
    " Milwaukee"      0.06
    "Moment"          0.06
    "mongoose"        0.06
    Activation Density: 0.001%

    No Known Activations