INDEX
    Explanations

    Math multiple choice answers

    Negative Logits
     Buena      -0.11
     WIN        -0.08
    arrant      -0.08
    alada       -0.08
     castles    -0.08
     Раз        -0.08
    winds       -0.07
     weißen     -0.07
    .story      -0.07
    .wp         -0.07
    Positive Logits
     parmi      0.08
    시오          0.08
    uple        0.07
                0.07
     اص         0.07
     hemorr     0.07
     procl      0.07
     ponu       0.07
                0.07
                0.07
    Act Density 0.250%

    No Known Activations