INDEX
    Explanations
    Negative Logits
     incumb      -0.07
    .outputs     -0.07
     schwar      -0.06
     Mec         -0.06
     dàng        -0.06
     compet      -0.06
     puts        -0.06
     emphas      -0.06
     starring    -0.06
    /car         -0.06
    Positive Logits
     discern     0.07
    ▍▍           0.07
    raid         0.07
    ZZ           0.06
    Раз          0.06
                 0.06
    arda         0.06
     Lab         0.06
    haul         0.06
    rophy        0.06
    Act Density 0.011%

    No Known Activations