    Negative Logits
    Token           Logit
    " pop"          -0.07
    "Apart"         -0.07
    " pok"          -0.07
    " tapped"       -0.06
    " topped"       -0.06
    " vengeance"    -0.06
    " utmost"       -0.06
    " rats"         -0.06
    " backed"       -0.06
    " plenty"       -0.06
    Positive Logits
    Token             Logit
    " correct"        0.08
    " Buna"           0.08
    " incorrect"      0.07
    " correctamente"  0.07
    "」と"            0.07
    "Incorrect"       0.07
    ".can"            0.07
    "uctose"          0.06
    "(correct"        0.06
    "ObjectContext"   0.06
    Activation Density: 0.027%

    No Known Activations