    Negative Logits
    "そして"                -0.08
    [token not rendered]    -0.08
    " Ila"                  -0.07
    " Primero"              -0.07
    " chois"                -0.07
    " reserv"               -0.07
    " Loko"                 -0.07
    " réserv"               -0.07
    "LT"                    -0.07
    [token not rendered]    -0.07
    Positive Logits
    " halves"               0.08
    "Centered"              0.08
    " pyst"                 0.08
    ".minecraft"            0.08
    "-shaped"               0.08
    "hedron"                0.07
    " attacked"             0.07
    " minecraft"            0.07
    " certificate"          0.07
    " symmetrical"          0.07
    Activation Density: 0.001%

    No Known Activations
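The two logit lists and the density figure can be reproduced in spirit with a short sketch. It assumes (not confirmed by this page) that each logit value is the feature's decoder direction projected through the model's unembedding matrix, ignoring any final layer norm, and that activation density is the fraction of token positions where the feature activation exceeds zero. The names `top_logit_effects`, `activation_density`, `decoder_vec`, and `W_U` are hypothetical and used only for illustration.

```python
import numpy as np

def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Hypothetical helper: project one feature's decoder direction
    (shape: d_model) through an unembedding matrix (shape: d_model x vocab)
    and return the k most negative and k most positive (token, value) pairs.
    Assumption: no final layer norm or scaling is applied."""
    effects = decoder_vec @ W_U          # one logit effect per vocabulary token
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of token positions whose feature
    activation is strictly above `threshold` (assumed to be 0)."""
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))

# Toy usage with made-up numbers (not this feature's real weights).
neg, pos = top_logit_effects(
    np.array([1.0, 0.0]),
    np.array([[0.5, -0.2, 0.1, -0.4], [0.0, 0.0, 0.0, 0.0]]),
    ["a", "b", "c", "d"],
    k=2,
)
acts = np.zeros(100_000)
acts[0] = 1.0                            # fires on 1 of 100,000 tokens
density = activation_density(acts)
```

In this toy setting, a feature that fires on one token in 100,000 has a density of 0.001%, matching the figure shown above, which is consistent with the dashboard finding no example activations to display.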