    NEGATIVE LOGITS
    "ationToken"          -0.07
    " girlfriends"        -0.06
    [token not rendered]  -0.06
    " realms"             -0.06
    "GMEM"                -0.06
    "Пос"                 -0.06
    "و"                   -0.06
    "زيد"                 -0.06
    [token not rendered]  -0.06
    "ここ"                -0.06
    POSITIVE LOGITS
    " sometime"   0.08
    " Confirm"    0.07
    " Alt"        0.07
    " sighting"   0.07
    " confirm"    0.06
    " deliver"    0.06
    " Gamb"       0.06
    " latent"     0.06
    " vice"       0.06
    " ayuda"      0.06
    Activation Density 0.024%

    No Known Activations