    Negative Logits
    Token             Logit
    " gren"           -0.07
    ".POS"            -0.06
    " Moore"          -0.06
    " potion"         -0.06
    " devast"         -0.06
    " можна"          -0.06
    "indent"          -0.06
    "anning"          -0.06
    "房间"            -0.06
    " discovering"    -0.06
    Positive Logits
    Token             Logit
    " carry"          0.11
    " carries"        0.11
    " carrying"       0.09
    " carried"        0.09
    " Carry"          0.07
    "Vel"             0.07
                      0.07
    "отов"            0.07
    "raya"            0.07
    "فر"              0.06
    Activation Density: 0.009%

    No Known Activations