INDEX
    Explanations
    Negative Logits
    Token               Logit
    " Taylor"           -0.06
    "ада"               -0.06
    " pBuffer"          -0.06
    " مشکلات"           -0.06
    "чает"              -0.06
    " dream"            -0.06
    " triumph"          -0.06
    " walnut"           -0.06
    (no token text)     -0.06
    "erta"              -0.06
    Positive Logits
    Token               Logit
    " cat"              0.18
    " cats"             0.16
    " Cats"             0.12
    " kittens"          0.11
    " kitt"             0.10
    " kitten"           0.10
    " kitty"            0.10
    (no token text)     0.08
    " кош"              0.07
    "-cat"              0.07
    Activation Density: 0.019%

    No Known Activations