    Explanations

    references to cats and related content in the text

    Negative Logits
    Token       Logit
    "quito"     -0.17
    " pup"      -0.17
    "Mp"        -0.15
    "wap"       -0.15
    " puppy"    -0.14
    "MP"        -0.14
    "anske"     -0.14
    "μÏĢ"       -0.14
    " Pig"      -0.14
    " rot"      -0.14
    Positive Logits
    Token        Logit
    " cat"       0.34
    "-cat"       0.34
    " cats"      0.33
    " Cat"       0.32
    " Cats"      0.30
    "/cat"       0.27
    "Cat"        0.27
    "(cat"       0.27
    "猫"         0.26
    " kittens"   0.26
    Activation Density 0.034%

    No Known Activations