INDEX
    Explanations

    references to cats and their behavior

    Negative Logits
    Token        Logit
    "quito"      -0.19
    " horses"    -0.16
    " Pig"       -0.16
    " horse"     -0.16
    "PG"         -0.15
    "MP"         -0.15
    "ala"        -0.14
    " pigs"      -0.14
    "úb"         -0.14
    "allo"       -0.14
    Positive Logits
    Token        Logit
    " cat"       0.32
    " Cat"       0.31
    "-cat"       0.30
    " cats"      0.30
    " Cats"      0.27
    "çĮ«"        0.27
    "Cat"        0.26
    "(cat"       0.25
    "/cat"       0.25
    ".cat"       0.24
    Activation Density 0.039%

    No Known Activations
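
The positive-logit token `çĮ«` looks garbled but is probably a raw byte-level BPE vocabulary string rather than extraction damage. The sketch below assumes the tokens come from a GPT-2-style byte-level tokenizer, which this page does not state. It rebuilds that tokenizer's byte-to-character table and reverses it to recover the underlying UTF-8 text. The helper name `decode_token` is hypothetical.

```python
def bytes_to_unicode():
    """Byte -> printable-character table used by GPT-2-style byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


def decode_token(token: str) -> str:
    """Map a vocabulary string back to the text it encodes (hypothetical helper)."""
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # Partial multi-byte tokens may not be valid UTF-8 on their own.
    return raw.decode("utf-8", errors="replace")


print(decode_token("çĮ«"))
```

Under that assumption the token decodes to 猫, the Chinese and Japanese character for "cat", which fits the explanation given for this feature.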