    Explanations

    references to cats or related cat terminology

    Tokens are shown verbatim; ␣ marks a leading space and ↵ a trailing newline.

    Negative Logits
    Token         Logit
    ")));↵        -0.80
    ␣Aguilera     -0.70
    PYX           -0.69
    ␣оригіналу    -0.68
    ␣"'");        -0.68
    ]))↵          -0.68
    ")));         -0.67
    ␣compri       -0.67
    )");↵         -0.66
    ␣alberto      -0.66

    Positive Logits
    Token         Logit
    ␣cat          3.46
    ␣Cat          3.35
    Cat           3.18
    cat           3.04
    ␣cats         2.94
    ␣CAT          2.81
    ␣Cats         2.71
    CAT           2.60
    Cats          2.60
    cats          2.53
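    The logit lists above appear to rank vocabulary tokens by how strongly this feature pushes their output logits up or down. The following is a minimal sketch of how such a ranking can be produced. It assumes, without confirmation from this page, that each token's score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix W_U, with no final layer norm applied. The helper name logit_effects and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Rank tokens by a feature's direct effect on the logits.

    Hypothetical helper. Assumes `unembed` is W_U with shape
    (d_model, vocab_size), so effect[t] = sum_i decoder_dir[i] * W_U[i][t].
    Returns (top-k positive, top-k negative).
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("decoder_dir length must equal d_model (rows of unembed)")
    # Project the decoder direction onto every token's unembedding column.
    effects = [
        sum(d * unembed[i][t] for i, d in enumerate(decoder_dir))
        for t in range(len(vocab))
    ]
    # Sort ascending by effect: the most negative tokens come first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # reverse so the most positive come first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (made-up numbers).
    d = [1.0, -0.5]
    w_u = [[2.0, 0.0, -1.0],
           [0.0, 1.0, 1.0]]
    pos, neg = logit_effects(d, w_u, [" cat", " dog", "PYX"], k=1)
    print(pos, neg)  # [(' cat', 2.0)] [('PYX', -1.5)]
```

    Under this assumption, a feature whose decoder direction aligns with the unembedding columns of the cat-related tokens would produce a positive list like the one above.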
    Activation Density 0.063%
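    Activation density is assumed here, following the usual convention rather than anything stated on this page, to be the share of token positions in a sample dataset where the feature's activation is above zero. Under that reading, 0.063% corresponds to roughly 63 firing positions per 100,000 tokens. The following is a minimal sketch using a hypothetical activation_density helper and synthetic data.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Hypothetical helper; the zero threshold is an assumed convention.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


if __name__ == "__main__":
    # Synthetic data: 63 firing positions out of 100,000 tokens.
    acts = [0.0] * 99_937 + [1.2] * 63
    print(f"{activation_density(acts):.3%}")  # 0.063%
```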

    No Known Activations