INDEX
    Explanations

    words related to structures used for defense or fortification

    Category, CAT, or cat (related tokens)

    category classification

    NEGATIVE LOGITS
    Token (as a quoted string)    Logit
    "]))\n"                       -0.71
    "es"                          -0.67
    "&\\"                         -0.64
    "vocable"                     -0.63
    " }))"                        -0.62
    "monary"                      -0.62
    " strconv"                    -0.61
    " Aires"                      -0.60
    "s"                           -0.58
    "]));\n"                      -0.57
    POSITIVE LOGITS
    Token (as a quoted string)    Logit
    " cats"                       1.56
    " Cats"                       1.53
    "Cats"                        1.50
    " CAT"                        1.40
    " Cat"                        1.35
    "Cat"                         1.31
    " cat"                        1.27
    "cats"                        1.27
    " Catt"                       1.21
    "cat"                         1.19
    Activation Density: 0.224%

    No Known Activations