INDEX
    Explanations
    Negative Logits
    ك  0.66
    ियों  0.61
    માં  0.58
     on  0.57
     الجمهور  0.56
    IF  0.56
    İM  0.55
    к  0.55
    ों  0.55
    いきます  0.55
    Positive Logits
    cat  0.84
    cats  0.80
    categories  0.77
    category  0.74
    agory  0.74
    d  0.74
    acat  0.73
    cate  0.71
    subcategory  0.70
    Cats  0.69
    Activation Density: 0.115%

    No Known Activations