    Negative Logits
     mer          -0.07
     poo          -0.07
    _dll          -0.07
                  -0.07
     inference    -0.07
    _ll           -0.07
    string        -0.07
    <l            -0.06
    ע             -0.06
     men          -0.06
    Positive Logits
     category     0.10
     categories   0.09
     Fach         0.08
     Çağ          0.07
    atown         0.07
     دسته         0.07
    ayi           0.07
    -category     0.07
    	category     0.07
     Category     0.07
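    Token lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most positive and most negative tokens. This page does not state its method, so the sketch below is an assumption: the function names `logit_effects` and `top_and_bottom`, and the toy weights in the usage note, are hypothetical and are not taken from any particular library.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    w_unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    """Hypothetical helper: score every token by decoder_dir @ W_U.

    decoder_dir has length d_model; w_unembed has d_model rows and one
    column per vocabulary entry. Returns (token, score) pairs sorted from
    most positive to most negative.
    """
    vocab_size = len(vocab)
    scores = [0.0] * vocab_size
    for i, d in enumerate(decoder_dir):
        row = w_unembed[i]
        # Accumulate this residual dimension's contribution to every token.
        for t in range(vocab_size):
            scores[t] += d * row[t]
    return sorted(zip(vocab, scores), key=lambda p: p[1], reverse=True)


def top_and_bottom(
    pairs: Sequence[Tuple[str, float]], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: split sorted pairs into the k most positive and
    the k most negative tokens (negative list ordered most negative first)."""
    positive = list(pairs[:k])
    negative = sorted(pairs[-k:], key=lambda p: p[1])
    return positive, negative
```

    For example, with a toy two-dimensional decoder direction `[1.0, 0.5]` and a four-token vocabulary `["category", "mer", "poo", "the"]`, `top_and_bottom(logit_effects(...), 2)` would return `category` and `the` as the positive tokens and `poo` and `mer` as the negative ones. A real model would use its full unembedding matrix and a vectorized matrix product rather than these loops.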
    Activation Density: 0.058%
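    Activation density is usually read as the percentage of sampled token positions on which the feature fires, so 0.058% corresponds to roughly 58 active positions per 100,000 tokens. The sketch below assumes "fires" means an activation strictly above zero; the threshold and the function name `activation_density` are assumptions, since this page does not define them.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of positions whose activation exceeds threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # assumed firing criterion: strictly above the threshold
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total
```

    Feeding it 100,000 activations of which 58 are positive yields a density of 0.058 (percent).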

    No Known Activations