INDEX
    Explanations

    nouns and related forms that pertain to classification and categorization

    Negative Logits
    " può"       -0.13
    " served"    -0.13
    " any"       -0.13
    " Wid"       -0.13
    " lä"        -0.12
    "下来"       -0.12
    "一下"       -0.12
    " loro"      -0.12
    "auer"       -0.12
    " free"      -0.12
    Positive Logits
    " y"         0.27
    "para"       0.21
    " para"      0.20
    ")y"         0.20
    ".Sin"       0.18
    ",y"         0.17
    "Para"       0.16
    " nhờ"       0.16
    " tras"      0.16
    ","          0.16
    Act Density 0.083%

    No Known Activations