    Explanations

    references to categorization, evaluation, or classification of concepts or individuals

    Negative Logits
    Token             Logit
    " Wald"           -0.16
    ".pivot"          -0.15
    "ØŃد"             -0.15
    "olg"             -0.15
    "ög"              -0.14
    "Ñģим"            -0.14
    "plements"        -0.14
    "ILD"             -0.14
    " distributed"    -0.13
    " dziew"          -0.13

    Positive Logits
    Token             Logit
    " pile"           0.21
    " list"           0.21
    " bucket"         0.20
    "ategori"         0.19
    " camp"           0.19
    "bucket"          0.18
    "pile"            0.17
    "amarin"          0.17
    " category"       0.17
    "ibox"            0.17
    Activation Density: 0.139%

    No Known Activations