INDEX
    Explanations

    verbs and adverbs

    Negative Logits
        ’na           -0.07
        щий           -0.07
                      -0.06
        очка          -0.06
        _UL           -0.06
        тий           -0.06
        Works         -0.06
        izens         -0.06
        Writer        -0.06
        _elem         -0.06
    Positive Logits
         асп          0.07
         nào          0.07
         selv         0.07
                      0.07
         Lav          0.06
         öz           0.06
         تلویزی       0.06
         hấp          0.06
         lav          0.06
         createState  0.06
    Activation Density: 0.003%

    No Known Activations