INDEX
    Explanations

    verbs and phrases related to measurement and evaluation

    Negative Logits
    /her          -0.16
    umer          -0.16
    NESS          -0.15
    arde          -0.15
    ness          -0.15
    åĹ            -0.15
    ewan          -0.14
    Ń             -0.14
     Fade         -0.13
     yaptıģı      -0.13

    Positive Logits
     Ñģобой        0.25
     themselves    0.23
    ä¸įäºĨ         0.21
    (ed            0.19
     differently   0.17
    ingly          0.17
    rowse          0.17
    iani           0.16
    ÑģÑıÑĤ         0.16
    ¶Į             0.16
    Act Density 0.211%

    No Known Activations