INDEX
    Explanations

    specific nouns, especially those related to objects or entities

    Negative Logits
    itel          -0.17
    ovel          -0.16
    213           -0.16
    152           -0.16
    lian          -0.15
     Ãĸl          -0.15
     Canter       -0.14
    rer           -0.14
     propTypes    -0.14
    ÄĻki          -0.14
    Positive Logits
    -none         0.16
    /rfc          0.15
    اسÙĩ          0.14
    İZ            0.14
    noop          0.14
    iben          0.14
    knife         0.14
    eway          0.14
    екÑĤи         0.14
     falling      0.14
    Act Density 0.023%

    No Known Activations