INDEX
    Explanations

    words that indicate quantity, positioning, or relationships

    Negative Logits
    ings
    -0.15
    _like
    -0.15
    eno
    -0.15
     premises
    -0.14
    UN
    -0.14
    zes
    -0.14
     matter
    -0.14
    ore
    -0.14
     way
    -0.13
    ometry
    -0.13
    Positive Logits
    .nlm
    0.17
    alia
    0.16
    .mas
    0.15
    owitz
    0.14
    ابت
    0.14
    imson
    0.14
    ë¯
    0.14
    ерти
    0.14
    각
    0.14
    presso
    0.14
    Act Density 0.014%

    No Known Activations