INDEX
    Explanations

    phrases indicating uncertainty or questioning existence and value

    Negative Logits
    Token       Logit
    "benh"      -0.17
    "/*\r\n"    -0.16
    "ńst"       -0.15
    "igo"       -0.15
    "_ZERO"     -0.14
    "idle"      -0.14
    "リーズ"    -0.14
    "elerik"    -0.14
    "alia"      -0.14
    "Idle"      -0.14
    Positive Logits
    Token        Logit
    " anymore"   1.00
    " nữa"       0.56
    " artık"     0.42
    " longer"    0.38
    " lagi"      0.37
    "なくな"     0.35
    " دیگر"      0.33
    " no"        0.32
    "再"         0.31
    " again"     0.30
    Activation Density: 0.381%

    No Known Activations