INDEX
    Explanations

    phrases indicating a sense of immediacy or current events

    Negative Logits
    unate         -0.18
    oro           -0.15
     OTHERWISE    -0.15
     otherwise    -0.15
    nze           -0.14
    егор          -0.14
    ilon          -0.14
    oras          -0.14
    こんにちは    -0.13
    ants          -0.13
    Positive Logits
    withstanding   0.23
    adays          0.23
     же            0.18
    itz            0.18
    here           0.17
    fter           0.17
    乎             0.15
    HERE           0.15
     UIP           0.14
    aken           0.14
    Activation Density: 0.027%
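A minimal sketch of an activation-density calculation. It assumes density means the share of token positions where the feature's activation is above zero. The function name `activation_density` and the `threshold` parameter are hypothetical. A density of 0.027% would correspond to about 27 active positions per 100,000 tokens.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`
    (assumed definition; real dashboards may use a different cutoff)."""
    acts = np.asarray(acts)
    return float((acts > threshold).mean())

# Hypothetical activations: 3 active positions out of 10,000 tokens.
acts = np.zeros(10_000)
acts[:3] = [1.2, 0.4, 2.0]
print(f"{activation_density(acts):.3%}")  # 0.030%
```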

    No Known Activations