INDEX
    Explanations

    numerical values and their adjacent characters or contexts

    Negative Logits
     Adv
    -0.16
    Adv
    -0.15
    jes
    -0.15
    orth
    -0.15
    /animations
    -0.15
    trim
    -0.14
    .ibm
    -0.14
    ts
    -0.14
    tfoot
    -0.14
     Doch
    -0.14
    Positive Logits
     gaz
    0.19
    zar
    0.17
    วด
    0.15
     Revolutionary
    0.15
    leta
    0.14
     Hammer
    0.14
     PR
    0.14
     Pill
    0.14
     cl
    0.14
    imary
    0.14
    Act Density 0.007%

    No Known Activations