INDEX
    Explanations

    assertions about the existence or presence of something

    Negative Logits
    incy
    -0.16
    stav
    -0.15
    lover
    -0.15
    quip
    -0.15
    apos
    -0.15
    uib
    -0.14
    adelphia
    -0.14
    也有
    -0.14
    例
    -0.14
    ddit
    -0.13
    Positive Logits
     nothing
    0.21
     NOTHING
    0.18
    .neo
    0.17
     Nothing
    0.17
     nowhere
    0.17
     nobody
    0.16
     never
    0.16
    egin
    0.15
     no
    0.15
    nothing
    0.15
    Act Density 0.085%

    No Known Activations