    Explanations

    instances of the word "now"

    Negative Logits
    etrofit    -0.16
    ANDOM      -0.16
    kowski     -0.15
    andom      -0.15
    abbo       -0.14
    zhou       -0.14
    unate      -0.14
    adele      -0.14
    rape       -0.14
    pes        -0.14
    Positive Logits
    adays           0.22
    ise             0.18
    withstanding    0.18
    aday            0.17
    indow           0.17
    ãĥĩãĤ£ãĤ¢       0.16
    βε              0.14
    here            0.14
    PFN             0.14
    ark             0.14
    Activation Density: 0.041%

    No Known Activations