INDEX
    Explanations

    Phrases emphasizing existence or availability, particularly those using the word "there".

    Negative Logits
    Token         Logit
    "urette"      -0.16
    "elog"        -0.14
    "ä¾ĭ"         -0.14
    "岸"          -0.14
    "ekim"        -0.14
    "oya"         -0.14
    "å®Ī"         -0.14
    "itarian"     -0.14
    "ocz"         -0.14
    "ıi"          -0.13
    Positive Logits
    Token         Logit
    " couldn"     0.22
    "couldn"      0.20
    " Couldn"     0.19
    " simply"     0.18
    " truly"      0.18
    "idar"        0.16
    "pect"        0.15
    " lies"       0.15
    "TYPO"        0.15
    " honestly"   0.15
    Activation Density: 0.061%

    No Known Activations