    Explanations

    references to time, particularly the word "today" and its variations

    Negative Logits
    Token             Logit
    " back"           -0.15
    "ceed"            -0.14
    "amber"           -0.14
    "odied"           -0.14
    " cough"          -0.14
    " Pierce"         -0.14
    "æī"              -0.14
    "WN"              -0.13
    " Fried"          -0.13
    "hub"             -0.13
    Positive Logits
    Token                   Logit
    "GenerationStrategy"    0.15
    "ittal"                 0.14
    "ÑĮогоднÑĸ"             0.14
    "eza"                   0.14
    " ä¸ĸ"                  0.14
    " æ¹"                   0.14
    "ç̬"                    0.14
    "TEGER"                 0.14
    "rov"                   0.14
    "itzer"                 0.13
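Several tokens in the logit lists appear in a byte-level display form rather than as readable text. The sketch below shows one way to map such strings back to UTF-8. It assumes the tokens come from a GPT-2-style byte-level BPE vocabulary, which this page does not state; the function name `decode_display_token` is hypothetical and not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the byte -> printable-character table used by GPT-2-style
    byte-level BPE (assumed here; the page does not name the tokenizer)."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points 256 and up.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: display character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_display_token(token: str) -> str:
    """Convert a display-form token back to text (hypothetical helper).

    Characters outside the table (for example a plain space or letters
    that were already decoded) are passed through as their UTF-8 bytes.
    Incomplete byte sequences become U+FFFD instead of raising.
    """
    raw = bytearray()
    for ch in token:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ÑĮогоднÑĸ", " ä¸ĸ", "TEGER", "æī"]:
        print(repr(tok), "->", repr(decode_display_token(tok)))
```

If the assumption holds, the third positive-logit token decodes to "ьогодні", which looks like the tail of the Ukrainian word "сьогодні" ("today") and would be consistent with the explanation above. Tokens such as "æī" are only a fragment of a multi-byte character, so they cannot be decoded on their own.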
    Activation Density: 0.067%

    No Known Activations