    Explanations

    repeated instances of the word "this" and phrases indicating celebrations or events

    Negative Logits
    Token        Logit
    ":,"         -0.14
    "antic"      -0.14
    "gone"       -0.14
    "ipar"       -0.14
    "#echo"      -0.13
    "/to"        -0.13
    "omes"       -0.13
    "ه"          -0.13
    "ma"         -0.13
    "eniable"    -0.13
    Positive Logits
    Token          Logit
    " is"          0.31
    " was"         0.24
    " marks"       0.21
    " isn"         0.19
    " adalah"      0.19
    " является"    0.19
    "は"           0.19
    "是一"         0.18
    "也是"         0.18
    "是我"         0.18
    Activation Density: 0.111%

    No Known Activations