INDEX
    Explanations

    references to movies, particularly those with comedic or fantastical elements

    Negative Logits
    olit
    -0.18
    _simps
    -0.15
    STAT
    -0.14
    enson
    -0.14
    erre
    -0.14
    è¾ħ
    -0.14
    cela
    -0.14
    ież
    -0.13
    êµ´
    -0.13
    unday
    -0.13
    Positive Logits
     nak
    0.14
     Feature
    0.14
     Priv
    0.14
     Guerr
    0.14
       
    0.14
     lex
    0.13
    ocities
    0.13
     feature
    0.13
     Amazon
    0.13
     Rank
    0.13
    Act Density 0.028%

    No Known Activations