INDEX
    Explanations

    expressions of humor and personal enjoyment

    Negative Logits
    IENT          -0.15
    odyn          -0.14
     Chronicle    -0.14
    üre           -0.14
     Kron         -0.14
    aison         -0.14
    iez           -0.14
    alley         -0.14
    /close        -0.14
    erif          -0.14
    Positive Logits
    oce           0.16
    .hw           0.15
    Ø·ÙĨ          0.14
    ↵↵            0.14
    outu          0.14
    orph          0.14
    FREE          0.14
    ız            0.13
     Guth         0.13
    /rfc          0.13
    Activation Density: 0.169%

    No Known Activations