    Explanations

    themes related to contrasting experiences of joy and despair

    Negative Logits
    Token         Logit
    illon         -0.15
    aight         -0.15
    480           -0.14
    arton         -0.14
    ieval         -0.14
    .$.           -0.14
    itur          -0.14
    yles          -0.14
    alon          -0.14
    mates         -0.14
    Positive Logits
    Token         Logit
    èά            0.23
    ous           0.21
    like          0.18
    /ext          0.18
    -like         0.17
    uous          0.16
    /exp          0.15
    ously         0.15
     proportions  0.15
    istic         0.15
    Act Density 0.272%

    No Known Activations