    Explanations

    emotional and nostalgic language related to personal memories and experiences

    Negative Logits
    " merit"        -0.73
    " attest"       -0.67
    " preference"   -0.67
    "opath"         -0.66
    " complement"   -0.63
    " fide"         -0.63
    " scapego"      -0.63
    " depletion"    -0.63
    " replacement"  -0.63
    " privilege"    -0.62
    Positive Logits
    "BUT"       1.11
    "yet"       1.06
    "until"     1.05
    "why"       1.05
    "oops"      1.04
    "unless"    0.97
    "but"       0.96
    "except"    0.96
    "they"      0.94
    "that"      0.93
    Activation Density: 0.042%

    No Known Activations