INDEX
    Explanations

    statements relating to learning, knowledge, feelings, and attitudes towards various subjects

    expressions of learning, emotions, and desires

    Negative Logits
    su          -0.72
    press       -0.69
    eli         -0.68
    Attempt     -0.67
    stru        -0.66
    weed        -0.66
    cele        -0.65
    cephal      -0.64
    oug         -0.64
    intend      -0.63
    Positive Logits
     theirs      0.83
     them        0.83
     something   0.82
     THEM        0.78
     hers        0.77
     nothing     0.74
    ļé           0.74
     lots        0.73
     everything  0.73
     plenty      0.71
    Act Density 0.372%

    No Known Activations