INDEX
    Explanations

    words related to intentions or meanings

    the word "mean" and its various forms, focusing on expressions of intent and significance

    Negative Logits
    aqu           -0.72
    Newsletter    -0.69
    icht          -0.65
    @#&           -0.65
    dfx           -0.63
    ttes          -0.62
    Sham          -0.61
     Frazier      -0.60
    ngth          -0.59
    anon          -0.58
    Positive Logits
     spirited     0.86
     goodbye      0.84
    lessness      0.75
     nothing      0.73
     something    0.73
    INESS         0.72
    ãĥĥãĤ¯        0.72
     exactly      0.71
    erella        0.71
     bye          0.70
    Activation Density: 0.050%

    No Known Activations