INDEX
    Explanations

    references to "mind" or related concepts

    Negative Logits
    Token       Logit
    EZ          -0.16
    ritz        -0.15
    bett        -0.15
    lication    -0.15
    izzly       -0.14
    akedown     -0.14
    ost         -0.14
    antino      -0.14
    aylor       -0.14
    eyer        -0.14

    Positive Logits
    Token       Logit
    fulness     0.29
    lessly      0.26
    ustry       0.26
    fully       0.26
    sets        0.25
    /body       0.23
    FUL         0.23
    fuck        0.23
    -num        0.23
    meld        0.21

    Activation Density: 0.009%

    No Known Activations