INDEX
    Explanations

    terms related to focus or attention

    Negative Logits
     hans     -0.19
     Hans     -0.17
     Malone   -0.16
    tl        -0.15
    iams      -0.15
    erce      -0.15
     overs    -0.15
    904       -0.14
    IMS       -0.14
    ijd       -0.14
    Positive Logits
    ussed     0.23
    cus       0.18
    uss       0.16
    als       0.16
    USED      0.16
    λια       0.16
    selling   0.15
    cing      0.15
    usses     0.15
    imbus     0.15
    Act Density 0.007%

    No Known Activations