    Negative Logits
      Token       Logit
      " Luke"     -0.08
      "alic"      -0.07
      "alc"       -0.07
      "ак"        -0.07
      "ate"       -0.07
      " Lucas"    -0.07
      "Luke"      -0.07
      " Tok"      -0.07
      "Nullable"  -0.07
      "_top"      -0.07
    Positive Logits
      Token       Logit
      " every"    0.21
      " Every"    0.19
      "every"     0.17
      "Every"     0.16
      " EVERY"    0.15
      "VERY"      0.11
      ".every"    0.11
      "_every"    0.11
      " EVER"     0.09
      "very"      0.09
    Activation density: 0.047%

    No Known Activations