INDEX
    Explanations

    instances of the word "on" and its variations

    Negative Logits
    gether      -0.19
    lessly      -0.19
    ún          -0.19
    wicklung    -0.17
    ories       -0.16
    nze         -0.15
    wick        -0.15
    jack        -0.15
    fighter     -0.14
    ophil       -0.14
    Positive Logits
    us          0.21
    coming      0.21
    look        0.19
    inous       0.19
    emin        0.18
    yx          0.18
    again       0.17
    lsa         0.17
    rush        0.17
    ep          0.16
    Activation Density: 0.042%

    No Known Activations