    Negative Logits
    "return"        -1.84
    " return"       -1.70
    " then"         -1.64
    " even"         -1.52
    " or"           -1.41
    " returning"    -1.35
    " so"           -1.33
    " Even"         -1.24
    " but"          -1.21
    " returned"     -1.21
    Positive Logits
    " CONSTANT"      1.13
    "whenever"       1.12
    "quando"         1.10
    " magnific"      1.07
    "justed"         1.07
    " strikingly"    1.06
                     1.06
                     1.04
    " RESPONSE"      1.02
    "gefügt"         1.02
    Activation Density: 0.013%

    No Known Activations