INDEX
    Explanations

    instances of the word "h" and variations with different activation values

    Negative Logits
     Eternity   -0.71
     mosqu      -0.65
     ducks      -0.64
     Bots       -0.64
     conclud    -0.64
     destro     -0.61
    eering      -0.61
     wedge      -0.60
     heartbeat  -0.60
     Clicker    -0.60
    Positive Logits
    ulhu        1.24
    orses       1.10
    arma        1.05
    atever      1.03
    agen        1.03
    arel        1.03
    ilar        1.02
    ollow       0.97
    airy        0.95
    onest       0.94
    Act Density 0.011%

    No Known Activations