INDEX
    Explanations

    names containing the sequence "ih" with varying activations

    instances of the substring "ih" in various contexts

    Negative Logits
    Token         Logit
    " pop"        -0.71
    " pie"        -0.69
    " aster"      -0.67
    " fixture"    -0.64
    " snap"       -0.64
    " Pop"        -0.64
    " popup"      -0.62
    " Pie"        -0.61
    " dotted"     -0.59
    " plaster"    -0.59
    Positive Logits
    Token         Logit
    "ih"          4.16
    "iy"          1.52
    "ij"          1.40
    "uh"          1.27
    "iw"          1.25
    "oh"          1.21
    "iq"          1.20
    "ika"         1.19
    "ich"         1.14
    "ihu"         1.13
    Act Density 0.013%

    No Known Activations