    Negative Logits
    Token         Logit
    " Hicks"      -0.85
    " Hatt"       -0.78
    " HEAD"       -0.76
    " hous"       -0.75
    "ipp"         -0.75
    " Hipp"       -0.71
    " Heads"      -0.70
    " Hick"       -0.68
    " fingert"    -0.68
    " beck"       -0.68
    Positive Logits
    Token         Logit
    "v"           1.41
    "V"           1.33
    "va"          1.20
    "vez"         1.17
    "vir"         1.15
    "vi"          1.13
    "Vs"          1.12
    "vu"          1.12
    "VI"          1.11
    "ov"          1.11
    Activation Density: 0.116%

    No Known Activations