INDEX
    Explanations

    Instances of the substring "wh", indicating a focus on words or phrases that begin with "wh".

    Negative Logits
    "ously"    -0.18
    "abel"     -0.16
    " èIJ"     -0.15
    "itura"    -0.15
    " Hra"     -0.15
    "esti"     -0.15
    "hes"      -0.15
    " Bull"    -0.15
    "IVEN"     -0.14
    " lor"     -0.14
    Positive Logits
    " wh"      0.27
    "-wh"      0.21
    "ining"    0.20
    "izz"      0.19
    "ipl"      0.18
    "iners"    0.18
    ".wh"      0.18
    "ack"      0.18
    "eras"     0.18
    "oso"      0.17
    Activation Density: 0.011%

    No Known Activations