    Explanations

    words related to physical actions or impacts

    Negative Logits
    Token       Logit
    iser        -0.08
    istic       -0.08
    hin         -0.08
    strict      -0.08
    .au         -0.08
    istics      -0.07
    readcr      -0.07
    estruct     -0.07
    hang        -0.07
    dest        -0.07

    Positive Logits
    Token       Logit
    ively       0.08
    ingly       0.07
    aller       0.07
    nowled      0.07
    et          0.07
    ur          0.07
    al          0.06
    nowledge    0.06
    able        0.06
    ¯ÃĤ         0.06

    Activation Density: 0.012%

    No Known Activations