INDEX
    Explanations

    the presence of specific numeric identifiers or codes

    Negative Logits
    Token         Logit
    hausen        -0.21
    hart          -0.19
    hammer        -0.19
    EXPR          -0.19
    horse         -0.18
    holm          -0.17
    hop           -0.17
    hin           -0.17
    handling      -0.16
    hb            -0.15

    Positive Logits
    Token         Logit
    riangle        0.30
    emporary       0.30
    wo             0.29
    emple          0.29
    ra             0.27
    exas           0.27
    urn            0.27
    emperature     0.27
    ech            0.26
    erm            0.26
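    Dashboards of this kind often derive the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and sorting the per-token scores. This page does not state its method, so the sketch below is an assumption. The helper names `logit_effects` and `top_and_bottom`, the toy decoder vector, and the toy unembedding matrix are all hypothetical, and pure Python is used to keep it dependency-free.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  w_unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score each vocabulary token by the dot product of a (hypothetical)
    feature decoder direction with that token's unembedding column.

    w_unembed is laid out as [d_model][vocab], an assumed convention.
    """
    d_model = len(decoder_dir)
    vocab = len(w_unembed[0])
    return [sum(decoder_dir[i] * w_unembed[i][t] for i in range(d_model))
            for t in range(vocab)]


def top_and_bottom(scores: Sequence[float], tokens: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return top, bottom


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (made-up numbers).
    tokens = ["hausen", "riangle", "wo", "hart"]
    decoder = [1.0, -0.5]
    w_u = [[0.1, 0.2, 0.25, -0.1],
           [0.5, -0.2, 0.0, 0.2]]
    top, bottom = top_and_bottom(logit_effects(decoder, w_u), tokens, 2)
    print("positive:", top)
    print("negative:", bottom)
```

    Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; a real model would use a tensor library for the projection instead of nested loops.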
    Activation Density: 0.017%
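    An activation density is commonly reported as the percentage of sampled token positions on which the feature's activation is above zero. This page does not define the figure, so the sketch below assumes that definition and a threshold of 0.0; the function name `activation_density` is hypothetical. Under that reading, 0.017% corresponds to roughly 17 firing positions per 100,000 tokens.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumes a strictly-greater-than comparison against a zero threshold.
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


if __name__ == "__main__":
    # Made-up sample: 17 firing positions out of 100,000 tokens.
    sample = [0.0] * 99_983 + [1.5] * 17
    print(f"{activation_density(sample):.3f}%")
```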

    No Known Activations