    Explanations

    concepts related to physical barriers or boundaries

    Negative Logits
    /design      -0.19
    丸           -0.16
    åłĤ          -0.15
    ırak         -0.15
    apur         -0.14
    strate       -0.14
    wers         -0.14
    serter       -0.14
    oding        -0.14
    nelle        -0.14

    Positive Logits
    edReader      0.22
    -breaking     0.21
    /window       0.19
    less          0.19
    fold          0.19
    ways          0.18
    edImage       0.18
    breaking      0.17
    maid          0.17
    eds           0.17

    Activation Density: 0.068%

    No Known Activations