INDEX
    Explanations

    structural features and spatial descriptions in sentences

    Negative Logits
    Token       Logit
    " Ups"      -0.15
    "adge"      -0.14
    "hazi"      -0.14
    " Gül"      -0.14
    ".tiles"    -0.14
    "ke"        -0.14
    " world"    -0.13
    "olas"      -0.13
    "opy"       -0.13
    " deals"    -0.13
    Positive Logits
    Token       Logit
    "LOY"       0.15
    "emailer"   0.15
    "icker"     0.15
    "ickers"    0.14
    "ettel"     0.14
    "lement"    0.14
    "uest"      0.14
    " Schro"    0.14
    " phía"     0.14
    "ADOW"      0.14
    Act Density 0.210%

    No Known Activations