    Explanations

    references to specific spans or structures in textual data

    Negative Logits
    Token          Logit
    "yas"          -0.17
    "-то"          -0.16
    "ystone"       -0.16
    "Spatial"      -0.16
    "ermann"       -0.16
    "sk"           -0.15
    " Spatial"     -0.15
    "annon"        -0.15
    "ted"          -0.15
    "erman"        -0.15
    Positive Logits
    Token          Logit
    "ned"          0.31
    "ning"         0.29
    "nable"        0.27
    "iards"        0.27
    "iard"         0.26
    "nung"         0.20
    " Span"        0.19
    " span"        0.18
    "oud"          0.17
    "berger"       0.17
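    The two logit lists rank output tokens by how strongly this feature pushes their logits up (positive) or down (negative). A common way dashboards derive such lists, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and keep the top and bottom k. The sketch below uses a toy two-dimensional direction and a hypothetical three-token unembedding; `top_logits` is an illustrative helper, not a library function.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def top_logits(direction: List[float],
               unembed: Dict[str, List[float]],
               k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Rank tokens by the dot product of their unembedding vector with a
    feature's decoder direction (an assumed method, for illustration)."""
    scores = {
        tok: sum(d * u for d, u in zip(direction, vec))
        for tok, vec in unembed.items()
    }
    ascending = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ascending[:k]          # most negative score first
    positive = ascending[::-1][:k]    # most positive score first
    return positive, negative


# Hypothetical toy data: a 2-d decoder direction and three tokens.
direction = [1.0, -0.5]
unembed = {
    "ned": [0.3, 0.0],
    "yas": [-0.1, 0.2],
    " span": [0.1, -0.2],  # the leading space is part of the token
}
positive, negative = top_logits(direction, unembed, k=1)
print(positive, negative)
```

    With these toy values "ned" scores 0.3, " span" scores 0.2 and "yas" scores -0.2, so for k=1 "ned" is the top positive entry and "yas" the top negative one. Leading spaces are kept in the token strings because tokenizers treat " span" and "span" as different tokens, which is why both " Spatial" and "Spatial" appear in the negative list.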
    Activation Density 0.012%
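    Activation density is the share of tokens on which the feature fires. Assuming "fires" means an activation strictly above zero (the threshold a given dashboard uses may differ), 0.012% works out to about 12 activating tokens per 100,000. The helper below, `activation_density`, is a hypothetical sketch of that calculation on toy data.

```python
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Return the percentage of activations strictly above `threshold`.

    The default threshold of 0.0 is an assumption for illustration."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    # Guard against an empty input rather than dividing by zero.
    return 100.0 * active / total if total else 0.0


# Toy example: 12 nonzero activations among 100,000 tokens.
acts = [0.0] * 99_988 + [1.5] * 12
print(f"{activation_density(acts):.3f}%")
```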

    No Known Activations