INDEX
    Explanations

    elements pertaining to boundaries and separation

    Negative Logits
    Token        Logit
    " Extreme"   -0.15
    "acket"      -0.15
    "xaf"        -0.15
    "akedown"    -0.15
    "ÃŃc"        -0.15
    "loat"       -0.14
    "ondo"       -0.14
    " Sap"       -0.14
    "ITES"       -0.14
    "strom"      -0.14

    Positive Logits
    Token        Logit
    "aldi"        0.17
    "oux"         0.17
    "ardy"        0.17
    "ardi"        0.16
    "-wall"       0.16
    "asant"       0.15
    ".spin"       0.15
    "ivant"       0.15
    " walls"      0.14
    " barrier"    0.14
    Activation Density: 0.195%
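
The page does not define activation density. A common reading, assumed here, is the fraction of sampled tokens on which the feature has a nonzero activation. On that reading, 0.195% means roughly 195 activating tokens per 100,000. The sketch below illustrates the arithmetic; the function names and the zero threshold are hypothetical, not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: the source page does not say how density is computed.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def expected_active_tokens(density_percent, n_tokens):
    """Expected number of activating tokens for a density given in percent."""
    return density_percent / 100.0 * n_tokens


# One active token out of four gives a density of 0.25 (25%).
sample_density = activation_density([0.0, 0.0, 0.5, 0.0])

# The reported 0.195% applied to a hypothetical sample of 100,000 tokens.
expected = expected_active_tokens(0.195, 100_000)
```

A density this low would mean the feature fires on only a small share of tokens, which fits with the page listing no known activations.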

    No Known Activations