INDEX
    NEGATIVE LOGITS
    "520"       -0.07
    "stial"     -0.07
    " santa"    -0.07
    " Santo"    -0.07
    "nl"        -0.07
    " свя"      -0.07
    " Phys"     -0.06
    "άνα"       -0.06
    "518"       -0.06
    "427"       -0.06
    POSITIVE LOGITS
    " edge"     0.12
    " Edge"     0.12
    "Edge"      0.10
    "EDGE"      0.10
    " EDGE"     0.09
    "edge"      0.09
    " edges"    0.09
    ".edge"     0.08
    " wedge"    0.08
    "-edge"     0.08
    Activation Density: 0.011%

    No Known Activations