INDEX
    Explanations
    NEGATIVE LOGITS
    Token             Logit
    "ольш"            -0.07
    " sym"            -0.07
    " Colony"         -0.07
    ".utilities"      -0.06
    " Royal"          -0.06
    " distances"      -0.06
    " swarm"          -0.06
    "ux"              -0.06
    " luxe"           -0.06
    "stitution"       -0.06
    POSITIVE LOGITS
    Token             Logit
    " before"         0.27
    " Before"         0.20
    "Before"          0.20
    "before"          0.20
    " BEFORE"         0.18
    "_before"         0.13
    "(before"         0.13
    "\tbefore"        0.13
    "-before"         0.12
    "_BEFORE"         0.10
    Activation Density: 0.048%

    No Known Activations