    Negative Logits
    [ Determ]            -0.08
    (token not shown)    -0.07
    (token not shown)    -0.07
    [306]                -0.07
    [ Literal]           -0.07
    (token not shown)    -0.07
    [ donuts]            -0.07
    [ literal]           -0.07
    [ '{"]               -0.07
    [.dv]                -0.07
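    Positive Logits
    [ separate]      0.18
    [ seperate]      0.17
    [ dedicated]     0.17
    [ separado]      0.16   (Spanish/Portuguese: "separate")
    [ separar]       0.16   (Spanish/Portuguese: "to separate")
    [ ayrı]          0.16   (Turkish: "separate")
    [Separate]       0.16
    [ Separate]      0.15
    [ separates]     0.15
    [ отдельно]      0.15   (Russian: "separately")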
    Activation Density: 0.046%

    No Known Activations