INDEX
    Explanations
    Negative Logits
    Token            Logit
    [token missing]  -0.07
    " consc"         -0.07
    " transl"        -0.07
    " Coil"          -0.06
    "oin"            -0.06
    " repression"    -0.06
    "Smooth"         -0.06
    " sof"           -0.06
    ".CONT"          -0.06
    " Corm"          -0.06
    Positive Logits
    Token         Logit
    " dedicated"  0.09
    " Dedicated"  0.08
    "bilt"        0.08
    "ricular"     0.08
    "SEC"         0.07
    "assigned"    0.07
    "ataka"       0.07
    "-native"     0.07
    " podařilo"   0.07
    " 무슨"        0.07
    Activation Density: 0.008%

    No Known Activations