    Negative Logits
    Token          Logit
    "Curso"        -0.07
    " Yun"         -0.07
    " persever"    -0.07
    "need"         -0.07
    "IME"          -0.07
    " Need"        -0.06
    " where"       -0.06
    " numerous"    -0.06
    "mamış"        -0.06
    "endent"       -0.06

    Positive Logits
    Token          Logit
    " actually"    0.20
    "actually"     0.12
    "Actually"     0.11
    " Actually"    0.11
    " actual"      0.08
    "(actual"      0.08
    " Actual"      0.07
    "otto"         0.07
    "actual"       0.07
                   0.07
    Activation Density: 0.017%

    No Known Activations