    Negative Logits
    Token           Logit
    ' bức'          -0.07
    ' občan'        -0.06
    ' jim'          -0.06
    'FFFFFFFF'      -0.06
    ' favour'       -0.06
    ' сосуд'        -0.06
    '-operation'    -0.06
    ' gösteren'     -0.06
    ' several'      -0.06
    ' c'            -0.06
    Positive Logits
    Token           Logit
    ' intro'        0.10
    ' Intro'        0.09
    'intros'        0.08
    ' intros'       0.08
    '.shortcuts'    0.08
    '."""'          0.07
    'Intro'         0.07
    (token not shown)  0.07
    ' idle'         0.07
    'intro'         0.07
    Activation Density: 0.002%

    No Known Activations