    Explanations

    structured lists and numbered items in the text

    Negative Logits
    "sin"       -0.17
    ".cfg"      -0.14
    " co"       -0.14
    " Ton"      -0.14
    " hal"      -0.13
    "umin"      -0.13
    "jo"        -0.13
    " pseud"    -0.13
    " arc"      -0.13
    "uct"       -0.13
    Positive Logits
    "indow"     0.18
    "edik"      0.15
    "dük"       0.15
    "\a"        0.15
    "γÏīγ"      0.15
    "ownik"     0.15
    "edback"    0.14
    "UDA"       0.14
    " Bbw"      0.14
    "amak"      0.14
    Activation Density: 0.100%

    No Known Activations