INDEX
    Explanations

    references to comparisons or contrasts in a variety of contexts

    Negative Logits
    Token              Logit
    "ModelProperty"    -0.15
    "andler"           -0.15
    "869"              -0.15
    " Kum"             -0.14
    "rese"             -0.14
    "eq"               -0.14
    " Stephan"         -0.14
    " Maul"            -0.13
    "mur"              -0.13
    "emann"            -0.13

    Positive Logits
    Token              Logit
    "pes"              0.16
    " prim"            0.16
    "rch"              0.15
    " Prim"            0.15
    "uds"              0.14
    "zbek"             0.14
    "mlink"            0.13
    "he"               0.13
    ".Permission"      0.13
    " ÙģÙĤ"            0.13
    Activation Density: 0.193%

    No Known Activations