INDEX
    Explanations
    Negative Logits
        Token            Logit
        "EQ"             -0.34
        "就近"           -0.29   (Chinese: "nearby")
        "subst"          -0.26
        "inel"           -0.26
        " preference"    -0.26
        " EQ"            -0.26
        " matrix"        -0.25
        "не"             -0.25
        " Dix"           -0.25
        "matrix"         -0.25
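    Positive Logits
        Token            Logit
        " Pty"            0.29
        "akan"            0.28
        " thing"          0.26
        "机能"            0.26   (Chinese: "function")
        "其他人"          0.25   (Chinese: "other people")
        "\xE5\xAD"        0.25   (incomplete UTF-8 byte sequence)
        " limbs"          0.25
        "矛盾"            0.24   (Chinese: "contradiction")
        "规则"            0.24   (Chinese: "rules")
        "jong"            0.24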
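On most SAE dashboards, the Negative and Positive Logits lists are produced by projecting the feature's decoder direction through the model's unembedding matrix. The dashboard then shows the vocabulary entries that receive the most negative and the most positive scores. The sketch below illustrates that ranking under stated assumptions. It takes a single decoder vector `decoder_dir` and an unembedding matrix `W_U`; both names are hypothetical and are filled with random toy data here. It also ignores the final LayerNorm and any unembedding bias, which this dashboard may or may not account for.

```python
import numpy as np

def feature_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by how much a feature's decoder direction raises or lowers their logits.

    decoder_dir: (d_model,) decoder vector for one feature (hypothetical input).
    W_U: (d_model, d_vocab) unembedding matrix (hypothetical input).
    vocab: list of d_vocab token strings.
    Assumes a direct projection with no final LayerNorm or bias.
    """
    logits = decoder_dir @ W_U          # (d_vocab,) direct effect on each token's logit
    order = np.argsort(logits)          # indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]        # most suppressed first
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]  # most promoted first
    return negative, positive

# Toy example with random weights (illustrative only, not the real model).
rng = np.random.default_rng(0)
d_model, d_vocab = 16, 50
W_U = rng.normal(size=(d_model, d_vocab))
decoder_dir = rng.normal(size=d_model)
vocab = [f"tok{i}" for i in range(d_vocab)]
neg, pos = feature_logit_effects(decoder_dir, W_U, vocab, k=10)
```

Sorting the full logit vector once and slicing both ends keeps the two lists consistent with each other. For a realistic vocabulary size, `np.argpartition` would avoid a full sort.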
    Activation Density: 3.639%
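The activation density figure is most likely the fraction of sampled token positions on which the feature's activation is nonzero. On that reading, 3.639% corresponds to roughly 36 of every 1,000 tokens. The sketch below computes that statistic. It assumes "active" means strictly greater than a threshold of zero. The function name and the toy activations are hypothetical.

```python
import numpy as np

def activation_density(feature_acts, threshold=0.0):
    """Fraction of token positions where the feature counts as active.

    feature_acts: the feature's activations over a sample of tokens (any shape).
    Assumes "active" means strictly above `threshold`.
    """
    acts = np.asarray(feature_acts, dtype=float)
    return float((acts > threshold).mean())

# Toy activations over 8 token positions; the feature fires on 2 of them.
acts = np.array([0.0, 1.2, 0.0, 0.0, 3.4, 0.0, 0.0, 0.0])
density = activation_density(acts)
print(f"Activation Density: {100 * density:.3f}%")  # prints "Activation Density: 25.000%"
```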

    No Known Activations