INDEX
    Negative Logits
    过的
    -0.30
    再来
    -0.29
    包容
    -0.28
    èµ
    -0.28
    雏
    -0.27
    =edge
    -0.25
    类型的
    -0.25
     Kov
    -0.25
    一分钟
    -0.25
    sätze
    -0.25
    Positive Logits
     bind
    0.26
    经商
    0.26
    decl
    0.25
    antro
    0.25
    同城
    0.25
     tug
    0.25
    inema
    0.24
    rng
    0.24
    odi
    0.24
     sir
    0.23
    Act Density 0.007%

    No Known Activations