    Negative Logits
     planes
    -0.28
    之道
    -0.28
    ppe
    -0.27
    gı
    -0.27
    运气
    -0.27
    icap
    -0.27
    iou
    -0.26
    ios
    -0.26
    rollers
    -0.25
    kits
    -0.25
    Positive Logits
    Domin
    0.27
    強く
    0.26
    下周
    0.25
    ocket
    0.24
    我们认为
    0.24
    你是
    0.24
    它是
    0.23
    周岁
    0.23
    全面落实
    0.23
    为主体
    0.23
    Act Density 0.022%

    No Known Activations