INDEX
    Explanations
    NEGATIVE LOGITS
    urchases      -0.07
    节点          -0.07
    agonal        -0.07
     Vert         -0.07
     Ventura      -0.06
     monet        -0.06
     allot        -0.06
    quat          -0.06
    ony           -0.06
    edges         -0.06
    POSITIVE LOGITS
     praise       0.21
     praising     0.16
     praises      0.16
     praised      0.16
     ghi          0.07
     criticized   0.07
     blame        0.07
    raise         0.07
     Rise         0.06
     Strauss      0.06
    Act Density 0.002%

    No Known Activations