INDEX
    Explanations

    contrastive or comparative phrases that highlight differences between two subjects or situations

    Negative Logits
    ucken     -0.17
    eness     -0.16
    ynet      -0.16
    ritten    -0.15
    mey       -0.15
    izmet     -0.15
    нÑĥв      -0.15
    è¯ij      -0.14
    ibri      -0.14
    ailand    -0.14
    Positive Logits
     Chu      0.15
     spre     0.15
    Steam     0.14
    chy       0.14
     steam    0.14
    andbox    0.14
     Steam    0.14
    chter     0.14
     ru       0.14
    opak      0.14
    Activation Density: 0.081%

    No Known Activations