INDEX
    Explanations

    references to duality or comparisons between pairs

    Negative Logits
    czy               -0.16
    ient              -0.15
    aja               -0.15
    xit               -0.15
    gne               -0.14
    gro               -0.14
    imate             -0.13
    few               -0.13
    lah               -0.13
    ow                -0.13
    Positive Logits
    ymm               0.14
    ERIC              0.14
    é̏                 0.14
     полÑı            0.14
     mình             0.14
     Bott             0.13
    conti             0.13
    訳                0.13
     controversial    0.13
    plet              0.13
    Act Density 0.028%

    No Known Activations