INDEX
    Explanations

    comparative phrases emphasizing the degree of similarity or difference

    Negative Logits
    "既"             -0.17
    "inals"          -0.15
    "astle"          -0.14
    "dup"            -0.14
    "onder"          -0.14
    "istrovství"     -0.14
    "isse"           -0.14
    "etik"           -0.14
    "Feels"          -0.14
    "gew"            -0.14
    Positive Logits
    " anything"      0.29
    "anything"       0.25
    " Anything"      0.21
    "Anything"       0.20
    " παρά"          0.16
    " versus"        0.16
    " anywhere"      0.16
    " vice"          0.15
    "ori"            0.14
    "chine"          0.14
    Activation Density: 0.050%

    No Known Activations