INDEX
    Explanations

    phrases indicating a comparison or contrast

    NEGATIVE LOGITS
    " even"         -0.18
    "even"          -0.16
    " EVEN"         -0.16
    "uchen"         -0.16
    " actually"     -0.14
    "oui"           -0.14
    "IDI"           -0.14
    " Lair"         -0.14
    "teri"          -0.14
    "chwitz"        -0.13
    POSITIVE LOGITS
    "929"           0.18
    "ÏĦί"           0.16
    "umas"          0.15
    "eln"           0.15
    " Versions"     0.15
    " until"        0.15
    "ãģªãĤī"        0.15
    "BorderStyle"   0.15
    "лаж"           0.14
    " Güven"        0.14
    Activation Density: 0.027%

    No Known Activations