INDEX
    Explanations

    words expressing comparisons or contrasts

    Negative Logits
    ông         -0.15
    inaire      -0.14
    Ñĩе         -0.14
    upal        -0.13
    ICLE        -0.13
    entiful     -0.13
    еÑĤÑĥ        -0.13
     Assoc      -0.13
     Hayward    -0.13
    caffold     -0.13
    Positive Logits
    áºŃp         0.16
    uten        0.16
    esz         0.16
     terminal   0.15
    BUM         0.15
    chk         0.15
    循          0.14
     dop        0.14
    ender       0.14
    iman        0.14
    Act Density 0.001%

    No Known Activations