INDEX
    Explanations

    phrases indicating comparisons or contrasts

    Negative Logits
    idth            -0.15
    imal            -0.15
    ital            -0.14
    _ATTRIBUTES     -0.14
    apse            -0.14
    eric            -0.14
    utin            -0.13
    ivar            -0.13
    ubah            -0.13
    unicode         -0.13
    Positive Logits
    aeda            0.17
    etto            0.17
    otts            0.16
    eker            0.16
    unto            0.15
     Sac            0.15
    ekten           0.15
     sac            0.15
     THR            0.14
    unken           0.14
    Activation Density: 0.010%

    No Known Activations