INDEX
    Explanations

    negations and contrasts in context

    Negative Logits
    Token       Logit
    " Lyon"     -0.15
    "plit"      -0.15
    "odos"      -0.14
    " Narrow"   -0.14
    "oney"      -0.14
    "dos"       -0.14
    "quette"    -0.14
    "iyah"      -0.14
    "dit"       -0.14
    " ç¤"       -0.14

    Positive Logits
    Token       Logit
    "vero"      0.18
    "sett"      0.17
    "arend"     0.15
    "æľĹ"       0.15
    "ECTOR"     0.14
    "олÑİ"      0.14
    "IVERS"     0.14
    "ÙĦÙĬÙģ"    0.14
    "æķ"        0.14
    "rve"       0.13

    Activation Density: 0.048%

    No Known Activations