INDEX
    Explanations

    negations and phrases indicating exceptions or contrasts

    Negative Logits
    Token           Logit
    "486"           -0.18
    "ounder"        -0.16
    "innacle"       -0.16
    " meanwhile"    -0.15
    " ale"          -0.15
    "опас"          -0.15
    "IMA"           -0.14
    "addir"         -0.14
    "imar"          -0.14
    "esco"          -0.14
    Positive Logits
    Token           Logit
    " necessarily"  0.17
    "çĹ"            0.16
    "withstanding"  0.16
    "chers"         0.16
    "ANJI"          0.15
    "ting"          0.15
    "adena"         0.14
    "rons"          0.14
    " древ"         0.14
    "tingham"       0.14
    Activation Density: 0.047%

    No Known Activations