INDEX
    Explanations

    conjunctions and phrases that indicate comparisons or contrasts

    Negative Logits
    ific          -0.16
    jo            -0.16
    996           -0.15
     Relief       -0.14
     ελλην        -0.14
    taire         -0.14
    ät            -0.14
     UNUSED       -0.14
    ewe           -0.14
    _PATCH        -0.13
    Positive Logits
    ppe           0.16
    ICES          0.15
    lsen          0.15
    posables      0.15
    ούς           0.15
    stell         0.14
    spender       0.14
     repetition   0.14
     detr         0.14
    ismatic       0.14
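
    The negative and positive logit lists above rank vocabulary tokens by how strongly this feature pushes them down or up. A minimal sketch of how such lists are usually produced follows. It assumes the common sparse-autoencoder dashboard convention of multiplying the feature's decoder direction by the model's unembedding matrix and reading off the lowest and highest scores. This page does not say how its lists were computed, so that convention, the helper names `logit_weights` and `top_logits`, and the toy matrices are all assumptions.

```python
from typing import List, Sequence, Tuple

TokenScore = Tuple[str, float]


def logit_weights(direction: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a d_model-sized feature direction through a d_model x vocab
    unembedding matrix, giving one score per vocabulary token.
    (Assumed convention: score = direction @ W_U.)"""
    vocab_size = len(unembed[0])
    return [
        sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        for j in range(vocab_size)
    ]


def top_logits(direction: Sequence[float],
               unembed: Sequence[Sequence[float]],
               vocab: Sequence[str],
               k: int = 10) -> Tuple[List[TokenScore], List[TokenScore]]:
    """Return the k most negative and k most positive (token, score) pairs.
    Hypothetical helper for illustration only."""
    scores = logit_weights(direction, unembed)
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return negative, positive


# Toy example: a 2-dimensional "model" with a 4-token vocabulary.
direction = [1.0, -0.5]
unembed = [[0.2, -0.1, 0.0, 0.3],
           [0.4, 0.2, -0.2, 0.0]]
vocab = ["a", "b", "c", "d"]
negative, positive = top_logits(direction, unembed, vocab, k=2)
print(negative)  # [('b', -0.2), ('a', 0.0)]
print(positive)  # [('d', 0.3), ('c', 0.1)]
```

    With a real model, `direction` would be one row of the SAE decoder and `unembed` the d_model x vocab unembedding matrix, and the pure-Python loops would normally be replaced by a single matrix-vector product.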
    Activation Density: 0.264%
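
    The 0.264% activation density can be read as a simple frequency. The sketch below assumes density means the percentage of sampled tokens on which the feature's activation is strictly above zero. The page states neither its threshold nor its sample size, and `activation_density` is a hypothetical helper. Under that reading, 0.264% corresponds to about 264 firing tokens per 100,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.
    Assumed definition; the dashboard does not document its own."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 264 firing tokens out of 100,000 sampled tokens.
sample = [0.0] * 99_736 + [1.0] * 264
print(f"{activation_density(sample):.3f}%")  # 0.264%
```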

    No Known Activations