INDEX
    Explanations

    contrastive conjunctions and qualifiers that indicate complexity or exception in arguments

    Negative Logits
    boro       -0.18
    bole       -0.18
    unt        -0.18
    gary       -0.15
    bsp        -0.15
    ushing     -0.15
     unt       -0.14
    hq         -0.14
    im         -0.13
     Vaugh     -0.13
    Positive Logits
    ĥn         0.16
    oen        0.15
     Denn      0.15
    iffin      0.14
    ifo        0.14
    315        0.14
    yleft      0.14
    sWith      0.14
    볬         0.13
    imals      0.13
    Activation Density: 0.262%

    No Known Activations