INDEX
    Explanations

    instances of contrastive conjunctions or phrases indicating exceptions

    Negative Logits
    Token          Logit
    "ksam"         -0.16
    " multif"      -0.15
    "TEL"          -0.15
    "اراÙĨ"        -0.14
    "gregar"       -0.13
    "å¼¥"          -0.13
    "thrown"       -0.13
    " Fest"        -0.13
    "273"          -0.13
    " Explorer"    -0.13
    Positive Logits
    Token          Logit
    ".Invariant"   0.15
    " Nass"        0.14
    "landa"        0.14
    "ναν"          0.14
    "nap"          0.14
    " hope"        0.14
    "branch"       0.14
    "su"           0.14
    "_HINT"        0.14
    "hope"         0.14
    Act Density 0.235%

    No Known Activations