INDEX
    Explanations

    phrases and constructs indicating contrast or contradiction

    Negative Logits
      " DIE"          -0.17
      "ecies"         -0.15
      "że"            -0.15
      "entic"         -0.14
      " Pend"         -0.14
      "rouch"         -0.14
      " Die"          -0.14
      "ismu"          -0.14
      "gings"         -0.14
      "gens"          -0.14
    Positive Logits
      "FetchRequest"   0.15
      "cher"           0.15
      "ikip"           0.14
      "342"            0.14
      "437"            0.14
      "/Object"        0.14
      "şi"             0.13
      "lient"          0.13
      "853"            0.13
      "OOD"            0.13
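    The logit lists rank vocabulary tokens by how much this feature pushes their logits down (negative) or up (positive). A minimal sketch of how such lists are commonly produced, assuming each token's score is the dot product of the feature's decoder direction with that token's unembedding column. Real dashboards may also fold in layer-norm scaling or other adjustments, so treat this as an assumption. The function `feature_logit_effects` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_direction: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs.

    Hypothetical helper. Assumes:
      decoder_direction has length d_model (the feature's decoder row),
      unembed is a d_model x vocab_size matrix (rows = residual dims),
      vocab[j] is the string for vocabulary column j.
    """
    d_model = len(decoder_direction)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per decoder dimension")

    # Score each token: dot product of the decoder direction with column j.
    scores = []
    for j, token in enumerate(vocab):
        score = sum(decoder_direction[i] * unembed[i][j] for i in range(d_model))
        scores.append((token, score))

    # Most negative first for the "Negative Logits" list.
    negative = sorted(scores, key=lambda pair: pair[1])[:k]
    # Most positive first for the "Positive Logits" list.
    positive = sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]
    return negative, positive
```

With a toy two-dimensional model, `feature_logit_effects([1.0, 0.5], [[1.0, 0.0, -1.0], [0.0, 4.0, 0.0]], ["a", "b", "c"], k=1)` returns `([("c", -1.0)], [("b", 2.0)])`.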
    Activation Density 0.239%
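    Activation density is the share of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes one activation value per token position.
    """
    if not activations:
        raise ValueError("need at least one activation")
    # Count positions where the feature is active.
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)
```

For example, `activation_density([0.0, 0.0, 1.2, 0.0])` returns `25.0`. At 0.239%, this feature fires on roughly 239 of every 100,000 tokens.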

    No Known Activations