INDEX
    Explanations

    phrases related to contradiction or violation

    terms related to "contradiction" and "contraventions."

    Negative Logits
    Token         Logit
    "eele"        -0.76
    "istics"      -0.71
    " Mehran"     -0.69
    "doms"        -0.68
    " GOODMAN"    -0.66
    " Nanto"      -0.66
    " Assass"     -0.65
    " FSA"        -0.64
    " LCS"        -0.63
    " throats"    -0.63
    Positive Logits
    Token         Logit
    "ptions"      1.23
    "ption"       1.20
    " contra"     0.99
    "ven"         0.95
    "ventions"    0.94
    "asca"        0.94
    "vention"     0.88
    "coni"        0.76
    "ctr"         0.76
    "vers"        0.73
    Activation Density 0.025%

    No Known Activations