    Explanations

    contradictions

    Negative Logits
    Token               Logit
    "arette"            -0.08
    " screening"        -0.08
    "Ls"                -0.08
    "paramref"          -0.08
    "Tooltip"           -0.08
    "ержав"             -0.07
    "pas"               -0.07
    "Pas"               -0.07
    "Passive"           -0.07
    " shoreline"        -0.07

    Positive Logits
    Token               Logit
    " contradictions"    0.20
    " contradictory"     0.19
    " contradiction"     0.16
    " conflicting"       0.14
    " contrad"           0.14
    " contradict"        0.14
    " विरोध"             0.13
    " absurd"            0.12
    " incompatible"      0.12
    " paradox"           0.12
    Activation Density: 0.011%

    No Known Activations