INDEX
    Explanations

    expressions of complexity and contradiction in arguments

    NEGATIVE LOGITS
    iant          -0.14
    .gameserver   -0.14
    rosso         -0.14
    ÌĨ            -0.14
    ध             -0.14
    ë             -0.14
    imizer        -0.13
    rint          -0.13
    adow          -0.13
    Invariant     -0.13
    POSITIVE LOGITS
    etter         0.15
    .uf           0.14
    -wise         0.14
     Sutton       0.14
    fair          0.13
     whereas      0.13
     Fairfield    0.13
    awn           0.13
    dar           0.12
    elligence     0.12
    Activation Density: 0.385%

    No Known Activations