INDEX
    Explanations

    instances of rebuttals or counterarguments

    New Auto-Interp
    Negative Logits
    …↵
    -0.18
    …but
    -0.17
    …and
    -0.15
    …it
    -0.15
    …I
    -0.14
     (
    -0.14
    []
    -0.14
    …
    -0.14
     …↵
    -0.13
    -0.13
    ̣
    -0.13
    Positive Logits
     EXEMPLARY
    0.17
     CHARSET
    0.16
    .scalablytyped
    0.15
     OVERRIDE
    0.15
    。本
    0.14
     backpage
    0.14
     INTERRUPTION
    0.13
    contri
    0.13
    ADDE
    0.13
    लग
    0.13
    Act Density 2.127%

    No Known Activations