    Explanations

    structured arguments and logical reasoning in discussions

    Negative Logits
    Token            Logit
    "quer"           -0.14
    "uckle"          -0.14
    "anca"           -0.13
    " everywhere"    -0.13
    "still"          -0.13
    "ว"              -0.13
    "erras"          -0.13
    "ĴĪ"             -0.13
    " tonight"       -0.13
    "enge"           -0.12
    Positive Logits
    Token            Logit
    " having"        0.33
    "having"         0.28
    "Having"         0.25
    " Having"        0.25
    " by"            0.24
    " Studies"       0.23
    "Studies"        0.23
    " studies"       0.22
    " By"            0.21
    " oleh"          0.20
    Activation Density: 0.435%

    No Known Activations