INDEX
    Explanations

    concepts of assertion and conclusion in arguments

    Negative Logits
        "olls"         -0.17
        "uttle"        -0.15
        "ulfilled"     -0.15
        "azzo"         -0.15
        "pla"          -0.15
        "íĤ¹"          -0.14
        "usat"         -0.14
        "/posts"       -0.14
        "ignant"       -0.14
        "ĥn"           -0.13
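    Some entries in these lists, such as "íĤ¹" and "ĥn", are raw byte-level BPE vocabulary strings and not garbled text. The dashboard does not name the model's tokenizer. The sketch below assumes a GPT-2-style byte-level vocabulary, which maps every byte to a printable character. `bytes_to_unicode` mirrors the helper of that name in GPT-2's reference encoder. `decode_token` is a hypothetical name used only for illustration. Under that assumption, "íĤ¹" decodes to the Hangul syllable 킹 and "è¾°" to 辰. Tokens such as "ĥn" or "äh" hold only part of a multi-byte character, so they cannot decode cleanly on their own.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style reversible table: byte value -> printable character."""
    # Bytes that are already printable ("!".."~", "¡".."¬", "®".."ÿ") map to themselves.
    bs = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    cs = bs[:]
    n = 0
    # Every remaining byte (controls, space, 127..160, 173) is shifted to chr(256 + n).
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn one byte-level vocabulary string into readable text."""
    raw = bytearray()
    for ch in token:
        if ch in BYTE_DECODER:
            raw.append(BYTE_DECODER[ch])
        else:
            # Characters outside the table (e.g. a plain space that a dashboard
            # has already substituted for "Ġ") are kept as their UTF-8 bytes.
            raw.extend(ch.encode("utf-8"))
    # A token holding only part of a multi-byte character cannot decode cleanly;
    # errors="replace" marks the stray bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


for tok in ["íĤ¹", "è¾°", "ĥn", "äh", " Gree"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```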
    Positive Logits
        " Gree"         0.15
        "äh"            0.15
        " fam"          0.15
        " chin"         0.14
        " Poz"          0.14
        "è¾°"           0.14
        "889"           0.14
        " uncomment"    0.14
        "830"           0.14
        "904"           0.14
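    The page does not say how these two lists were computed. This sketch assumes a common convention for sparse-autoencoder feature dashboards. Under it, the feature's decoder direction is multiplied by the model's unembedding matrix W_U, and the k most negative and k most positive entries are reported. Some pipelines also account for the final layer norm, which this sketch ignores. The function `top_and_bottom_logits` and the toy vocabulary and matrices are hypothetical and only illustrate the ranking step.

```python
def top_and_bottom_logits(decoder_row, unembed, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature direction (a sketch).

    decoder_row: list of d_model floats, the feature's decoder direction.
    unembed:     d_model rows, each a list of vocab_size floats (W_U).
    vocab:       list of vocab_size token strings.
    """
    d_model = len(decoder_row)
    vocab_size = len(vocab)
    # Direct effect on each token's logit: decoder_row @ W_U.
    scores = [
        sum(decoder_row[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]
    order = sorted(range(vocab_size), key=lambda t: scores[t])
    # Most negative first, then most positive first.
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return negative, positive


# Toy example: d_model = 2, vocabulary of 4 tokens.
vocab = ["a", "b", "c", "d"]
decoder_row = [1.0, -0.5]
unembed = [[0.2, -0.1, 0.0, 0.3],
           [0.4, 0.2, -0.2, 0.0]]
negative, positive = top_and_bottom_logits(decoder_row, unembed, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

    Small magnitudes such as the ±0.13 to ±0.17 shown here suggest that no single token is strongly promoted or suppressed by this feature's direct path. The assumed convention only captures that direct path, not effects routed through later layers.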
    Activation Density: 0.003%
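    The dashboard does not define activation density. This sketch assumes the usual meaning: the percentage of sampled tokens on which the feature's activation is above zero. Under that reading, 0.003% works out to about 3 firing tokens per 100,000. The function name `activation_density`, the `threshold` parameter, and the synthetic activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one token activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Synthetic sample: 3 firing tokens out of 100,000.
acts = [0.0] * 100_000
for i in (10, 5_000, 77_777):
    acts[i] = 1.3
print(f"{activation_density(acts):.3f}%")  # prints "0.003%"
```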

    No Known Activations