    Explanations

    words related to comments and commenting behavior

    Negative Logits
    ucha     -0.15
    pel      -0.15
    inel     -0.15
    ouz      -0.14
    chet     -0.14
    andest   -0.14
    abit     -0.14
    yo       -0.14
     Silent  -0.14
    coma     -0.14
    Positive Logits
    aries         0.19
    ìĤ¬íķŃ        0.16
    /Instruction  0.16
    eting         0.16
    ICTURE        0.16
    lint          0.15
    ariat         0.15
     ìĤ¬íķŃ       0.15
    ary           0.15
    ers           0.14
    Activation Density: 0.033%

    No Known Activations