INDEX
    Explanations

    assertions of knowledge and self-awareness in arguments

    Negative Logits
    eed        -0.06
     nag       -0.06
    ach        -0.06
     Kirby     -0.06
    ãĤīãģĦ     -0.06
    edi        -0.06
    tries      -0.06
    HashCode   -0.05
     Mills     -0.05
    try        -0.05
    Positive Logits
     neither   0.08
    mund       0.08
     pity      0.07
    roti       0.07
     never     0.07
    gnore      0.07
     NEVER     0.07
     Strom     0.07
     wäh       0.07
    ç«¥        0.07
    Act Density 0.053%

    No Known Activations