INDEX
    Explanations

    phrases that emphasize assurance and confirmation in various contexts

    Negative Logits
    Token           Logit
    " inability"    -0.18
    " Worse"        -0.15
    "aminer"        -0.14
    "USTER"         -0.14
    "isor"          -0.14
    "oref"          -0.14
    "phin"          -0.14
    "oup"           -0.14
    "許"            -0.13
    "许多"          -0.13
    Positive Logits
    Token             Logit
    " everyone"       0.24
    " proper"         0.24
    " adequate"       0.23
    " sufficient"     0.23
    " every"          0.21
    " appropriate"    0.20
    " enough"         0.20
    " nothing"        0.20
    " none"           0.19
    " each"           0.18
    Activation Density: 0.101%

    No Known Activations