    Explanations

    phrases expressing negation or the concept of exclusivity

    Negative Logits
    Token      Logit
    "sts"      -0.17
    "illon"    -0.16
    "inel"     -0.15
    "lems"     -0.15
    "atory"    -0.14
    "bane"     -0.14
    "jom"      -0.14
    "ogan"     -0.14
    "hub"      -0.14
    "annes"    -0.14
    Positive Logits
    Token       Logit
    " alone"    0.49
    " Alone"    0.41
    "alone"     0.37
    "-alone"    0.33
    " seule"    0.26
    " sole"     0.26
    " seul"     0.24
    "å͝ä¸Ģ"    0.24
    " solo"     0.24
    " lone"     0.23
    Activation Density: 0.045%
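
    The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of token positions on which the feature's activation is above zero; the function name, the threshold, and the sample data are all hypothetical and chosen only to reproduce a value of 0.045%:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means active positions / total positions.
    The dashboard's real definition may differ.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 9 active positions out of 20,000 tokens.
acts = [0.0] * 19_991 + [1.3] * 9
density = activation_density(acts)
print(f"{density:.3%}")
```

    Under this reading, a density of 0.045% corresponds to roughly one active token in every 2,200.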

    No Known Activations