INDEX
    Explanations

    self-referential statements and expressions of doubt or insecurity about one's identity

    Negative Logits
    Token       Logit
    "idor"      -0.17
    "anke"      -0.16
    "loth"      -0.15
    "MOTE"      -0.15
    "achat"     -0.15
    "anker"     -0.15
    "że"        -0.14
    " Leer"     -0.14
    "LEASE"     -0.14
    " flop"     -0.14
    Positive Logits
    Token         Logit
    " supposed"   0.19
    " missing"    0.19
    " alone"      0.19
    "alone"       0.18
    " headed"     0.17
    "dense"       0.17
    "Welcome"     0.17
    " hereby"     0.17
    " welcome"    0.16
    "welcome"     0.16
    Activation Density: 0.100%

    No Known Activations