    Explanations

    references to personal identity and related concepts

    Negative Logits
    loe          -0.19
    esse         -0.15
    NC           -0.15
    esian        -0.15
    arget        -0.15
    izona        -0.14
    isman        -0.14
     труда       -0.14
    许           -0.14
    iar          -0.14
    Positive Logits
    ENTITY       0.19
     theft       0.19
    ponge        0.18
    entities     0.17
    (identity    0.17
    zend         0.16
    /disable     0.16
     Theft       0.16
     twin        0.15
     crisis      0.15
    Act Density 0.019%

    No Known Activations