INDEX
    Explanations

    expressions of personal identity and self-reference

    Negative Logits
    rvé
    -0.15
    odzi
    -0.15
    :↵↵↵↵↵↵
    -0.13
    imson
    -0.13
    ÙĦت
    -0.13
    firm
    -0.13
    окÑģи
    -0.12
    .future
    -0.12
    DBC
    -0.12
    ipmap
    -0.12
    Positive Logits
     dun
    0.26
     Dun
    0.23
     demand
    0.20
     mean
    0.19
     fucking
    0.18
    iiii
    0.18
     swear
    0.17
    kr
    0.17
     SA
    0.17
     retract
    0.17
    Activation Density: 0.243%

    No Known Activations