INDEX
    Explanations

    references to personal identity and self-description

    Negative Logits
    "anda"      -0.17
    "oler"      -0.17
    "ãĥķãĤ"     -0.15
    "ogne"      -0.15
    "aterno"    -0.15
    "awe"       -0.15
    "ollen"     -0.15
    "ands"      -0.14
    "inis"      -0.14
    "imals"     -0.14
    Positive Logits
    "797"       0.16
    " part"     0.16
    ".Err"      0.14
    " unto"     0.14
    " inn"      0.14
    "cribe"     0.13
    "PRETTY"    0.13
    " tük"      0.13
    " Winston"  0.13
    " victims"  0.13
    Act Density 0.105%

    No Known Activations