INDEX
    Explanations

    expressions of self-awareness and criticism

    Negative Logits
    Token            Logit
    "ius"            -0.16
    "ãĥ³ãĥĩãĤ£"      -0.15
    "avra"           -0.15
    " fingert"       -0.14
    " Recorder"      -0.14
    "дап"            -0.13
    "naments"        -0.13
    "pike"           -0.13
    "133"            -0.13
    "onta"           -0.13
    Positive Logits
    Token            Logit
    " actor"         0.19
    " makers"        0.19
    " actors"        0.18
    " Leone"         0.18
    " fans"          0.18
    " Actor"         0.17
    "villa"          0.17
    " actress"       0.17
    " Vir"           0.17
    " Pri"           0.17
    Activation Density: 0.080%

    No Known Activations