INDEX
    Explanations

    the presence of certain names or identifiers, particularly those related to characters or individuals

    Negative Logits
    es       -0.23
    er       -0.21
    esin     -0.20
    ekk      -0.19
    zman     -0.19
    eson     -0.17
    erse     -0.16
    ed       -0.16
    esini    -0.16
    eyer     -0.16
    Positive Logits
    zi        0.30
    y         0.28
    quierda   0.23
    ionario   0.21
    ze        0.20
    zen       0.19
    riz       0.18
    abella    0.18
    zych      0.17
    ibilit    0.17
    Act Density 0.019%

    No Known Activations