INDEX
    Explanations

    references to authors or creators and their works in a meta-textual context

    Negative Logits
    endir
    -0.17
    esian
    -0.14
    orne
    -0.14
    Ca
    -0.14
    roat
    -0.13
    eward
    -0.13
    metic
    -0.13
    義
    -0.13
    ata
    -0.13
    [--
    -0.13
    Positive Logits
     itself
    0.31
     themselves
    0.22
     herself
    0.21
     meta
    0.19
     kendisi
    0.19
    自身
    0.18
    meta
    0.17
     selbst
    0.17
     Himself
    0.17
    (meta
    0.16
    Act Density 0.319%

    No Known Activations