    Explanations

    references to identity or self-referential phrases

    Negative Logits
    Token               Logit
    "erunner"           -0.63
    " hunne"            -0.63
    "astify"            -0.61
    " дописавши"        -0.60
    "iastes"            -0.59
    "ENOS"              -0.57
    "AppCompatTheme"    -0.57
    " Racine"           -0.56
    "tdc"               -0.56
    " brady"            -0.55
    Positive Logits
    Token               Logit
    " itself"           1.38
    "itself"            1.34
    " Itself"           1.29
    " Roskov"           0.98
    " sendiri"          0.95
    " himself"          0.91
    " Himself"          0.87
    "本身"              0.86
    " herself"          0.86
    "themselves"        0.84
    Activation Density: 0.134%

    No Known Activations