INDEX
    Explanations

    references to self-identification or self-references in the text

    references to self-identification

    Negative Logits
    Token            Logit
    "pour"           -0.76
    "onga"           -0.75
    "ibaba"          -0.71
    "icion"          -0.70
    "rought"         -0.68
    "edia"           -0.68
    "iard"           -0.67
    "iens"           -0.66
    "oos"            -0.66
    "heny"           -0.63

    Positive Logits
    Token            Logit
    "selves"          1.06
    " worshipped"     0.81
    " selves"         0.77
    "æ³"              0.73
    "self"            0.70
    " è£ıè"           0.69
    "ç¥ŀ"             0.68
    " creatively"     0.68
    " adherent"       0.67
    " acknow"         0.67
    Activation Density 0.035%

    No Known Activations