INDEX
    Explanations

    the presence of the word "I" as an indicator of personal perspective or statements

    Negative Logits
    Token             Logit
    " intptr"         -0.89
                      -0.78
    "ReusableCell"    -0.76
    "CEPTION"         -0.72
    " يتيمه"          -0.71
    "Искәрмәләр"      -0.70
    "UTTON"           -0.68
    " Ques"           -0.65
    " homen"          -0.64
    "BRARY"           -0.63
    Positive Logits
    Token      Logit
    "I"        2.71
    " I"       2.08
    "My"       1.26
    "We"       1.25
    "Tôi"      1.20
               1.10
    "ฉัน"      1.07
    " My"      1.05
    "tôi"      1.04
    " We"      1.00
    Act Density 0.078%
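
As a rough illustration of how a figure like the activation density above could be computed, here is a minimal sketch. It assumes density means the percentage of sampled token positions on which the feature's activation exceeds a threshold; the function name, the threshold default, and the sample data are all hypothetical and not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds threshold.

    Assumption: "density" is the share of sampled token positions on which
    the feature fires. The dashboard's exact definition may differ.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 78 of 100,000 token positions.
sample = [1.5] * 78 + [0.0] * 99_922
density = activation_density(sample)
print(f"{density:.3f}%")  # prints 0.078%
```

Under this assumed definition, a density of 0.078% corresponds to the feature firing on roughly 78 out of every 100,000 tokens.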

    No Known Activations