    Explanations

    third-person plural pronouns and possessive determiners, particularly "they" and "their"

    Negative Logits
     itſelf
    -0.67
    gears
    -0.52
    strato
    -0.52
    extAlignment
    -0.51
    voltaic
    -0.50
    󠁢
    -0.49
     Jefus
    -0.48
    ſelf
    -0.48
    SECRET
    -0.47
     akcji
    -0.46
    Positive Logits
     themselves
    0.83
    themselves
    0.76
     they
    0.73
     Their
    0.62
    Their
    0.61
    يكب
    0.60
     their
    0.60
    they
    0.59
     forem
    0.59
     THEY
    0.59
    Activation Density: 0.413%

    No Known Activations