INDEX
    Explanations

    phrases related to personal identity and characteristics

    the presence of the word "a" and variations related to identity and roles

    Negative Logits
    "scenes"          -0.82
    "ernels"          -0.75
    "Ö¼"              -0.75
    "breaks"          -0.72
    "ourses"          -0.71
    " appointments"   -0.69
    "uden"            -0.68
    "views"           -0.67
    "books"           -0.67
    "execute"         -0.67
    Positive Logits
    " hypocr"         1.09
    " spectator"      1.05
    " member"         1.04
    " prostitute"     0.98
    " follower"       0.98
    " virgin"         0.98
    " participant"    0.98
    " citizen"        0.98
    " believer"       0.97
    " bystand"        0.97
    Activation Density: 0.139%

    No Known Activations