INDEX
    Explanations

    references to human roles and occupations, specifically in arts and media contexts

    Negative Logits
    t       -0.46
    ti      -0.37
    ez      -0.36
    eer     -0.36
    tim     -0.36
    tors    -0.35
    tur     -0.35
    tin     -0.35
    ted     -0.34
    tor     -0.34
    Positive Logits
    rier    0.27
    rr      0.26
    ship    0.24
    ra      0.23
    de      0.23
    riers   0.23
    riage   0.22
    ium     0.22
    iginal  0.22
    red     0.21
    Activation Density: 0.346%

    No Known Activations