INDEX
    Explanations

    words related to different groups of people and their roles or characteristics

    references to various groups of people or professions

    Negative Logits
    DonaldTrump   -0.77
    BALL          -0.66
    forward       -0.65
    ield          -0.62
    ģ«            -0.61
    ray           -0.60
    paragraph     -0.58
    VIEW          -0.58
    UTERS         -0.58
    dain          -0.57
    Positive Logits
    folk          1.20
     themselves   0.97
    '             0.86
    hest          0.84
    iest          0.83
    ']            0.80
    layer         0.80
     involved     0.74
    heet          0.73
    hip           0.70
    Activation Density: 0.231%

    No Known Activations