    Explanations

    phrases that reference groups or collectives

    Negative Logits
    ry            -0.19
    hone          -0.19
    eri           -0.16
    /up           -0.16
    chl           -0.15
    baum          -0.15
    crow          -0.15
    итов          -0.14
    ifer          -0.14
    appropriate   -0.14
    Positive Logits
    ings           0.40
    think          0.24
    usc            0.24
    INGS           0.23
    /group         0.22
    sWith          0.21
    mates          0.18
    aroo           0.18
    ement          0.18
     members       0.18
    Activation Density: 0.057%

    No Known Activations