INDEX
    Explanations

    words that refer to various roles or occupations

    Negative Logits
    Token         Logit
    "usz"         -0.19
    "eur"         -0.17
    "ram"         -0.17
    "aland"       -0.17
    "son"         -0.16
    "nier"        -0.16
    "iveness"     -0.16
    "392"         -0.15
    "-gnu"        -0.15
    "sv"          -0.15
    Positive Logits
    Token         Logit
    "-upper"      0.34
    "-than"       0.27
    "hip"         0.23
    " who"        0.23
    "er"          0.22
    "outes"       0.21
    "idge"        0.20
    "/loader"     0.20
    "/renderer"   0.20
    "/compiler"   0.19
    Activation Density: 0.265%

    No Known Activations