    Explanations

    phrases or words related to the concept of human traits or characteristics

    mentions of human-related concepts or attributes

    Negative Logits
    Token (quoted to show leading spaces)    Logit
    " slice"                                 -0.71
    " launcher"                              -0.71
    " blocks"                                -0.67
    " Block"                                 -0.66
    " markup"                                -0.65
    " Wall"                                  -0.63
    " cartels"                               -0.62
    " Specialist"                            -0.62
    " AE"                                    -0.61
    " aligned"                               -0.61
    Positive Logits
    Token (quoted to show leading spaces)    Logit
    "hum"                                    4.65
    "Hum"                                    1.89
    " Hum"                                   1.69
    " hum"                                   1.28
    " HUM"                                   1.23
    "hor"                                    1.21
    "hus"                                    1.14
    "odor"                                   1.11
    "nat"                                    1.10
    "hist"                                   1.09
    Activation Density: 0.009%

    No Known Activations