    Explanations

    words related to human ancestors and prehistoric lifestyles

    Negative Logits
    Token        Logit
    "++++"       -0.81
    "cia"        -0.75
    "ioxide"     -0.69
    "rait"       -0.67
    "ysis"       -0.66
    " ours"      -0.66
    "ulhu"       -0.66
    " theirs"    -0.64
    "milo"       -0.64
    "VL"         -0.64

    Positive Logits
    Token        Logit
    "gat"         0.84
    "Forest"      0.68
    "Eight"       0.68
    "learn"       0.66
    "bart"        0.66
    "auld"        0.64
    "tale"        0.64
    "Howard"      0.64
    "poke"        0.63
    "bender"      0.62

    Activation Density: 0.133%

    No Known Activations