    Explanations

    references to social hierarchies and dynamics

    Negative Logits
    Token           Logit
    "opensource"    -0.15
    "007"           -0.14
    " overall"      -0.14
    " boyc"         -0.14
    " mau"          -0.13
    "artner"        -0.13
    " flesh"        -0.13
    " ran"          -0.13
    " Jas"          -0.13
    "彡"            -0.13

    Positive Logits
    Token           Logit
    " thing"         0.29
    " idea"          0.27
    " issue"         0.26
    " concept"       0.25
    " aspect"        0.25
    " situation"     0.21
    "thing"          0.21
    " story"         0.21
    " scenario"      0.20
    " experiment"    0.20

    Activation Density: 0.501%

    No Known Activations