INDEX
    Explanations

    references to social hierarchies and inequalities

    Negative Logits
    Token           Logit
    "gom"           -0.15
    "ulan"          -0.14
    "aload"         -0.14
    "ephy"          -0.14
    "ernen"         -0.14
    "/animations"   -0.14
    "nea"           -0.14
    ".swap"         -0.13
    " phenomena"    -0.13
    "lesh"          -0.13
    Positive Logits
    Token           Logit
    " ech"          0.45
    " run"          0.42
    " levels"       0.37
    "rung"          0.34
    " tier"         0.34
    " tiers"        0.34
    " ranks"        0.33
    " rank"         0.31
    " level"        0.30
    "levels"        0.29
    Activation Density: 0.138%

    No Known Activations
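
The dashboard does not say how the activation density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation exceeds zero; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and for illustration only:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: density = (number of active positions) / (total positions).
    The dashboard itself does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, the feature fires on two of them.
sample = [0.0] * 1000
sample[10] = 2.5
sample[500] = 0.7

density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.200%
```

Under this assumption, a density of 0.138% would correspond to the feature firing on roughly 1 in every 725 sampled tokens.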