INDEX
    Explanations

    terms related to hierarchy and superiority

    Negative Logits
    Token      Logit
    afari      -0.19
    sworth     -0.16
    suppress   -0.15
    nest       -0.15
    /how       -0.14
    voor       -0.14
    pray       -0.14
    ACC        -0.14
    doors      -0.14
    soever     -0.14
    Positive Logits
    Token      Logit
    ior        0.28
    iors       0.27
    intendent  0.24
    IOR        0.24
    iore       0.23
    charged    0.23
    cil        0.23
    ordinate   0.22
    stit       0.22
    iores      0.21
    Activation Density: 0.039%

    No Known Activations