INDEX
    Explanations

    words associated with authority and social hierarchy

    Negative Logits
    Token       Logit
     Mour       -0.15
    lil         -0.15
    íĿ          -0.15
    ichte       -0.15
    strom       -0.14
    iph         -0.14
     Huffman    -0.14
     pol        -0.14
    .jquery     -0.14
    late        -0.13
    Positive Logits
    Token       Logit
    еÑģа        0.16
    ierge       0.15
    erdale      0.14
    brids       0.14
    isti        0.14
    ersh        0.14
    isper       0.14
    immers      0.14
    ssi         0.14
    UGIN        0.14
    Activation Density: 0.002%

    No Known Activations