INDEX
    Explanations

    references to interactions and relationships among individuals and groups

    Negative Logits
    Token          Logit
    " himself"     -0.28
    " his"         -0.19
    " Himself"     -0.19
    " myself"      -0.19
    " itself"      -0.17
    " خودش"        -0.17
    "his"          -0.17
    " sám"         -0.16
    "ulk"          -0.16
    "los"          -0.15
    Positive Logits
    Token          Logit
    " themselves"  0.61
    "Their"        0.37
    " their"       0.35
    " Their"       0.35
    " leurs"       0.34
    "their"        0.34
    " thems"       0.31
    " yourselves"  0.28
    " их"          0.28
    " jejich"      0.27
    Activation Density: 0.753%

    No Known Activations