    Explanations

    phrases related to physical interactions, control, or authority

    phrases related to power dynamics and hierarchical positions

    Negative Logits
    "ratulations"    -0.75
    "ortium"         -0.66
    "phis"           -0.66
    "avorite"        -0.65
    "sylv"           -0.65
    "eur"            -0.63
    "theless"        -0.59
    " spores"        -0.59
    " valuable"      -0.58
    "izoph"          -0.57
    Positive Logits
    " behest"         1.06
    " helm"           0.86
    " altar"          0.83
    " expense"        0.80
    " outset"         0.80
    " urging"         0.79
    " discretion"     0.79
    " table"          0.78
    " periphery"      0.77
    " mercy"          0.73
    Activation Density: 0.123%

    No Known Activations