INDEX
    Explanations

    specific categories of nouns, such as medical conditions, interpersonal relationships, personal attributes, and financial terms

    terms related to personal relationships and individual circumstances

    Negative Logits
    Token            Logit
    " ourselves"     -0.83
    " Helpful"       -0.78
    " yourselves"    -0.72
    " Guan"          -0.70
    " oneself"       -0.67
    " themselves"    -0.65
    " unison"        -0.65
    " alike"         -0.65
    " Rohing"        -0.63
    " Beg"           -0.61

    Positive Logits
    Token            Logit
    " wife"          0.96
    " girlfriend"    0.92
    " buddies"       0.89
    " counterpart"   0.89
    " colleague"     0.87
    " persona"       0.87
    "mates"          0.86
    " mates"         0.84
    " counterparts"  0.84
    "opic"           0.83
    Activation Density: 0.449%

    No Known Activations