INDEX
    Explanations

    mentions of friends, family, allies, and supporters

    phrases that emphasize relationships with friends and family

    Negative Logits
    Token         Logit
    " gears"      -0.71
    "oliberal"    -0.68
    "ERO"         -0.66
    "ument"       -0.66
    "ulo"         -0.64
    "Phase"       -0.64
    " pollut"     -0.63
    "Ball"        -0.63
    "ãĥ´ãĤ¡"      -0.63
    " Pwr"        -0.62
    Positive Logits
    Token               Logit
    " neighbours"       0.87
    " neighbors"        0.85
    " coworkers"        0.84
    " strangers"        0.84
    " comrades"         0.81
    " acquaintances"    0.80
    " relatives"        0.78
    " whom"             0.77
    " classmates"       0.76
    " fellow"           0.76
    Activation Density: 0.310%

    No Known Activations
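
One entry in the negative logits list, ãĥ´ãĤ¡, looks like encoding debris but is probably a raw vocabulary token. The dashboard does not name its tokenizer, so the sketch below rests on an assumption: that the vocabulary uses GPT-2-style byte-level BPE, where every byte is displayed as a printable stand-in character. Under that assumption the token can be mapped back to its bytes and decoded as UTF-8. The helper name `decode_token` is hypothetical; `bytes_to_unicode` mirrors the table construction in GPT-2's published encoder.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table of GPT-2-style byte-level BPE (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: display character -> original byte.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a byte-level display token back into text."""
    raw = bytearray()
    for ch in token:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            # Characters outside the table (e.g. a literal space) pass through unchanged.
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥ´ãĤ¡"))  # ヴァ
```

If the assumption holds, the token is the katakana pair ヴァ. Plain ASCII entries such as ERO or Phase decode to themselves, so only tokens containing non-ASCII bytes change.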