INDEX
    Explanations

    phrases mentioning specific individuals or groups

    the word "them" in various contexts

    Negative Logits
    Token         Logit
    "Deal"        -0.75
    "Press"       -0.69
    "Jo"          -0.66
    "order"       -0.65
    "Chain"       -0.65
    "ILE"         -0.65
    "deal"        -0.64
    " Patton"     -0.63
    "Rush"        -0.63
    "Monster"     -0.62
    Positive Logits
    Token         Logit
    " selves"     1.14
    "atically"    1.00
    "atic"        0.99
    "selves"      0.87
    " conduc"     0.81
    " outwe"      0.75
    "self"        0.70
    " sinks"      0.70
    " succeeded"  0.70
    "atics"       0.69
    Activation Density: 0.038%

    No Known Activations