INDEX
    Explanations

    proper nouns, specifically names of individuals and positions related to political contexts

    Negative Logits
     Rebellion     -0.74
     Rebels        -0.73
     Wonderland    -0.69
     Cerberus      -0.68
     Leviathan     -0.68
     theaters      -0.66
     robbers       -0.65
     Colleges      -0.65
     womb          -0.64
     machines      -0.64
    Positive Logits
    ergus    1.01
    imir     0.96
    frey     0.96
    ileen    0.95
    jit      0.95
    ureen    0.94
    iji      0.93
    anya     0.93
    resa     0.93
    jan      0.91
    Act Density: 0.174%

    No Known Activations