    Explanations

    references to political figures and their actions

    Negative Logits
    Token        Logit
    reportedly   -0.24
    seem         -0.19
    seems        -0.18
    seeming      -0.18
    seemed       -0.18
    obviously    -0.18
    says         -0.17
    evidently    -0.17
    apparently   -0.17
    Seems        -0.17
    Positive Logits
    Token        Logit
    might        0.23
    deserved     0.22
    will         0.22
    merits       0.22
    shouldn      0.21
    belongs      0.21
    indeed       0.20
    could        0.19
    deserves     0.19
    belonged     0.19
    Act Density 0.571%

    No Known Activations