INDEX
    Explanations

    words related to entities, organizations, and proper nouns

    names and terms related to politics, organizations, and social issues

    Negative Logits
    ocious      -0.74
    ategory     -0.72
    utral       -0.69
    geries      -0.68
    entanyl     -0.66
    arnaev      -0.65
    maxwell     -0.65
    iltration   -0.65
     Leilan     -0.65
    arp         -0.64
    Positive Logits
     deems      1.12
     deem       0.87
     ought      0.85
     might      0.82
     sorely     0.81
     couldn     0.80
     should     0.80
     could      0.78
     would      0.78
     lacked     0.76
    Activation Density: 0.331%

    No Known Activations