INDEX
    Explanations

    proper names, specifically organizations and associations

    instances of the word "the" in various contexts

    Negative Logits
    Token               Logit
    "fp"                -0.78
    " understands"      -0.73
    "agents"            -0.71
    " besides"          -0.71
    "agree"             -0.70
    " according"        -0.70
    " without"          -0.69
    " understood"       -0.69
    " beforehand"       -0.69
    " because"          -0.69

    Positive Logits
    Token               Logit
    " aforementioned"    1.00
    " latter"            0.91
    " rest"              0.90
    " entire"            0.88
    " Bahamas"           0.88
    " entirety"          0.86
    " remainder"         0.86
    " Dalai"             0.85
    " largest"           0.84
    " Netherlands"       0.84

    Activation Density: 0.223%

    No Known Activations