INDEX
    Explanations
    Negative Logits
    Token             Logit
    " researcher"     -0.07
    "-game"           -0.07
    " policy"         -0.06
    " health"         -0.06
    "sold"            -0.06
    " hatred"         -0.06
    "party"           -0.06
    " blogger"        -0.06
    " regenerated"    -0.06
    " replication"    -0.06
    Positive Logits
    Token             Logit
    " endeavor"       0.29
    " endeavors"      0.27
    " endeavour"      0.22
    " Ende"           0.10
    " ende"           0.10
    " empresa"        0.09
    "avour"           0.07
    " mùa"            0.07
    " Meadows"        0.07
    " esteem"         0.07
    Activation Density: 0.002%

    No Known Activations