INDEX
    Explanations

    linearly increasing growth

    Negative Logits
    כל
    0.31
    מו
    0.30
    0.30
    מש
    0.29
    שי
    0.29
    ット
    0.28
     असतो
    0.28
    观念
    0.28
    உலக
    0.28
     знаешь
    0.28
    Positive Logits
     HIS
    0.32
     με
    0.31
     Assistant
    0.29
     Walmart
    0.28
     Omicron
    0.28
     aligned
    0.28
     during
    0.27
     protesters
    0.27
     attendees
    0.27
     Jubilee
    0.27
    Activation Density 0.340%

    No Known Activations