INDEX
    Explanations

    numerical tokens, especially around punctuation

    Negative Logits
     longer         0.33
     tearing        0.30
     willingly      0.28
     complicated    0.27
     rear           0.27
     newly          0.27
     each           0.27
     mistakes       0.27
     older          0.26
    }).             0.26
    Positive Logits
     ибо            0.39
    如果你          0.35
     informática    0.34
     क्रिप्ट          0.33
     idk            0.33
    ordeaux         0.32
     астро          0.32
     psicológica    0.32
     interessante   0.32
     식품           0.32
    Activation Density: 0.005%

    No Known Activations