INDEX
    Explanations

    a mix of words that often appear in natural language text while also giving a higher activation to numbers

    Negative Logits
    "Many"       -0.80
    "many"       -0.79
    " Many"      -0.77
    " many"      -0.77
    " MANY"      -0.77
    " muchas"    -0.75
    " muchos"    -0.72
    "Muchos"     -0.71
    " muitos"    -0.71
    " molte"     -0.69
    Positive Logits
    " so"        2.77
    "so"         1.93
    "So"         1.55
    " So"        1.47
    " SO"        1.43
    " så"        1.38
    " così"      1.26
    " ſo"        1.18
    " sooo"      1.17
    " так"       1.16
    Activation Density: 6.443%

    No Known Activations