INDEX
    Explanations

    punctuation and formatting elements in the text

    Follows punctuation or whitespace

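    Tokens are quoted so leading spaces are visible; ↵ marks a newline.

    Negative Logits
    Token                  Logit
    ":✨"                  -0.82
    " صوتيه"               -0.77
    "OGND"                 -0.77
    (token not rendered)   -0.74
    " Italijani"           -0.73
    " queſta"              -0.73
    " Meksiku"             -0.72
    " Administrativna"     -0.71
    "ſſung"                -0.71
    "Diweddarwch"          -0.70

    Positive Logits
    Token                  Logit
    "↵↵"                    0.54
    "↵↵↵"                   0.44
    (token not rendered)    0.43
    " Two"                  0.42
    " An"                   0.41
    "Two"                   0.39
    "An"                    0.39
    "2"                     0.39
    "."                     0.38
    " I"                    0.38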
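    On SAE feature dashboards like this one, the positive and negative logit lists are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix. The largest and smallest entries are then kept. The sketch below is pure Python and rests on several assumptions. It uses a hypothetical decoder vector `w_dec` (length d_model) and a hypothetical unembedding `w_u` (d_model rows by vocab columns). The toy numbers are illustrative only and are not taken from this feature. Real dashboards may also fold in layer norm or apply other adjustments that are not shown here.

    ```python
    from typing import List, Sequence, Tuple

    def logit_effects(w_dec: Sequence[float], w_u: Sequence[Sequence[float]]) -> List[float]:
        """Dot the (hypothetical) decoder direction with each vocab column of w_u.

        w_dec: length d_model; w_u: d_model rows x vocab columns.
        Returns one logit contribution per vocabulary token.
        """
        vocab_size = len(w_u[0])
        return [sum(w_dec[d] * w_u[d][t] for d in range(len(w_dec)))
                for t in range(vocab_size)]

    def top_tokens(effects: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
        """Return the k most positive and k most negative (token, logit) pairs."""
        order = sorted(range(len(effects)), key=lambda t: effects[t])  # ascending
        negative = [(vocab[t], effects[t]) for t in order[:k]]
        positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
        return positive, negative

    # Toy example with made-up weights (2-dim residual stream, 4-token vocab).
    vocab = ["↵↵", " Two", "OGND", " Meksiku"]
    w_dec = [1.0, 0.5]
    w_u = [[0.4, 0.2, -0.5, -0.3],
           [0.3, 0.4, -0.6, -0.2]]
    effects = logit_effects(w_dec, w_u)
    positive, negative = top_tokens(effects, vocab, k=2)
    print(positive)
    print(negative)
    ```

    Sorting once and slicing both ends keeps the two lists consistent with each other. Skipping NumPy keeps the sketch dependency-free, at the cost of speed on a real vocabulary.
    
    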
    Activation Density: 0.007%
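    Activation density is usually reported as the percentage of sampled tokens on which the feature's activation is above zero. At 0.007%, that works out to roughly 7 of every 100,000 tokens. The sketch below assumes that activations are available as a flat list of floats. The function name and its `threshold` parameter are illustrative, and the sampling procedure behind the figure above is not stated here.

    ```python
    from typing import Sequence

    def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
        """Percentage of tokens whose activation exceeds `threshold` (illustrative helper)."""
        if not acts:
            raise ValueError("need at least one activation")
        firing = sum(1 for a in acts if a > threshold)
        return 100.0 * firing / len(acts)

    # Toy example: 7 nonzero activations spread over 100,000 tokens.
    acts = [0.0] * 100_000
    for i in range(7):
        acts[i * 1000] = 1.5
    density = activation_density(acts)
    print(f"{density:.3f}%")  # 0.007%
    ```
    
    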

    No Known Activations