INDEX
    Explanations

    punctuation marks, primarily colons and other symbols indicating lists or emphasis

    Negative Logits
    Token         Logit
    " ucwords"    -0.15
    "ãi"          -0.14
    "ymbol"       -0.14
    "uche"        -0.13
    " –↵↵"        -0.13
    "instein"     -0.13
    "ÑĥÑĢа"       -0.13
    "andard"      -0.12
    "umber"       -0.12
    " -:"         -0.12
    Positive Logits
    Token         Logit
    " namely"     0.34
    " Nam"        0.23
    "nam"         0.22
    " It"         0.20
    " They"       0.20
    " Those"      0.19
    " There"      0.19
    " Each"       0.18
    "Nam"         0.18
    " If"         0.18
    Activation Density: 0.088%

    No Known Activations