INDEX
    Explanations

    words written in a specific script

    occurrences of a specific character in text, likely related to the Cyrillic alphabet

    Negative Logits
    riott      -0.84
    icio       -0.74
    aido       -0.72
    arching    -0.69
    anced      -0.69
    arios      -0.69
     Pyth      -0.69
    urst       -0.66
    rament     -0.66
    20439      -0.65
    Positive Logits
    м          1.16
    Ð          1.15
    н          1.11
    к          1.11
    Ñı         1.11
    ÑĤ         1.10
    д          1.10
    и          1.06
    в          1.06
    л          1.04
    Act Density 0.007%

    No Known Activations