INDEX
    Explanations

    characters from a non-Latin script

    Negative Logits
     Cly
    -0.79
    iko
    -0.69
    abase
    -0.69
    estern
    -0.69
    eeper
    -0.68
    lyak
    -0.67
     Swim
    -0.67
    oké
    -0.66
     Probe
    -0.65
    yth
    -0.65
    Positive Logits
    ا
    1.97
    Ù
    1.95
    ه
    1.92
    ن
    1.86
    اØ
    1.86
    د
    1.86
    Ø
    1.84
    و
    1.82
    ت
    1.81
    ر
    1.76
    Activation Density 0.013%

    No Known Activations