    Explanations

    digits after punctuation

    Negative Logits
    Token              Value
    ad                 0.66
    ra                 0.57
    ↵↵                 0.50
    (no token shown)   0.48
    5                  0.48
    am                 0.47
    an                 0.46
    ،                  0.46
    אחד                0.45
    6                  0.44
    Positive Logits
    Token              Value
    ມັນ                0.61
    عی                 0.61
    га                 0.57
    (no token shown)   0.56
    (no token shown)   0.55
    ется               0.55
    (whitespace)       0.55
    ές                 0.54
    ным                0.54
    ında               0.54
    Activation Density: 0.617%

    No Known Activations