    Explanations

    proper noun followed by descriptive suffix

    Negative Logits
    Token    Logit
    𝐞        4.02
    𝐢        3.39
    𝐚        3.26
    𝐳        2.98
    𝐮        2.90
    𝐨        2.85
    𝐦        2.64
    sided    2.63
    𝐥        2.63
    ment     2.58
    Positive Logits
    Token          Logit
    м              3.63
    (not shown)    3.24
    ি              3.08
    en             3.02
    (not shown)    2.90
    ي              2.69
    ி              2.59
     tjen          2.49
    (not shown)    2.46
    ів             2.46
    Activation Density: 0.060%

    No Known Activations