    Explanations

    proper nouns, particularly names

    Negative Logits (logit, token)
        -0.63  COLLE
        -0.60   polska
        -0.58  antd
        -0.57   pula
        -0.56  ]")]
        -0.54  ☆☆
        -0.53  \}\\
        -0.52  ||||
        -0.52  ^^^^^^^^
        -0.51  aires
    Positive Logits (logit, token)
        0.71   NDEBUG
        0.65  WriteTagHelper
        0.58   Biôgrafia
        0.55  PhysRevD
        0.55   المعيارى
        0.55  opardy
        0.54   صوتيه
        0.54   мәкалә
        0.52  ilaian
        0.52  falgar
    Act Density 0.168%
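    "Act Density" is usually read as the share of tokens on which the feature fires. The sketch below is illustrative only. It assumes that per-token activations are available as a flat list of floats and that "fires" means an activation strictly greater than zero. The function name and the sample data are hypothetical and do not come from this dashboard.

```python
def activation_density(activations: list[float]) -> float:
    """Percentage of tokens with activation > 0 (assumed definition of 'active')."""
    if not activations:
        # No tokens observed: report zero density rather than dividing by zero.
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


# Hypothetical data: 3 active tokens out of 2,000 gives a density of 0.15%.
acts = [0.0] * 1997 + [1.2, 0.4, 3.1]
print(f"Act Density {activation_density(acts):.3f}%")
```

    Under this definition, a density of 0.168% means the feature is nonzero on roughly 1.7 tokens per thousand. That makes it a sparse feature.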

    No Known Activations