INDEX
    Explanations

    proper nouns related to individuals and their roles or professions

    Negative Logits
    aju
    -0.16
    ông
    -0.15
    indo
    -0.14
    .sb
    -0.14
    буд
    -0.14
    oft
    -0.13
    Contained
    -0.13
     SPDX
    -0.13
     loses
    -0.13
    aren
    -0.13
    Positive Logits
     is
    0.28
     adalah
    0.25
    是
    0.24
     是
    0.24
    是一
    0.23
    是我
    0.23
     isa
    0.22
    	is
    0.21
    "is
    0.20
    は
    0.20
    Act Density 0.073%

    No Known Activations