INDEX
    Explanations

    proper nouns, particularly names of people and characters

    Negative Logits
     himself
    -0.24
     Himself
    -0.19
     خودش
    -0.16
     sám
    -0.15
     itself
    -0.15
     kendisi
    -0.14
    ĵåIJį
    -0.14
    agli
    -0.14
    unga
    -0.14
    umer
    -0.13
    Positive Logits
     themselves
    0.30
     respectively
    0.29
     alike
    0.27
     their
    0.25
    Their
    0.24
     Their
    0.24
    两人
    0.24
     각각
    0.24
     together
    0.22
     leurs
    0.21
    Act Density 0.155%

    No Known Activations