INDEX
    Explanations

    references to specific geographical locations or countries

    Negative Logits
    "olt"            -0.16
    "ul"             -0.16
    " individual"    -0.14
    "ुआ"             -0.14
    "fu"             -0.14
    "gro"            -0.14
    "agy"            -0.14
    "ул"             -0.14
    "raft"           -0.13
    "sd"             -0.13
    Positive Logits
    " itself"        0.20
    "ifen"           0.19
    " herself"       0.18
    " Himself"       0.17
    "ilies"          0.17
    "ービ"           0.16
    " himself"       0.15
    " Wick"          0.15
    "lamaz"          0.15
    "Česk"           0.15
    Activation Density: 0.059%

    No Known Activations