    Explanations

    references to Japanese and Korean entities or culture

    Negative Logits
    Token        Logit
    " Con"       -0.34
    "ton"        -0.30
    " con"       -0.29
    "co"         -0.29
    "le"         -0.29
    "ndor"       -0.29
    "ic"         -0.28
    " formó"     -0.28
    " he"        -0.28
    "ro"         -0.28
    Positive Logits
    Token          Logit
    " Japan"       2.38
    " Japanese"    2.25
    "Japan"        2.20
    " japan"       2.16
    " JAPAN"       2.14
    "Japanese"     2.06
    " Japón"       1.98
    " japanese"    1.95
    " Japon"       1.91
    "japan"        1.88
    Activation Density: 0.130%

    No Known Activations