    Explanations

    references to Japan or Japanese culture

    mentions of the word "Japanese."

    Negative Logits
    Token            Logit
    "fy"             -0.69
    "gat"            -0.68
    " Lyons"         -0.67
    "uliffe"         -0.66
    " Alexandria"    -0.63
    "allah"          -0.63
    " Bran"          -0.62
    " Brees"         -0.62
    "din"            -0.62
    "alla"           -0.61

    Positive Logits
    Token            Logit
    " Japanese"      3.69
    "Japanese"       3.35
    " Japan"         2.55
    "Japan"          2.55
    " Taiwanese"     2.44
    " Chinese"       2.21
    " Korean"        2.21
    " Tokyo"         2.15
    " Indonesian"    2.06
    " Filipino"      2.05

    Activation Density: 0.012%

    No Known Activations