INDEX
    Explanations

    mentions of China and related terms

    Negative Logits
    "<bos>"         -2.16
    (not shown)     -0.66
    "xiu"           -0.65
    "guang"         -0.64
    (whitespace)    -0.62
    "qian"          -0.62
    "xuan"          -0.60
    "<?"            -0.59
    "anyuan"        -0.57
    "sheng"         -0.56
    Positive Logits
    " Kün"          1.05
    "China"         1.02
    " Bartholo"     1.01
    " Khart"        0.99
    " China"        0.99
    " Chines"       0.99
    " Schrö"        0.97
    " china"        0.95
    " Tarragona"    0.94
    " chinese"      0.91
    Act Density 0.060%

    No Known Activations