INDEX
    Explanations

    relational words indicating comparison or contradiction

    Negative Logits
    Token         Logit
    "能够"        -0.21
    "更加"        -0.15
    "具有"        -0.14
    " manière"    -0.14
    "jes"         -0.13
    "hatt"        -0.13
    "onth"        -0.13
    "であった"    -0.13
    "之后"        -0.13
    "stoff"       -0.13
    Positive Logits
    Token         Logit
    " combos"     0.17
    "まず"        0.15
    " EITHER"     0.14
    "piler"       0.14
    "gov"         0.14
    "ilon"        0.13
    "ã"           0.13
    "ís"          0.13
    "oloji"       0.13
    " yes"        0.13
    Activation Density: 0.116%

    No Known Activations
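
Token strings exported from a byte-level BPE vocabulary are often stored with every raw byte mapped to a printable character, so multi-byte UTF-8 text appears as sequences such as `èĥ½å¤Ł`. The sketch below shows one way to turn such strings back into readable text. It assumes the GPT-2 style byte-to-character table, which is an assumption here because the source does not name the tokenizer; `bytes_to_unicode` and `decode_token` are illustrative names, not part of any dashboard API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table in the GPT-2 style (assumed tokenizer)."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    mapping = {b: chr(b) for b in printable}
    shift = 0
    # Bytes without a printable form are assigned code points from U+0100 upward.
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + shift)
            shift += 1
    return mapping


# Inverse table: printable character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(raw: str) -> str:
    """Decode a byte-level token string into readable text.

    Characters outside the table (for example a literal space that a
    dashboard has already rendered) are passed through unchanged.
    Incomplete UTF-8 sequences become U+FFFD instead of raising.
    """
    data = bytearray()
    for ch in raw:
        if ch in UNICODE_TO_BYTE:
            data.append(UNICODE_TO_BYTE[ch])
        else:
            data.extend(ch.encode("utf-8"))
    return bytes(data).decode("utf-8", errors="replace")


if __name__ == "__main__":
    for raw in ["èĥ½å¤Ł", "ãģ¾ãģļ", " combos"]:
        print(repr(raw), "->", repr(decode_token(raw)))
```

A token that holds only part of a multi-byte character, such as a lone lead byte, cannot be decoded on its own and comes back as the replacement character.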