INDEX
    Explanations

    phrases that emphasize collective or shared experiences

    Negative Logits
    Token       Logit
    "ayet"      -0.16
    "onu"       -0.16
    "anou"      -0.16
    "egree"     -0.16
    "infinity"  -0.15
    "chwitz"    -0.15
    "yang"      -0.14
    "于是"      -0.14
    "yat"       -0.14
    "izmet"     -0.14
    Positive Logits
    Token      Logit
    "毕"       0.20
    " proč"    0.17
    " weren"   0.17
    " aren"    0.17
    " why"     0.17
    " wasn"    0.16
    " who"     0.16
    " isn"     0.15
    " Who"     0.15
    " přece"   0.15
    Activation Density: 0.021%

    No Known Activations