INDEX
    Negative Logits
    ecess
    -0.07
    uí
    -0.07
    ismatch
    -0.06
         \r\n
    -0.06
    phin
    -0.06
          \r\n
    -0.06
    sts
    -0.06
    lín
    -0.06
    ternet
    -0.06
    伏
    -0.06
    Positive Logits
    Moreover
    0.15
     Moreover
    0.15
    Furthermore
    0.12
     Furthermore
    0.10
    Therefore
    0.10
     Therefore
    0.10
     moreover
    0.10
    etheless
    0.09
     However
    0.09
     Indeed
    0.09
    Activation Density: 0.063%

    No Known Activations