INDEX
    Explanations

    contrasting or conditional statements

    Negative Logits
     ppl       0.80
     plz       0.70
     idk       0.66
     lots      0.65
     govt      0.64
     đc        0.62
     btw       0.62
     pls       0.61
     probs     0.59
     approx    0.58
    Positive Logits
     Unlike         0.71
     Alongside      0.67
     вовсе          0.63
     तकरीबन         0.61
     ведь           0.60
     Именно         0.57
     Ведь           0.56
     Surprisingly   0.55
     Undoubtedly    0.55
     Perhaps        0.55
    Activation Density: 0.013%

    No Known Activations