    Explanations

    "like" and "as" used in comparisons

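    Negative Logits
        surtout (French, "above all")         0.23
        Added                                 0.22
        (token not rendered)                  0.22
        Announces                             0.22
        (token not rendered)                  0.22
        incluindo (Portuguese, "including")   0.22
        ↵↵ (two newlines)                     0.21
        (token not rendered)                  0.21
        (token not rendered)                  0.21
        incluyendo (Spanish, "including")     0.20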
    Positive Logits
        we                                    0.36
        आपण (Marathi, "we")                   0.34
        you                                   0.30
        было (Russian, "was")                 0.30
        they                                  0.29
        любят (Russian, "they love")          0.29
        ocurre (Spanish, "occurs")            0.29
        it                                    0.29
        happened                              0.28
        bạn (Vietnamese, "you")               0.28
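    The page does not say how these token lists are produced. A common approach for sparse-autoencoder features is a "logit lens" style projection: take the dot product of the feature's decoder direction with each token's unembedding vector, then keep the highest and lowest scores. The sketch below assumes that method. The function name `feature_logit_effects` and the toy vectors are hypothetical and are not taken from this page. In this sketch, suppressed tokens carry negative scores.

```python
from typing import Dict, List, Tuple

def feature_logit_effects(
    decoder_direction: List[float],
    unembedding: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k most promoted and k most suppressed tokens.

    Assumes each token's score is the dot product of the feature's decoder
    direction with that token's unembedding vector (a logit-lens projection).
    """
    scores = {
        token: sum(d * u for d, u in zip(decoder_direction, vector))
        for token, vector in unembedding.items()
    }
    # Highest scores first: tokens the feature pushes up.
    top_positive = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
    # Lowest (most negative) scores first: tokens the feature pushes down.
    top_negative = sorted(scores.items(), key=lambda item: item[1])[:k]
    return top_positive, top_negative

# Toy, made-up vectors for illustration only.
direction = [1.0, 0.0]
toy_unembedding = {"we": [0.36, 0.1], "you": [0.30, -0.2], "Added": [-0.22, 0.5]}
pos, neg = feature_logit_effects(direction, toy_unembedding, k=1)
print(pos, neg)
```

    With these toy vectors, "we" ranks as the most promoted token and "Added" as the most suppressed one.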
    Activation density: 0.045%
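    Activation density is usually read as the share of token positions on which the feature is active. A figure of 0.045% therefore means about 45 active positions per 100,000 tokens. The sketch below assumes "active" means an activation strictly above a threshold of 0. That threshold and the helper name `activation_density` are illustrative assumptions and are not definitions taken from this page.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percent of positions whose activation exceeds `threshold`.

    Assumes a feature counts as active when its activation is strictly above
    the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# 45 nonzero activations out of 100,000 positions.
acts = [0.0] * 100_000
for i in range(45):
    acts[i * 2000] = 1.5
print(activation_density(acts))
```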

    No Known Activations