INDEX
    Explanations

    terms related to comparisons and contrasts

    NEGATIVE LOGITS
    "ahl"        -0.17
    " Verd"      -0.15
    "ade"        -0.15
    "codegen"    -0.14
    " equally"   -0.14
    " ç¶"        -0.14
    "ari"        -0.13
    "ily"        -0.13
    " Slav"      -0.13
    " Ment"      -0.13
    POSITIVE LOGITS
    " unlike"    0.82
    "Unlike"     0.60
    " Unlike"    0.60
    " like"      0.48
    " compared"  0.43
    " Like"      0.38
    " whereas"   0.37
    " rather"    0.36
    " Whereas"   0.35
    "rather"     0.34
    Activation Density: 0.001%

    No Known Activations