INDEX
    Explanations

    phrases that compare entities, emphasizing the uniqueness or superiority of one over another

    Negative Logits
    ugas        -0.15
     always     -0.15
     exactly    -0.15
    inci        -0.14
    rouch       -0.14
    ali         -0.14
    usk         -0.14
    ney         -0.14
     Exactly    -0.14
    ека         -0.13
    Positive Logits
     other      0.21
     single     0.19
    other       0.18
    其他          0.17
     SINGLE     0.16
     others     0.16
    single      0.16
    others      0.16
    ught        0.16
     otras      0.15
    Act Density 0.032%

    No Known Activations