INDEX
    Explanations

    phrases that emphasize similarity or redundancy

    Negative Logits
    Token              Logit
    "endpush"          -0.71
    "HtmlAttribute"    -0.61
    " سكانية"          -0.60
    " كومونز"          -0.59
    " voici"           -0.59
    "¯¯"               -0.58
    " Réponses"        -0.57
    " phenol"          -0.57
    "amssymb"          -0.56
    " Dez"             -0.56
    Positive Logits
    Token              Logit
    "same"             1.61
    " same"            1.60
    "Same"             1.56
    " Same"            1.43
    "SAME"             1.31
    " SAME"            1.29
    " samme"           1.20
    " samma"           1.19
    " aynı"            1.18
    " mesma"           1.13
    Activation Density: 0.256%

    No Known Activations