INDEX
Explanations
phrases that emphasize similarity or redundancy
Negative Logits
Token           Logit
endpush         -0.71
HtmlAttribute   -0.61
سكانية          -0.60
كومونز          -0.59
voici           -0.59
¯¯              -0.58
Réponses        -0.57
phenol          -0.57
amssymb         -0.56
Dez             -0.56
Positive Logits
Token           Logit
same            1.61
same            1.60
Same            1.56
Same            1.43
SAME            1.31
SAME            1.29
samme           1.20
samma           1.19
aynı            1.18
mesma           1.13
Activations Density: 0.256%