Explanations
references to similarities and comparisons among subjects
Negative Logits
reta     -0.18
ebin     -0.15
uko      -0.14
ettel    -0.13
钟       -0.13
alo      -0.13
ene      -0.13
ァ       -0.13
coni     -0.13
466      -0.13
Positive Logits
same        0.76
same        0.73
identical   0.72
相同        0.72
Same        0.67
Same        0.66
SAME        0.64
同          0.63
_same       0.61
同じ        0.59
Activations Density 0.416%