Explanations
comparisons between different subjects, especially in terms of similarities and differences
Negative Logits
izon     -0.16
izont    -0.15
gain     -0.14
chet     -0.14
inch     -0.14
609      -0.14
ngle     -0.14
vet      -0.14
ebin     -0.13
indre    -0.13
Positive Logits
alike               0.39
相同 ("same")        0.33
both                0.32
Both                0.30
both                0.29
Both                0.29
一样 ("the same")    0.28
identical           0.28
BOTH                0.27
similar             0.26
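The logit lists above can be read as a ranking of vocabulary tokens by how strongly the feature pushes their output logits up or down. A minimal sketch of that ranking follows, assuming the scores come from the dot product of the feature's decoder direction with each token's unembedding vector (a common convention for dashboards of this kind, not confirmed by this page). The function name `top_logit_tokens` and all numbers are hypothetical toy values, not the model's real weights.

```python
def top_logit_tokens(decoder_direction, unembedding, vocab, k=2):
    """Rank tokens by the dot product of a feature direction with each
    token's unembedding vector. Returns (most_negative, most_positive)."""
    scores = []
    for token, vector in zip(vocab, unembedding):
        score = sum(d * u for d, u in zip(decoder_direction, vector))
        scores.append((token, score))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


# Toy 2-dimensional example (assumed values, for illustration only).
vocab = ["alike", "both", "izon", "gain"]
unembedding = [[0.5, 0.1], [0.4, 0.0], [-0.2, 0.1], [-0.1, -0.2]]
direction = [0.8, -0.1]

negative, positive = top_logit_tokens(direction, unembedding, vocab)
print("negative:", [token for token, _ in negative])
print("positive:", [token for token, _ in positive])
```

In a real model the same computation would typically be done as a single matrix product over the whole vocabulary.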
Activations Density 0.170%
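As a rough illustration of the density figure, the sketch below assumes that activation density means the fraction of sampled token positions on which the feature's activation is above zero. The function name `activation_density` and the sample data are hypothetical, chosen only to show the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 17 of which are active.
sample = [0.0] * 9983 + [1.2] * 17
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low means the feature is sparse: it is silent on the vast majority of tokens.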