Explanations
comparisons or distinctions between different entities or concepts
references to differences and distinctions between concepts or entities
Negative Logits
Token   Logit
ãĥĦ     -0.79
ATA     -0.79
vez     -0.76
mberg   -0.73
onz     -0.71
ãĤ®     -0.70
rive    -0.66
ãĤ±     -0.64
å°Ĩ     -0.63
ico     -0.62
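
Several entries in the Negative Logits list (`ãĥĦ`, `ãĤ®`, `ãĤ±`, `å°Ĩ`) look like raw vocabulary strings from a byte-level BPE tokenizer rather than readable text. The sketch below shows one way to map such strings back to characters. It assumes the feature comes from a model with a GPT-2 style byte-level vocabulary, which the listing itself does not state. The names `bytes_to_unicode`, `UNICODE_TO_BYTE`, and `decode_token` are illustrative helpers, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the byte -> printable-character table used by GPT-2 style
    byte-level BPE (assumption: the dashboard's tokenizer follows this scheme)."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: vocabulary character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level vocabulary string back into text.

    Incomplete UTF-8 sequences are replaced rather than raising an error,
    because a single token may hold only part of a character.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥĦ", "ãĤ®", "ãĤ±", "å°Ĩ", "ATA"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, the four strings decode to ツ, ギ, ケ, and 将, while plain ASCII tokens such as `ATA` pass through unchanged. If the model uses a different tokenizer, this mapping does not apply.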
Positive Logits
Token        Logit
between      1.61
between      1.34
Between      1.26
iveness      1.08
separating   0.98
iator        0.97
ials         0.96
maker        0.94
iating       0.90
Between      0.85
Activations Density: 0.058%