INDEX
Explanations
references to separation or differences between entities or concepts
Negative Logits
double       -0.16
loff         -0.15
cken         -0.14
ìŀIJìĿ¸       -0.14
separator    -0.14
double       -0.14
alet         -0.14
reten        -0.13
ses          -0.13
zes          -0.13
Positive Logits
unrelated        0.21
equally          0.21
-Compatible      0.19
incompatible     0.18
/new             0.17
person           0.16
agenda           0.15
entirely         0.15
iator            0.15
completamente    0.15
Activations Density: 0.170%