Explanations
comparisons or contrasts between two entities
phrases indicating comparison and commonality
Negative Logits
Newsletter    -0.72
ogram         -0.67
all           -0.65
estro         -0.64
livion        -0.64
sale          -0.63
Order         -0.61
only          -0.61
staking       -0.60
Collider      -0.59
Positive Logits
equally        1.00
halves         0.92
alike          0.84
identical      0.82
together       0.77
respectively   0.77
sides          0.76
aughed         0.76
IDA            0.76
sexes          0.76
Activations Density: 0.239%