INDEX
Explanations
phrases that articulate differences and distinctions between concepts or entities
Negative Logits
Token        Logit
fede         -0.48
perchance    -0.47
adre         -0.46
/            -0.45
elsewhere    -0.43
Sotto        -0.42
ubi          -0.42
Westfalen    -0.42
givet        -0.41
手段         -0.41
Positive Logits
Token        Logit
Differences  1.50
differences  1.47
difference   1.47
Differences  1.47
Difference   1.44
DIFFERENCE   1.33
Unterschied  1.32
Difference   1.32
difference   1.32
verschil     1.30
Activations Density: 0.345%