INDEX
Explanations
phrases indicating comparisons or contrasts
Negative Logits
Token       Logit
iem         -0.15
aldi        -0.14
alink       -0.14
ranks       -0.14
_DELETED    -0.14
.mk         -0.13
maker       -0.13
assa        -0.13
ivet        -0.13
ovky        -0.13
Positive Logits
Token       Logit
723          0.16
gte          0.15
785          0.15
geo          0.15
른           0.14
andest       0.14
-END         0.14
435          0.14
INAL         0.14
gn           0.14
Activation Density: 0.073%