Explanations
phrases related to comparison or contrast
Negative Logits
agre           -0.68
yll            -0.65
ular           -0.60
Interstitial   -0.59
ians           -0.59
unicip         -0.58
heit           -0.57
usage          -0.57
mas            -0.56
formance       -0.55
Positive Logits
thirds          0.68
ses             0.63
oused           0.63
finalists       0.61
Cups            0.61
theirs          0.60
Colo            0.59
eely            0.59
legged          0.58
milo            0.58
Activation Density: 0.056%
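The page does not say how the logit tables or the density figure are produced. As a minimal sketch, assuming one common convention for such dashboards, the logit effects are the feature's decoder direction projected onto the model's unembedding matrix, ranked to pick the most negative and most positive tokens, and activation density is the percentage of tokens on which the feature fires (activation above zero). The helpers `top_logit_effects` and `activation_density_pct`, and all toy inputs, are hypothetical and not taken from the source.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: project a feature's decoder direction (d_model,)
    onto an unembedding matrix W_U (d_model, vocab_size) and return the
    k most negative and k most positive (token, logit) pairs."""
    logits = decoder_dir @ W_U          # direct effect on each token's logit
    order = np.argsort(logits)          # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

def activation_density_pct(acts):
    """Hypothetical helper: percentage of token positions where the
    feature's activation is strictly positive."""
    return float(np.mean(acts > 0) * 100)

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.1, -0.7],
                [0.0,  0.0, 0.0,  0.0]])
neg, pos = top_logit_effects(decoder_dir, W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # [('d', -0.7), ('b', -0.2)]
print(pos)  # [('a', 0.5), ('c', 0.1)]
```

On a real model the activations would be collected over a large token sample, so a density of 0.056% would mean the feature is active on roughly 56 of every 100,000 tokens.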