INDEX
Explanations
phrases related to comparison or contrast between two entities
references to contrasting pairs or comparisons between two entities
Negative Logits
renheit    -0.84
terday     -0.79
levard     -0.79
utor       -0.78
iggurat    -0.74
itant      -0.73
xus        -0.72
ursday     -0.72
itness     -0.71
xp         -0.70
Positive Logits
halves     1.36
extremes   1.32
sexes      1.27
worlds     1.27
universes  1.20
sides      1.10
genres     1.10
genders    1.09
parties    1.08
styles     1.07
Activations Density 0.114%