INDEX
Explanations
phrases relating to comparisons or contrasts between two entities
references to comparisons between two entities or groups
Negative Logits
vich       -0.84
nell       -0.81
ucked      -0.78
terday     -0.75
renheit    -0.75
daq        -0.74
owler      -0.74
levard     -0.73
pletion    -0.72
lehem      -0.71
Positive Logits
extremes    1.30
sexes       1.27
halves      1.21
sides       1.14
genders     1.06
parties     1.02
worlds      0.93
realms      0.93
poles       0.91
universes   0.88
Activations Density: 0.101%