Explanations
comparisons or contrasts between two ideas
expressions indicating duality or contrasting qualities
Negative Logits
ugu        -0.83
lé         -0.70
uez        -0.70
renheit    -0.70
ilit       -0.67
Kard       -0.66
dq         -0.65
acion      -0.64
lus        -0.63
lic        -0.63
Positive Logits
sexes      1.50
sides      1.31
halves     1.26
genders    1.23
thirds     0.83
ocating    0.81
ends       0.78
senses     0.75
parties    0.74
kinds      0.73
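The logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). The sketch below shows one common way such lists are produced. It rests on an assumption: that the feature's effect on each token's logit is the dot product of its decoder direction `w_dec` with that token's column in the unembedding matrix `W_U`. The function name `logit_effects` and the toy arrays are hypothetical stand-ins, not this dashboard's actual code or data.

```python
import numpy as np

def logit_effects(w_dec, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, weight) pairs.

    Assumption: per-token effect = w_dec . W_U[:, token] (direct logit effect).
    """
    effects = w_dec @ W_U          # shape: (vocab_size,)
    order = np.argsort(effects)    # token indices, ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 4, a hypothetical 6-token vocabulary.
rng = np.random.default_rng(0)
W_U = rng.normal(size=(4, 6))
w_dec = rng.normal(size=4)
vocab = ["sides", "halves", "ends", "ugu", "lé", "dq"]
neg, pos = logit_effects(w_dec, W_U, vocab, k=3)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, `np.argpartition` would avoid a full sort.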
Activations Density 0.052%
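Activation density is usually read as the share of tokens in a sample dataset on which the feature fires. Under that assumption, 0.052% corresponds to roughly 520 tokens per million. A minimal sketch, assuming a 1-D array of per-token activations where "fires" means strictly positive. The helper `activation_density` and the sample array are hypothetical.

```python
import numpy as np

def activation_density(acts):
    """Percentage of tokens whose activation is strictly positive (assumed definition)."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > 0) / acts.size

# Hypothetical sample: 1,000,000 tokens, 520 of which activate the feature.
acts = np.zeros(1_000_000)
acts[:520] = 1.0
print(f"{activation_density(acts):.3f}%")  # 0.052%
```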