INDEX
Explanations
differences between two related entities or concepts
references to comparisons or relationships between two entities
New Auto-Interp
Negative Logits
RAL        -0.73
Absent     -0.71
vich       -0.69
gow        -0.67
xus        -0.67
justice    -0.66
aqu        -0.65
%%%%       -0.65
Draft      -0.65
daq        -0.64
Positive Logits
sexes       1.07
halves      0.98
extremes    0.91
poles       0.76
genders     0.76
sides       0.76
coasts      0.76
continents  0.75
eras        0.74
oppos       0.73
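The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such lists can be produced follows. It assumes a common SAE-dashboard convention that this card does not state: the feature's logit weights equal its decoder direction multiplied by the model's unembedding matrix, with no LayerNorm folding. The function name `top_logit_tokens` and its parameters are hypothetical and are not taken from any particular tool.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Return the k most positive and k most negative (token, weight) pairs.

    Assumption: logit weight = decoder_dir @ W_U (no LayerNorm folding).
    decoder_dir: shape (d_model,); W_U: shape (d_model, vocab_size).
    """
    effects = decoder_dir @ W_U  # one logit weight per vocabulary token
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy usage with a 2-dimensional model and a 4-token vocabulary.
pos, neg = top_logit_tokens(
    np.array([1.0, 0.0]),
    np.array([[0.5, -1.0, 2.0, 0.0], [3.0, 3.0, 3.0, 3.0]]),
    ["a", "b", "c", "d"],
    k=2,
)
```

In this toy case `pos` is `[("c", 2.0), ("a", 0.5)]` and `neg` is `[("b", -1.0), ("d", 0.0)]`.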
Activation Density: 0.080%
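Activation density is usually reported as the share of token positions on which the feature fires. A density of 0.080% corresponds to about 8 in every 10,000 tokens. The sketch below assumes that "fires" means an activation strictly above zero and that the activations come from some sample dataset. This card specifies neither, and the helper name `activation_density` is hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds threshold.

    Assumption: a feature 'fires' when its activation is > threshold (default 0).
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * float(np.mean(acts > threshold))

# Toy usage: the feature fires on 8 of 10,000 token positions.
sample = np.zeros(10_000)
sample[:8] = 1.0
density = activation_density(sample)  # approximately 0.08 (percent)
```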