INDEX
Explanations
instances of the word "both."
Negative Logits
either      -0.21
either      -0.18
als         -0.17
EITHER      -0.16
Either      -0.16
*****↵↵     -0.15
Either      -0.14
trak        -0.14
******↵     -0.14
ç´ł         -0.14
Positive Logits
sexes       0.26
sides       0.25
/all        0.19
numerator   0.17
-sided      0.17
genders     0.17
azel        0.17
eted        0.17
iei         0.16
ends        0.15
Activations Density: 0.047%