Explanations
contrasting viewpoints or dichotomies in expressions
Negative Logits
Token    Logit
loi      -0.18
uzzi     -0.17
æıIJ      -0.16
ree      -0.15
518      -0.15
REE      -0.14
Garn     -0.14
andan    -0.14
707      -0.14
orida    -0.14
Positive Logits
Token    Logit
either   0.27
Either   0.26
Either   0.22
EITHER   0.21
binary   0.20
binary   0.20
either   0.20
Binary   0.19
-binary  0.18
anga     0.16
Activations Density: 0.111%