INDEX
Explanations
references to relationships and their complexities
Negative Logits
Token     Logit
BOTH      -0.25
both      -0.24
both      -0.24
Both      -0.21
Both      -0.21
beide     -0.16
_BOTH     -0.16
ambos     -0.16
både      -0.15
ALWAYS    -0.14
Positive Logits
Token      Logit
even       0.17
even       0.15
thing      0.15
something  0.15
soon       0.14
incluso    0.14
689        0.14
sogar      0.14
ancel      0.14
something  0.14
Activations Density: 0.078%