INDEX
Explanations
conditional phrases and their implications
Negative Logits
Token      Logit
jug        -0.16
Frankie    -0.14
Bowl       -0.14
ardin      -0.14
orb        -0.14
ucu        -0.14
whereas    -0.14
emm        -0.14
pair       -0.13
iral       -0.13
Positive Logits
Token      Logit
gio        0.17
otherwise  0.16
czy        0.15
otherwise  0.15
oui        0.15
446        0.15
jinak      0.15
hay        0.14
oub        0.14
↵↵         0.14
Activations Density: 0.175%