INDEX
Explanations
references to duality and opposing sides
Negative Logits
bū           -0.31
Biografi     -0.31
peu          -0.30
promotion    -0.30
little       -0.29
among        -0.28
깨           -0.28
Kenney       -0.28
Promotion    -0.28
Gaff         -0.28
Positive Logits
side         1.20
Side         1.15
side         1.15
Side         1.10
SIDE         1.09
sides        1.04
Sides        1.01
sides        1.00
SIDE         0.99
sided        0.96
Activations Density: 0.336%