INDEX
Explanations
equivalence and logical operators
Negative Logits
🤠  0.58
🤳  0.55
🦸  0.54
🤽  0.54
🐹  0.53
🚋  0.53
🏯  0.53
🧔  0.53
🕺  0.52
👩  0.52
Positive Logits
≡  0.54
:=  0.52
∽  0.49
∈  0.49
∝  0.49
:=  0.48
|=  0.47
⊕  0.47
=  0.47
diamond  0.46
Activations Density 0.025%