INDEX
    Explanations

    equivalence and logical operators

    Negative Logits
    🤠    0.58
    🤳    0.55
    🦸    0.54
    🤽    0.54
    🐹    0.53
    🚋    0.53
    🏯    0.53
    🧔    0.53
    🕺    0.52
    👩    0.52
    Positive Logits
                 0.54
    " :="        0.52
                 0.49
                 0.49
                 0.49
    ":="         0.48
    " |="        0.47
                 0.47
    " ="         0.47
    "diamond"    0.46
    Activation Density: 0.025%

    No Known Activations