INDEX
    Explanations

    comparative phrases indicating greater or lesser values

    Negative Logits
    Token      Logit
    ")"        -0.64
    "."        -0.64
    "{})"      -0.63
    "])"       -0.60
    ")}"       -0.59
    "]_"       -0.55
    "<eos>"    -0.55
    ")."       -0.55
    ")}}"      -0.55
    "),"       -0.54
    Positive Logits
    Token         Logit
    ">="          1.05
    " $>"         1.01
    " $>$"        0.99
    ">$"          0.98
    "(>"          0.95
    " >"          0.94
    " >/"         0.94
    " »>"         0.94
    ">>>>>>>>"    0.93
    ">\"          0.92
    Act Density 0.253%

    No Known Activations