INDEX
    Explanations

    code initialization and structure

    Negative Logits
     [    1.10
     $[\  0.94
     [\   0.93
     [$   0.91
     ([   0.90
     $[   0.88
    [\    0.87
     [`   0.87
    [     0.87
     [<   0.84
    Positive Logits
     {-   0.94
    {-    0.86
     ={   0.82
    ={    0.79
    ={"   0.77
     {    0.76
    ,{    0.75
    {&    0.72
     {(   0.71
     {"   0.70
    Act Density 0.006%

    No Known Activations