INDEX
    Explanations
    Negative Logits
    Token       Value
    ...");      1.03
    !!!");      1.01
    :");        1.00
    !!");       0.99
    ?");        0.98
    %");        0.97
    .):         0.96
    $.}         0.95
    !");        0.95
     "]");      0.94
    Positive Logits
    Token                Value
    )                    1.32
    ]                    1.20
    (token not shown)    1.10
    }                    1.06
    ),                   0.95
    "                    0.94
    ()                   0.87
    ],                   0.83
    },                   0.78
    »                    0.74
    Act Density 0.799%

    No Known Activations