    Negative Logits
    Token      Logit
    ']))       -1.09
    "];        -1.07
    '];        -1.05
    '])        -1.03
    '],        -1.02
    "])        -1.02
    )]         -1.01
    ];         -1.00
    ']         -1.00
    '])->      -0.98

    Positive Logits
    Token      Logit
    -          1.14
    [          1.11
    \          0.93
    .          0.90
    *          0.90
    $          0.86
    {          0.84
    +          0.80
    "          0.78
    (          0.78

    Act Density 2.094%

    No Known Activations