    Explanations

    avoids unnecessary duplication

    NEGATIVE LOGITS
    Token             Value
    "...”"            1.01
    "…</"             0.97
    (not rendered)    0.88
    (not rendered)    0.86
    "…’"              0.83
    (not rendered)    0.81
    "’)"              0.80
    "’),"             0.79
    " Germania"       0.79
    ",’"              0.79
    POSITIVE LOGITS
    Token             Value
    " avoids"         1.14
    " using"          1.06
    " Also"           1.05
    "Also"            1.05
    " also"           1.04
    (not rendered)    0.99
    (whitespace)      0.93
    " avoiding"       0.93
    "using"           0.92
    " very"           0.89
    Act Density 0.397%

    No Known Activations