INDEX
    Explanations
    NEGATIVE LOGITS
    Token               Logit
    "AddTagHelper"      -1.07
    " autorytatywna"    -1.06
    "<unused16>"        -1.00
    "<unused3>"         -1.00
    "<unused51>"        -1.00
    "<unused28>"        -1.00
    "<unused43>"        -1.00
    "<unused41>"        -1.00
    "<unused8>"         -1.00
    "<unused23>"        -1.00
    POSITIVE LOGITS
    Token               Logit
    " in"               0.40
    (whitespace)        0.39
    " and"              0.39
    (not shown)         0.36
    " about"            0.35
    " is"               0.35
    " par"              0.35
    (not shown)         0.34
    " on"               0.34
    " with"             0.34
    Activation Density: 0.002%

    No Known Activations