INDEX
    Explanations
    Negative Logits
    Token         Logit
    " Klin"       -0.77
    " angeles"    -0.70
    "fout"        -0.70
    " Angeles"    -0.67
    "<tr>"        -0.65
    " lin"        -0.64
    "coln"        -0.64
    "Zal"         -0.63
    " xr"         -0.63
    "COLN"        -0.63
    Positive Logits
    Token         Logit
    "]**"         1.31
    ".**"         1.20
    "(**"         1.17
    " '**"        1.16
    ")**"         1.16
    " **"         1.14
    "kwargs"      1.08
    ",**"         1.07
    ":**"         0.95
    " ***!"       0.92
    Act Density 0.411%

    No Known Activations