    Negative Logits
    Token           Logit
    gdx             -0.75
    böz             -0.74
     FontWeight     -0.74
    "):             -0.71
    )*/             -0.69
     '              -0.66
     */             -0.66
    "],             -0.66
    ede             -0.65
    [`              -0.65
    Positive Logits
    Token           Logit
    ?!?             1.59
    %!              1.53
    !               1.43
     !              1.42
    ?!?!            1.42
    !!!!!!          1.39
    !!!!!!!         1.37
    ?!              1.31
    !!!!!!!!!!      1.28
    !!              1.24
    Activation Density: 0.080%

    No Known Activations