    Negative Logits
        -0.62  .
        -0.51   I
        -0.50   K
        -0.48   long
        -0.47
        -0.47   The
        -0.46  ,
        -0.46   بيها
        -0.46   F
        -0.46   a
    Positive Logits
        1.63  ?\n
        1.56  %?
        1.51  ?}
        1.51  ?’
        1.50  ?"
        1.50  ?”
        1.47  ?''
        1.46  ’?
        1.43  ?'
        1.42  ?)
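The two lists above are the extremes of a single per-token score vector: the most negative and the most positive entries. A minimal sketch of how such extremes can be selected, assuming the scores are available as a plain mapping from token string to logit. The function name `extreme_tokens` and the toy vocabulary are hypothetical, and this is not presented as the dashboard's own code.

```python
import heapq

def extreme_tokens(
    logits: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, logit) pairs."""
    # heapq.nsmallest / heapq.nlargest pick the extremes without sorting
    # the whole vocabulary.
    negative = heapq.nsmallest(k, logits.items(), key=lambda kv: kv[1])
    positive = heapq.nlargest(k, logits.items(), key=lambda kv: kv[1])
    return negative, positive

# Hypothetical toy vocabulary reusing a few values from the lists above.
toy = {".": -0.62, " I": -0.51, " K": -0.50, "%?": 1.56, "?}": 1.51, "?)": 1.42, " the": 0.01}
neg, pos = extreme_tokens(toy, 2)
print(neg)  # [('.', -0.62), (' I', -0.51)]
print(pos)  # [('%?', 1.56), ('?}', 1.51)]
```

On a real vocabulary of tens of thousands of tokens, the heap-based selection costs O(n log k) rather than the O(n log n) of a full sort.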
    Activation Density: 0.139%
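Reading activation density as the fraction of sampled tokens on which the feature's activation is above zero, 0.139% corresponds to roughly 1.39 activating tokens per 1,000. A minimal sketch under that assumption. The zero threshold and the function name `activation_density` are illustrative choices, not a documented definition from this dashboard.

```python
def activation_density(acts: list[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not acts:
        raise ValueError("need at least one activation")
    active = sum(1 for a in acts if a > threshold)
    return active / len(acts)

# Hypothetical sample: 139 active positions out of 100,000.
acts = [0.0] * 100_000
for i in range(139):
    acts[i] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.139%
```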

    No Known Activations