INDEX
    Explanations

    assignments and initializations in code

    Negative Logits
    Token           Logit
    "leſs"          -1.04
    "neſs"          -0.90
    " doubtnut"     -0.89
    "ſhip"          -0.77
    " Beſ"          -0.75
    " itſelf"       -0.74
    "ſelf"          -0.73
    "[])\n"         -0.72
    (not rendered)  -0.70
    " يتيمه"        -0.69
    Positive Logits
    Token           Logit
    " ="            2.87
    "="             1.93
    ">=</"          1.80
    " :="           1.60
    " $=$"          1.57
    " $="           1.55
    ")="            1.51
    "}="            1.50
    "]="            1.43
    " =\n"          1.37
    Activation Density: 0.350%

    No Known Activations