    Explanations

    numeric values and mathematical symbols

    Negative Logits
     themſelves      -0.94
     purpoſe         -0.94
     itſelf          -0.93
    ſelves           -0.88
    ſelf             -0.82
     himſelf         -0.81
     uſe             -0.80
     myſelf          -0.79
     ſever           -0.78
     ſtre            -0.77

    Positive Logits
     @"/             0.62
    󠁿                0.59
    >>()             0.53
    <bos>            0.53
     tampak          0.43
     bağlantılar     0.42
     Reg             0.42
     bör             0.41
     T               0.41
    ValueStyle       0.41

    Activation Density: 2.310%

    No Known Activations