INDEX
    Explanations

    comparative phrases or constructs that suggest increasing levels or risks related to various factors

    Negative Logits
    Token            Logit
    " فريبيس"        -1.02
    " للمعارف"       -0.97
    " queſta"        -0.93
    "<unused23>"     -0.92
    "<unused79>"     -0.92
    "<unused52>"     -0.92
    "<unused68>"     -0.92
    "<unused42>"     -0.92
    "<unused3>"      -0.92
    "<unused28>"     -0.92
    Positive Logits
    Token            Logit
    "!"              0.39
    " chance"        0.39
    " chances"       0.36
    " you"           0.34
    "progress"       0.34
    " the"           0.34
    "."              0.33
    " it"            0.33
    "↵↵"             0.32
    " success"       0.32
    Activation Density: 0.020%

    No Known Activations