INDEX
    Explanations

    phrases indicating a final summary or conclusion

    Negative Logits
    Token            Logit
    "unk"            -0.16
    "adu"            -0.15
    ".LogWarning"    -0.15
    "IEL"            -0.15
    "648"            -0.14
    "iel"            -0.14
    "avana"          -0.14
    " иÑģ"           -0.14
    "pesan"          -0.14
    " Parr"          -0.14
    Positive Logits
    Token            Logit
    " Bottom"        0.28
    "bottom"         0.27
    " bottom"        0.26
    " BOTTOM"        0.26
    ".Bottom"        0.25
    "(bottom"        0.24
    "Bottom"         0.24
    "-bottom"        0.22
    "BOTTOM"         0.21
    "/top"           0.20
    Act Density 0.017%

    No Known Activations