INDEX
    Explanations

    references to grenades or explosive devices

    Negative Logits
    Token            Logit
    " Infinity"      -0.16
    "curacy"         -0.14
    "redi"           -0.14
    "enko"           -0.14
    "untu"           -0.14
    "ua"             -0.14
    "chap"           -0.14
    " Attribution"   -0.13
    "anning"         -0.13
    "ieg"            -0.13
    Positive Logits
    Token            Logit
    "125"             0.16
    "iaux"            0.15
    "shal"            0.15
    "ninger"          0.14
    "ubber"           0.14
    "126"             0.14
    ".grp"            0.14
    ".lesson"         0.13
    "izona"           0.13
    "iami"            0.13
    Activation Density: 0.002%

    No Known Activations