INDEX
    Explanations

    phrases that make general observations or assertions

    Negative Logits
    "isman"      -0.15
    "ide"        -0.15
    "mitter"     -0.15
    "onga"       -0.15
    "apons"      -0.14
    "igh"        -0.14
    "atorial"    -0.14
    "idan"       -0.14
    "uture"      -0.14
    "htar"       -0.14
    Positive Logits
    " why"          0.26
    "why"           0.23
    " incident"     0.21
    " itself"       0.18
    " INCIDENT"     0.17
    "(utf"          0.17
    " Incident"     0.16
    "istrovství"    0.16
    "为什么"        0.16
    " btw"          0.16
    Activation Density: 0.080%

    No Known Activations