INDEX
    Explanations

    instances of the word "ignore" and other similar terms

    Negative Logits
    Token         Logit
    "icens"       -0.16
    "otel"        -0.15
    "ulis"        -0.15
    "anzi"        -0.14
    "elled"       -0.14
    "meni"        -0.14
    "يلي"         -0.14
    "умент"       -0.14
    "ines"        -0.14
    "pects"       -0.14
    Positive Logits
    Token         Logit
    " therefore"  0.24
    " Therefore"  0.19
    " لذا"        0.17
    "Therefore"   0.17
    " thus"       0.16
    "zilla"       0.16
    "apiro"       0.16
    "uth"         0.15
    "onya"        0.15
    " donc"       0.15
    Activation Density: 0.004%

    No Known Activations