INDEX
    Explanations

    terms related to controversy and contentious topics

    Negative Logits
    ling        -0.18
    eres        -0.17
    NonQuery    -0.16
    VP          -0.16
    о           -0.16
    ot          -0.15
    erase       -0.15
    eri         -0.15
    /is         -0.15
    er          -0.15
    Positive Logits
    ship        0.23
    naire       0.22
    SHIP        0.19
    stration    0.19
    naires      0.19
    ships       0.18
    cy          0.18
    aux         0.17
    IONS        0.17
    UBLE        0.16
    Activation Density: 0.238%

    No Known Activations