INDEX
    Explanations

    instances of contrasting or contradictory phrases

    Negative Logits
    Token             Logit
    "ãĥĪãĥ«"          -0.15
    " odd"            -0.14
    " Odd"            -0.14
    " Jong"           -0.14
    " b"              -0.13
    " Cay"            -0.13
    "villa"           -0.13
    " Ú¯ÛĮ"           -0.13
    "AndPassword"     -0.13
    "china"           -0.13
    Positive Logits
    Token             Logit
    " geen"           0.16
    "iese"            0.15
    "ields"           0.15
    "ters"            0.15
    " neither"        0.15
    "letal"           0.14
    "lash"            0.14
    "arto"            0.14
    "736"             0.14
    "ipel"            0.14
    Activation Density: 0.316%

    No Known Activations