    Explanations

    negations and assertions that challenge common beliefs or misconceptions

    Negative Logits
        Token            Logit
        ".Apis"          -0.18
        "iger"           -0.16
        "berger"         -0.16
        "zan"            -0.16
        "wyn"            -0.15
        " Faker"         -0.15
        " Pais"          -0.15
        "705"            -0.15
        "heels"          -0.14
        "shr"            -0.14
    Positive Logits
        Token            Logit
        " anymore"        0.21
        " nor"            0.16
        " today"          0.16
        "ItemSelected"    0.15
        "uti"             0.14
        " ob"             0.14
        "933"             0.14
        " inter"          0.14
        " as"             0.14
        " ab"             0.14
    Activation Density: 0.066%
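    The activation density is the share of tokens on which this feature fires. The following is a minimal sketch of that calculation. It assumes density is the fraction of token positions whose activation is strictly above a threshold of zero. The `activation_density` helper and the synthetic data are hypothetical illustrations, not the dashboard's actual code.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "active" means activation > threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 66 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(66):
    acts[i * 1000] = 1.0  # mark every 1000th position as active

density = activation_density(acts)
print(f"{density:.3%}")  # 66 / 100,000 formats as 0.066%
```

    At this rate, the feature fires on roughly one token in 1,500, so it is a sparse feature.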

    No Known Activations