INDEX
    Explanations

    phrases indicating denial or negation

    Negative Logits
    Token         Logit
    "EG"          -0.16
    "ants"        -0.14
    "886"         -0.14
    "entic"       -0.14
    "HEL"         -0.14
    "rosso"       -0.14
    "ivi"         -0.13
    " Chamber"    -0.13
    "594"         -0.13
    "alus"        -0.13

    Positive Logits
    Token         Logit
    "erson"       0.17
    "sez"         0.15
    "eker"        0.15
    " Auditor"    0.15
    "ublic"       0.15
    " Mist"       0.14
    "ertz"        0.14
    ".elem"       0.14
    "ernal"       0.14
    "æ²"          0.14

    Activation Density: 0.006%

    No Known Activations