INDEX
    Explanations

    negations and phrases indicating exceptions

    Negative Logits
    Token            Logit
    "essen"          -0.16
    ".persistent"    -0.15
    "leton"          -0.15
    "atural"         -0.15
    "dex"            -0.15
    "adu"            -0.15
    "ayd"            -0.14
    "ernel"          -0.14
    " Vak"           -0.14
    "adic"           -0.14
    Positive Logits
    Token            Logit
    " necessarily"   0.28
    "withstanding"   0.21
    "ori"            0.21
    " vice"          0.19
    " merely"        0.18
    "ché"            0.18
    " just"          0.17
    "just"           0.17
    "ecessarily"     0.16
    "ivor"           0.16
    Act Density 0.040%

    No Known Activations