    Explanations

    negations and expressions of doubt or uncertainty

    Negative Logits
    Token             Logit
    "arp"             -0.15
    "amo"             -0.15
    " withdraw"       -0.14
    "outu"            -0.14
    "ories"           -0.14
    "ues"             -0.14
    "inar"            -0.14
    "hatt"            -0.14
    "acid"            -0.13
    " ary"            -0.13

    Positive Logits
    Token             Logit
    " necessarily"    0.28
    " matter"         0.22
    "matter"          0.20
    " mattered"       0.19
    " ever"           0.17
    "ecessarily"      0.17
    " matters"        0.16
    " Matter"         0.16
    " compares"       0.16
    "ylland"          0.16
    Activation Density: 0.125%

    No Known Activations