INDEX
    Explanations

    negations or expressions of denial in various contexts

    Negative Logits
    til             -0.15
    WS              -0.14
    acid            -0.14
    Inactive        -0.14
    erts            -0.14
     prim           -0.14
    ady             -0.14
    닉               -0.14
     Til            -0.14
    arp             -0.13
    Positive Logits
     necessarily     0.23
    innamon          0.16
     matter          0.16
    ecessarily       0.16
     बर              0.15
    ồn               0.15
    olet             0.14
     доз             0.14
    Uvs              0.14
    ISCO             0.14
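    The page does not say how the logit lists were produced. A common convention for dashboards like this is to project the feature's decoder direction through the model's unembedding matrix and report the most negative and most positive tokens. The sketch below assumes that convention; the names `top_logit_effects`, `decoder_direction` and `unembed` are hypothetical, and real pipelines may add scaling or normalization steps not shown here.

```python
from typing import Dict, List, Tuple


def top_logit_effects(
    decoder_direction: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: rank tokens by the dot product between a feature's
    decoder direction and each token's unembedding vector (an assumed convention)."""
    # Direct logit effect of the feature on each vocabulary token.
    scores = {
        token: sum(d * u for d, u in zip(decoder_direction, vector))
        for token, vector in unembed.items()
    }
    # Sort ascending: the most negative effects come first.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
toy_unembed = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
neg, pos = top_logit_effects([2.0, 1.0], toy_unembed, k=1)
print(neg, pos)
```

    With k=10 this yields two ten-entry lists like the ones above; a token such as " necessarily" ranks first when its unembedding vector is most aligned with the feature's decoder direction.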
    Activation Density: 0.096%
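    Activation density is usually read as the share of token positions on which the feature fires, so 0.096% corresponds to roughly 96 of every 100,000 tokens. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and its `threshold` parameter are hypothetical, not part of any dashboard API.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions whose activation
    exceeds `threshold` (assumed definition of "activation density")."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# 96 active positions out of 100,000 tokens gives a density of 0.096%.
sample = [0.0] * 99_904 + [1.0] * 96
print(activation_density(sample))
```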

    No Known Activations