INDEX
    Explanations

    Math (excluding values)

    Negative Logits
    Token             Logit
    " soak"           -0.08
    (not shown)       -0.08
    "engineering"     -0.08
    " engineering"    -0.07
    " phys"           -0.07
    " engineers"      -0.07
    " Sang"           -0.07
    "+w"              -0.07
    " lec"            -0.07
    " kines"          -0.07
    Positive Logits
    Token             Logit
    (not shown)       0.08
    "קרים"            0.08
    " offending"      0.08
    " případ"         0.08
    " Thorn"          0.07
    " случаев"        0.07
    " rejects"        0.07
    " khỏi"           0.07
    " undesirable"    0.07
    " запрещ"         0.07
    Activation Density: 0.007%

    No Known Activations