    Negative Logits
    " Nag"            -0.07
    " зрост"          -0.07
    " subtraction"    -0.07
    "Bur"             -0.06
    " управ"          -0.06
    " Tong"           -0.06
    " işlet"          -0.06
    " Bur"            -0.06
    " positional"     -0.06
                      -0.06
    Positive Logits
    " demo"           0.10
    " Demo"           0.08
    " demon"          0.08
    "/demo"           0.08
    "μα"              0.08
    "demo"            0.07
    " demos"          0.07
    "inema"           0.07
    " DEA"            0.07
    " Dem"            0.07
    Activation Density: 0.008%

    No Known Activations