    Explanations

    mathematical expressions and equations in the text

    Negative Logits
    "irit"       -0.14
    " floating"  -0.14
    "arin"       -0.14
    "aeda"       -0.14
    "jadi"       -0.14
    "탄"         -0.13
    "icator"     -0.13
    "ollar"      -0.13
    "graphic"    -0.13
    "ilan"       -0.13

    Positive Logits
    " w"  0.25
    " u"  0.24
    " h"  0.23
    " r"  0.22
    " v"  0.21
    " U"  0.21
    " p"  0.21
    " b"  0.21
    " s"  0.20
    " c"  0.20
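    The dashboard does not say how these logit lists are produced. A common approach, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and read off the most negative and most positive entries. A minimal sketch in Python with NumPy; `top_logit_effects`, `w_dec`, `W_U` and the toy vocabulary are all hypothetical:

```python
import numpy as np


def top_logit_effects(w_dec, W_U, vocab, k=10):
    """Return the k tokens whose logits a feature most increases and most decreases.

    Assumption: each token's effect is approximated by the direct path
    w_dec @ W_U, ignoring the final layer norm and any downstream layers.
    w_dec: decoder vector of the feature, shape (d_model,)
    W_U:   unembedding matrix, shape (d_model, d_vocab)
    vocab: list of token strings, length d_vocab
    """
    effects = w_dec @ W_U        # one scalar per vocabulary token
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with made-up weights (d_model=2, d_vocab=4), not real model data.
w_dec = np.array([1.0, 0.0])
W_U = np.array([[0.25, -0.14, 0.10, 0.0],
                [0.50, 0.50, 0.50, 0.50]])
vocab = ["a", "b", "c", "d"]
positive, negative = top_logit_effects(w_dec, W_U, vocab, k=2)
print(positive)  # [('a', 0.25), ('c', 0.1)]
print(negative)  # [('b', -0.14), ('d', 0.0)]
```

    Reading only the direct path keeps the computation to a single matrix product, at the cost of ignoring nonlinear effects in later layers.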
    Activation density: 1.184%
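    Activation density is not defined on this page either. One plausible reading, assumed here, is the percentage of evaluated tokens on which the feature's activation exceeds zero. A sketch, where `activation_density` is a hypothetical helper:

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: `acts` holds one activation value per evaluated token.
    """
    acts = np.asarray(acts, dtype=float)
    firing = np.count_nonzero(acts > threshold)
    return 100.0 * firing / acts.size


# Toy activations for 10 tokens (made up): the feature fires on 2 of them.
print(activation_density([0, 0, 0.5, 0, 0, 0, 0, 0, 0, 1.2]))  # 20.0
```

    Under this reading, a density of 1.184% means the feature fires on roughly 12 of every 1,000 tokens.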

    No Known Activations